Re: DM: Clustering and categorical attributes (fwd)

From: Warren Sarle
Date: Mon, 12 Jan 1998 19:02:24 -0500 (EST)

I wrote:
> There is nothing inherently wrong with doing k-means on dummy 0|1
> variables generated from categorical attributes.

Murray Jorgensen <maj@waikato.ac.nz> replied:
> True, but when dealing with attributes with 1000 levels you would need to
> generate 999 dummy variables for each of these. I see things starting to
> get a little messy!
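
For readers unfamiliar with the coding under discussion, here is a minimal
sketch of generating 0|1 dummy variables from one categorical attribute. The
function name dummy_code and the drop_first flag are hypothetical, chosen for
illustration only. With drop_first=True one reference level gets no column,
which is how an attribute with 1000 levels yields 999 dummy variables.

```python
def dummy_code(values, drop_first=True):
    """Return (columns, rows) of 0|1 indicators for one categorical attribute.

    dummy_code and drop_first are hypothetical names, not from any library.
    """
    levels = sorted(set(values))
    # Dropping the first (reference) level leaves k-1 columns for k levels.
    columns = levels[1:] if drop_first else levels
    # Each case gets a 1 in the column matching its level, 0 elsewhere.
    rows = [[1 if v == level else 0 for level in columns] for v in values]
    return columns, rows


colors = ["red", "green", "blue", "green"]
columns, rows = dummy_code(colors)
print(columns)  # ['green', 'red']  ('blue' is the reference level)
print(rows)     # [[0, 1], [1, 0], [0, 0], [1, 0]]

# An attribute with 1000 distinct levels produces 999 dummy variables.
many_columns, _ = dummy_code(list(range(1000)))
print(len(many_columns))  # 999
```

Passing drop_first=False keeps all k columns, which puts every pair of
distinct levels at the same distance from each other. That may or may not
matter for a given clustering.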

What's really messy is when people want to use zip codes with 20,000
levels, many of which have only one or two cases. In such situations,
I ask "Do you really want to do that?" and they say "Yes!" <sigh>

But assuming there are adequate data to support the analysis,
generating a few thousand dummy variables isn't all that bad.
It's certainly less hassle than computing a distance matrix when
you have a million cases!
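
A rough back-of-the-envelope sketch of that comparison follows. Everything
numeric here is an assumption for illustration: three attributes with 1000
levels each (2997 dummy variables), dense storage, and 8 bytes per value. The
function names are hypothetical. A full pairwise distance matrix grows with
the square of the number of cases, while the dummy-variable data grow only
linearly, and k-means needs distances from cases to cluster centers, not
between all pairs of cases.

```python
def distance_matrix_entries(n_cases):
    """Number of distinct pairwise distances among n_cases cases."""
    return n_cases * (n_cases - 1) // 2


def dummy_matrix_entries(n_cases, n_dummies):
    """Number of values in a dense cases-by-dummies data matrix."""
    return n_cases * n_dummies


n_cases = 1_000_000
n_dummies = 3 * 999   # assumption: three attributes, 1000 levels each
bytes_per_value = 8   # assumption: double-precision storage

dist_bytes = distance_matrix_entries(n_cases) * bytes_per_value
dummy_bytes = dummy_matrix_entries(n_cases, n_dummies) * bytes_per_value

print(f"distance matrix: {dist_bytes / 1e12:.1f} TB")   # about 4.0 TB
print(f"dummy variables: {dummy_bytes / 1e9:.1f} GB")   # about 24.0 GB
```

Under these assumptions the dummy variables are smaller by more than two
orders of magnitude, and sparse storage would shrink them further, since each
case has at most one nonzero value per attribute.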

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
* Do not send me unsolicited commercial, political, or religious email *
