Re: DM: Clustering and categorical attributes (fwd)

From: Warren Sarle
Date: Mon, 12 Jan 1998 19:02:24 -0500 (EST)

I wrote:
> There is nothing inherently wrong with doing k-means on dummy 0|1
> variables generated from categorical attributes.

Murray Jorgensen <maj@waikato.ac.nz> replied:
> True, but when dealing with attributes with 1000 levels you would need to
> generate 999 dummy variables for each of these. I see things starting to
> get a little messy!

What's really messy is when people want to use zip codes with 20,000
levels, many of which have only one or two cases. In such situations,
I ask "Do you really want to do that?" and they say "Yes!" <sigh>

But assuming there are adequate data to support the analysis, generating
a few thousand dummy variables isn't all that bad. It's certainly less
hassle than computing a distance matrix when you have a million cases!

--
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
 * Do not send me unsolicited commercial, political, or religious email *
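
For a rough sense of the two points above, here is a minimal Python sketch. It shows how a categorical attribute with k levels turns into k - 1 dummy 0|1 variables, and how the number of distances in a full pairwise matrix compares with the case-to-centroid distances k-means computes in one pass. The function names, the toy column, and the choice of 10 clusters are assumptions made for illustration; none of them come from the thread.

```python
def dummy_code(values, drop_first=True):
    """Map a categorical column to dummy 0|1 variables.

    With drop_first=True, a column with k levels yields k - 1 dummies;
    the first level in sorted order becomes the all-zeros reference.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    index = {level: j for j, level in enumerate(kept)}
    rows = []
    for v in values:
        row = [0] * len(kept)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return kept, rows


def pairwise_distance_count(n_cases):
    """Number of distinct case pairs in a full distance matrix."""
    return n_cases * (n_cases - 1) // 2


def kmeans_distance_count(n_cases, n_clusters):
    """Case-to-centroid distances computed in one k-means pass."""
    return n_cases * n_clusters


# Hypothetical toy column, not data from the thread.
colors = ["red", "green", "blue", "green"]
kept, rows = dummy_code(colors)
print(kept)  # ['green', 'red']
print(rows)  # [[0, 1], [1, 0], [0, 0], [1, 0]]

# An attribute with 1000 levels gives 999 dummy variables.
many_levels = [f"level{i}" for i in range(1000)]
print(len(dummy_code(many_levels)[0]))  # 999

# A million cases: full pairwise matrix vs. one pass with an assumed 10 clusters.
print(pairwise_distance_count(1_000_000))    # 499999500000
print(kmeans_distance_count(1_000_000, 10))  # 10000000
```

Each k-means distance does span every dummy variable, so a few thousand dummies make each one more expensive. Even so, the count of distances grows linearly in the number of cases rather than quadratically.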