Re: DM: Clustering and categorical attributes (fwd)

From: Murray Jorgensen
Date: Mon, 12 Jan 1998 20:07:19 -0500 (EST)

Surely someone, somewhere has mapped US zip codes onto a two or three
dimensional grid. (Maybe even someone in SAS Institute!) Then you could
replace the zips with two or three numeric grid coordinates.

Murray

At 19:09 12/01/98 -0500, Warren Sarle wrote:
>I wrote:
>> There is nothing inherently wrong with doing k-means on dummy 0|1
>> variables generated from categorical attributes.
>
>Murray Jorgensen <maj@waikato.ac.nz> replied:
>> True, but when dealing with attributes with 1000 levels you would need to
>> generate 999 dummy variables for each of these. I see things starting to
>> get a little messy!
>
>What's really messy is when people want to use zip codes with 20,000
>levels, many of which have only one or two cases. In such situations,
>I ask "Do you really want to do that?" and they say "Yes!" <sigh>
>
>But assuming there are adequate data to support the analysis,
>generating a few thousand dummy variables isn't all that bad.
>It's certainly less hassle than computing a distance matrix when
>you have a million cases!
>
>--
>
>Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
>saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
>(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
>* Do not send me unsolicited commercial, political, or religious email *
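
For readers following the thread, here is a minimal sketch of the dummy 0|1 coding being discussed. It is an illustration only, not code from either poster: the function names (dummy_code, squared_distance), the drop_first parameter, and the toy rows are all made up for this example. It expands each categorical attribute into indicator columns, which is the form k-means would then work on, and shows that with one column per level the squared Euclidean distance between two cases is twice the number of attributes on which they differ.

```python
def dummy_code(rows, drop_first=False):
    """Expand each categorical column into 0|1 indicator columns.

    rows: list of equal-length tuples of categorical values (hypothetical data).
    drop_first: if True, omit the first level of each attribute, so an
    attribute with m levels yields m - 1 columns (e.g. 999 for 1000 levels).
    """
    n_cols = len(rows[0])
    # Sorted distinct levels per column, so the column layout is deterministic.
    levels = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    if drop_first:
        levels = [lv[1:] for lv in levels]
    return [
        [1 if row[j] == level else 0
         for j in range(n_cols)
         for level in levels[j]]
        for row in rows
    ]


def squared_distance(a, b):
    """Squared Euclidean distance, the quantity k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data, invented for illustration: (colour, state) per case.
rows = [
    ("red", "NC"),
    ("red", "NY"),
    ("blue", "NY"),
]

coded = dummy_code(rows)
# Column order is: colour=blue, colour=red, state=NC, state=NY
for original, indicators in zip(rows, coded):
    print(original, indicators)

# Cases 0 and 1 differ on one attribute, cases 0 and 2 on both.
print(squared_distance(coded[0], coded[1]))
print(squared_distance(coded[0], coded[2]))
```

Note that the number of indicator columns grows with the number of distinct levels, not with the number of cases, whereas a full distance matrix grows with the square of the number of cases. That is the trade-off Warren alludes to in the quoted message.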