Re[2]: DM: Clustering and categorical attributes (fwd)
From: David Dutton
Date: Tue, 13 Jan 1998 04:30:42 -0500 (EST)

Has anyone ever used the likes of COBWEB for symbolic 'clustering' with substantial amounts of data? I'd be interested in any comments. I seem to remember there are later versions that cope with numeric data too...

BTW: In the UK you can 'generalise' postal codes by chopping off the latter part to give a wider 'area'. I don't know if US codes work similarly.

Dave

______________________________ Reply Separator _________________________________
Subject: Re: DM: Clustering and categorical attributes (fwd)
Author: Murray Jorgensen <maj@waikato.ac.nz> at smtp
Date: 1/13/98 1:44 AM

Surely someone, somewhere has mapped US zip codes onto a two- or three-dimensional grid. (Maybe even someone in SAS Institute!) Then you could replace the zips with two or three numeric grid coordinates.

Murray

At 19:09 12/01/98 -0500, Warren Sarle wrote:
>I wrote:
>> There is nothing inherently wrong with doing k-means on dummy 0|1
>> variables generated from categorical attributes.
>
>Murray Jorgensen <maj@waikato.ac.nz> replied:
>> True, but when dealing with attributes with 1000 levels you would need to
>> generate 999 dummy variables for each of these. I see things starting to
>> get a little messy!
>
>What's really messy is when people want to use zip codes with 20,000
>levels, many of which have only one or two cases. In such situations,
>I ask "Do you really want to do that?" and they say "Yes!" <sigh>
>
>But assuming there are adequate data to support the analysis,
>generating a few thousand dummy variables isn't all that bad.
>It's certainly less hassle than computing a distance matrix when
>you have a million cases!
>
>--
>
>Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
>saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
>(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
>* Do not send me unsolicited commercial, political, or religious email *
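None of the posters gave code, so the following is only a minimal sketch, in plain Python with the standard library, of the pipeline the thread discusses: generalise UK postcodes by dropping the inward code (always the last three characters of a UK postcode), dummy-code the resulting areas as 0/1 indicators, and run ordinary k-means (Lloyd's algorithm) on those indicators. The helper names `generalise_uk_postcode`, `dummy_code`, `sq_dist` and `kmeans` are hypothetical and come from neither the posters nor any library. One assumed design choice: the sketch uses one indicator per level rather than the k-1 (999 for 1000 levels) coding mentioned above, because with full coding every pair of distinct levels is the same squared distance (2) apart, whereas k-1 coding puts the dropped reference level closer to all the others.

```python
import random
from itertools import takewhile
from typing import List, Sequence, Tuple


def generalise_uk_postcode(postcode: str, to_area: bool = False) -> str:
    """Drop the inward code (the last 3 characters of a UK postcode).

    'CB2 1TN' -> 'CB2' (outward code); with to_area=True -> 'CB' (area letters).
    """
    compact = postcode.replace(" ", "").upper()
    outward = compact[:-3]
    if to_area:
        # Keep only the leading letters of the outward code.
        return "".join(takewhile(str.isalpha, outward))
    return outward


def dummy_code(values: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Full 0/1 coding: one indicator column per distinct level."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0.0] * len(levels)
        row[index[v]] = 1.0
        rows.append(row)
    return levels, rows


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means, with centres initialised from k random cases."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each case goes to its nearest centre (ties -> lowest index).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no case changed cluster
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


if __name__ == "__main__":
    postcodes = ["CB2 1TN", "CB2 3QZ", "CB1 2AB", "OX1 3PG", "OX1 2JD", "OX2 6DP"]
    areas = [generalise_uk_postcode(p) for p in postcodes]
    levels, X = dummy_code(areas)
    print(levels)  # ['CB1', 'CB2', 'OX1', 'OX2']
    labels, _ = kmeans(X, k=2)
    print(labels)
```

Two points the sketch is meant to illustrate. First, on a single dummy-coded attribute all distinct levels are equidistant, so k-means only groups cases meaningfully once several attributes (or numeric coordinates such as the zip-code grid Murray suggests) are combined. Second, k-means only ever computes distances from each case to the k current centres, so its memory grows with cases × dummy columns rather than with cases², which is why a few thousand dummy variables can be cheaper than an n-by-n distance matrix for a million cases.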