
Re: DM: Clustering and categorical attributes (fwd)


From: Murray Jorgensen
Date: Mon, 12 Jan 1998 20:07:19 -0500 (EST)
Surely someone, somewhere has mapped US zip codes onto a two or three
dimensional grid. (Maybe even someone in SAS Institute!) Then you could
replace the zips with two or three numeric grid coordinates.
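
A minimal sketch of that replacement, assuming a lookup table from zip
code to grid coordinates is available. The table ZIP_TO_XY, its placeholder
values, and the function names below are hypothetical, not taken from any
real data set or product:

```python
# Hypothetical lookup table: zip code -> (x, y) grid coordinates.
# The values are placeholders for illustration, not real centroids.
ZIP_TO_XY = {
    "00001": (0.0, 0.0),
    "00002": (1.0, 0.5),
    "00003": (4.0, 3.0),
}


def replace_zips(records, table=ZIP_TO_XY):
    """Swap each record's 'zip' field for numeric 'x' and 'y' fields.

    Records whose zip code is missing from the table are skipped.
    """
    out = []
    for rec in records:
        xy = table.get(rec["zip"])
        if xy is None:
            continue  # no coordinates known for this zip code
        new = {k: v for k, v in rec.items() if k != "zip"}
        new["x"], new["y"] = xy
        out.append(new)
    return out
```

Each case then carries two numeric attributes instead of one categorical
attribute with thousands of levels, so k-means can use it directly.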

Murray

At 19:09 12/01/98 -0500, Warren Sarle wrote:
>I wrote:
>> There is nothing inherently wrong with doing k-means on dummy 0|1
>> variables generated from categorical attributes.
>
>Murray Jorgensen <maj@waikato.ac.nz> replied:
>> True, but when dealing with attributes with 1000 levels you would need to
>> generate 999 dummy variables for each of these. I see things starting to
>> get a little messy!
>
>What's really messy is when people want to use zip codes with 20,000
>levels, many of which have only one or two cases. In such situations,
>I ask "Do you really want to do that?" and they say "Yes!" <sigh>
>
>But assuming there are adequate data to support the analysis,
>generating a few thousand dummy variables isn't all that bad.
>It's certainly less hassle than computing a distance matrix when
>you have a million cases!
>
>-- 
>
>Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
>saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
>(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
>* Do not send me unsolicited commercial, political, or religious email *
>
>
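
The trade-off discussed in the quoted text can be sketched roughly as
follows. The function names are hypothetical, and dropping the first level
as the reference is an assumption made to match the "1000 levels, 999 dummy
variables" count in the thread:

```python
def dummy_encode(values):
    """Encode categorical values as 0|1 dummy variables.

    The first level (in sorted order) is dropped as the reference,
    so k distinct levels yield k - 1 dummy columns.
    """
    levels = sorted(set(values))
    columns = levels[1:]
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


def pairwise_distance_count(n_cases):
    """Number of distinct pairs in a full distance matrix over n_cases."""
    return n_cases * (n_cases - 1) // 2


columns, rows = dummy_encode(["red", "green", "blue", "green"])
# Three levels give two dummy columns; "blue" is the reference level.
```

For a million cases, pairwise_distance_count(1_000_000) is 499,999,500,000
distances, whereas an attribute with 1000 levels adds 999 dummy columns
with at most a single 1 in each row.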