
Re[2]: DM: Clustering and categorical attributes (fwd)


From: David Dutton
Date: Tue, 13 Jan 1998 04:30:42 -0500 (EST)
     Has anyone ever used the likes of COBWEB for symbolic 'clustering'
     with substantial amounts of data? I'd be interested in any comments.
     Seem to remember there are later versions which cope with numeric
     data too....
     
     BTW: In the UK you can 'generalise' postal codes by chopping off the
     latter part to give a wider 'area' - don't know if US codes work
     similarly.
     
     Dave
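
A minimal sketch of the 'chopping off' idea above. It assumes the usual UK layout, in which the last three characters (the inward code) pick out the small delivery unit and whatever precedes them (the outward code) names the wider area. The function names are hypothetical. The US helper rests on a further assumption that is not from this thread: that keeping the leading digits of a five-digit ZIP code gives a coarser region.

```python
def generalise_uk_postcode(postcode):
    """Return the outward part of a UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    # Remove all whitespace and normalise case so 'm1 1ae' and 'M11AE' agree.
    compact = "".join(postcode.split()).upper()
    # Full postcodes are 5 to 7 characters once spaces are removed.
    if not 5 <= len(compact) <= 7:
        raise ValueError("not a full UK postcode: %r" % postcode)
    # The inward code is always the final three characters; drop it.
    return compact[:-3]


def generalise_us_zip(zip_code, digits=3):
    """Keep only the leading digits of a ZIP code (assumed to be coarser)."""
    return zip_code.strip()[:digits]


print(generalise_uk_postcode("SW1A 1AA"))
print(generalise_uk_postcode("m1 1ae"))
print(generalise_us_zip("27513"))
```

Collapsing codes this way reduces the number of levels and puts more cases in each level before any clustering is attempted.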
     
______________________________ Reply Separator _________________________________
Subject: Re: DM: Clustering and categorical attributes (fwd)
Author:  Murray Jorgensen <maj@waikato.ac.nz> at smtp
Date:    1/13/98 1:44 AM


Surely someone, somewhere has mapped US zip codes onto a two or three
dimensional grid. (Maybe even someone in SAS Institute!) Then you could
replace the zips with two or three numeric grid coordinates.
     
Murray
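
A rough sketch of the replacement step described above, assuming such a zip-to-coordinate table already exists. The table here is a placeholder: its codes and coordinates are invented purely for illustration, and the function name is hypothetical.

```python
def zips_to_grid(zips, centroids):
    """Replace each zip code with its (x, y) grid coordinates.

    Codes missing from the table come back as None so that they can be
    handled explicitly rather than silently dropped.
    """
    return [centroids.get(z) for z in zips]


# Placeholder lookup table: NOT real zip codes or real coordinates.
toy_centroids = {
    "00001": (10.0, 20.0),
    "00002": (10.5, 20.25),
}

coords = zips_to_grid(["00002", "00001", "99999"], toy_centroids)
print(coords)
```

Each zip then contributes two numeric variables instead of one categorical attribute with thousands of levels.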
At 19:09 12/01/98 -0500, Warren Sarle wrote: 
>I wrote:
>> There is nothing inherently wrong with doing k-means on dummy 0|1 
>> variables generated from categorical attributes.
>
>Murray Jorgensen <maj@waikato.ac.nz> replied:
>> True, but when dealing with attributes with 1000 levels you would need to
>> generate 999 dummy variables for each of these. I see things starting to
>> get a little messy!
>
>What's really messy is when people want to use zip codes with 20,000
>levels, many of which have only one or two cases. In such situations,
>I ask "Do you really want to do that?" and they say "Yes!" <sigh>
>
>But assuming there are adequate data to support the analysis, 
>generating a few thousand dummy variables isn't all that bad. 
>It's certainly less hassle than computing a distance matrix when 
>you have a million cases!
>
>-- 
>
>Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
>saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
>(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
>* Do not send me unsolicited commercial, political, or religious email *
>
>
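
To put rough numbers on the exchange above, here is a small sketch of 0|1 dummy coding and of the two distance counts being compared. The function names are hypothetical, and the choice of the first sorted level as the dropped reference level is an assumption, not something stated in the thread. The k-means count covers a single pass only (each case compared with each cluster centre), and the cluster count of 10 is an arbitrary example value.

```python
def dummy_code(values):
    """Return (kept_levels, rows) of 0|1 dummies for one categorical attribute.

    An attribute with L distinct levels yields L - 1 dummy variables: the
    first level in sorted order is the reference and gets all zeros.
    """
    levels = sorted(set(values))
    kept = levels[1:]  # drop the reference level
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


def pairwise_distance_count(n_cases):
    """Entries in a full distance matrix, counting each pair once."""
    return n_cases * (n_cases - 1) // 2


def kmeans_distance_count(n_cases, n_clusters):
    """Distances evaluated in one k-means pass (case to cluster centre)."""
    return n_cases * n_clusters


kept, rows = dummy_code(["red", "green", "blue", "green"])
print(kept, rows)

# 1000 levels -> 999 dummy variables, as in the quoted reply.
print(len(dummy_code([str(i) for i in range(1000)])[0]))

# A million cases: all pairs versus one k-means pass with 10 clusters.
print(pairwise_distance_count(1000000))
print(kmeans_distance_count(1000000, 10))
```

With a million cases the full distance matrix has roughly 5 x 10^11 entries, while one k-means pass with 10 clusters evaluates 10^7 distances, each over however many dummy variables were generated.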


