
Re: DM: Clustering and categorical attributes (fwd)


From: Murray Jorgensen
Date: Mon, 12 Jan 1998 18:20:08 -0500 (EST)
At 17:34 12/01/98 -0500, Warren Sarle wrote:
>Murray Jorgensen <maj@waikato.ac.nz> wrote:
>> Surely k-means clustering requires numerical attributes?
>
>There is nothing inherently wrong with doing k-means on dummy 0|1
>variables generated from categorical attributes. K-means tries to
>minimize sums of squared Euclidean distances, which can be expressed
>as sums of simple matching coefficients when applied to categorical
>data. But whether this is a good thing to do depends on the purpose
>of the analysis and the nature of the data.
>

True, but when dealing with attributes with 1000 levels you would need to
generate 999 dummy variables for each of these. I see things starting to
get a little messy!
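
To make the two points above concrete, here is a minimal Python sketch. It is an illustration only, not code from either poster: the attribute names, levels, and helper functions are hypothetical. It assumes one 0|1 dummy per level (so a 1000-level attribute gives 1000 columns rather than 999); under that coding the squared Euclidean distance between two records is exactly twice the number of attributes on which they disagree, which is the simple-matching relationship Warren describes.

```python
# Hypothetical example data: three categorical attributes and their levels.
levels = [
    ("red", "green", "blue"),
    ("small", "large"),
    ("NZ", "US", "UK", "AU"),
]

def one_hot(record, levels):
    """Encode a tuple of categorical values as a flat 0/1 vector,
    using one dummy variable per level (an assumption of this sketch)."""
    vec = []
    for value, attr_levels in zip(record, levels):
        vec.extend(1 if value == level else 0 for level in attr_levels)
    return vec

def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mismatches(r, s):
    """Number of attributes on which two records disagree."""
    return sum(1 for a, b in zip(r, s) if a != b)

x = ("red", "small", "NZ")
y = ("blue", "small", "US")

d2 = sq_euclidean(one_hot(x, levels), one_hot(y, levels))
m = mismatches(x, y)
width = sum(len(attr_levels) for attr_levels in levels)

print("squared distance:", d2)      # each mismatch contributes 2
print("mismatched attributes:", m)
print("dummy columns:", width)      # grows with the total number of levels
```

If a reference level is dropped instead (999 dummies for 1000 levels), a mismatch involving the reference level contributes 1 rather than 2, so the identity is no longer exact. Either way the number of columns grows with the total number of levels, which is where the messiness comes from.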


Murray Jorgensen,  Department of Statistics,  U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
"Doubt everything or believe everything: these are two equally convenient
strategies. With either we dispense with the need to think."
http://www.cs.waikato.ac.nz/stats/Staff/maj.html       - Henri Poincare'
    




Copyright © 1998 Nautilus Systems, Inc. All Rights Reserved.