Re: DM: Clustering and categorical attributes (fwd)

From: Murray Jorgensen
Date: Mon, 12 Jan 1998 18:20:08 -0500 (EST)

At 17:34 12/01/98 -0500, Warren Sarle wrote:
>Murray Jorgensen <maj@waikato.ac.nz> wrote:
>> Surely k-means clustering requires numerical attributes?
>
>There is nothing inherently wrong with doing k-means on dummy 0|1
>variables generated from categorical attributes. K-means tries to
>minimize sums of squared Euclidean distances, which can be expressed
>as sums of simple matching coefficients when applied to categorical
>data. But whether this is a good thing to do depends on the purpose
>of the analysis and the nature of the data.
>

True, but when dealing with attributes with 1000 levels you would need
to generate 999 dummy variables for each of these. I see things starting
to get a little messy!

Murray Jorgensen, Department of Statistics, U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
"Doubt everything or believe everything: these are two equally convenient
strategies. With either we dispense with the need to think."
                                                    - Henri Poincare'
http://www.cs.waikato.ac.nz/stats/Staff/maj.html
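Sarle's point about squared Euclidean distance and simple matching can be checked numerically. The sketch below is a minimal illustration, assuming full dummy coding (one 0|1 column per level of each attribute) and NumPy; the helper names `one_hot` and `simple_matching` and the toy data are hypothetical, made up for this example. For two records described by p categorical attributes, each mismatched attribute flips exactly two dummy columns, so the squared distance is d2 = 2p(1 - SMC), where SMC is the simple matching coefficient (the fraction of attributes on which the records agree).

```python
import numpy as np


def one_hot(records, levels):
    """Hypothetical helper: full dummy coding, one 0|1 column per level of every attribute."""
    blocks = []
    for j, attr_levels in enumerate(levels):
        column = np.array([r[j] for r in records])
        # Compare each record's value of attribute j against every level of that attribute.
        blocks.append((column[:, None] == np.array(attr_levels)[None, :]).astype(float))
    return np.hstack(blocks)


def simple_matching(a, b):
    """Hypothetical helper: fraction of attributes on which two records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy data (made up): three categorical attributes with 3, 2 and 2 levels.
levels = [["red", "green", "blue"], ["small", "large"], ["yes", "no"]]
records = [("red", "small", "yes"), ("blue", "small", "no"), ("green", "large", "no")]

X = one_hot(records, levels)
p = len(levels)

for i in range(len(records)):
    for k in range(len(records)):
        d2 = np.sum((X[i] - X[k]) ** 2)
        smc = simple_matching(records[i], records[k])
        # Each mismatched attribute contributes exactly 2 to the squared distance.
        assert np.isclose(d2, 2 * p * (1 - smc))

print(X.shape)  # 3 records, 3 + 2 + 2 = 7 dummy columns
```

The column count also shows the growth Jorgensen objects to: under full coding an attribute with 1000 levels adds 1000 columns (999 under reference coding, as in the reply above). Note that with reference coding the identity holds only approximately: a mismatch that involves the omitted reference level contributes 1 rather than 2 to the squared distance.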