Re: DM: Clustering and categorical attributes

From: Ted Pedersen
Date: Mon, 12 Jan 1998 09:30:40 -0500 (EST)

> Hello-
>
> I have a question about non-numeric data - what are good algorithms to
> use when doing cluster analysis with attributes that have many
> categories? For instance, I have a dataset in which several fields are
> alpha-numeric codes, and there are at least 1000 possible codes.
>
> I have read of a method that is based on k-means clustering. Are there
> others?

I have used McQuitty's Similarity analysis for non-numeric data. The
algorithm is described in:

@article{Mcquitty66,
  author  = {McQuitty, L.},
  title   = {Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data},
  journal = {Educational and Psychological Measurement},
  volume  = {26},
  year    = {1966},
  pages   = {825--831}}

The algorithm isn't too hard to follow, and there are some very clear
examples in the paper. SAS supports this in PROC CLUSTER. And it's a
fairly simple algorithm and wouldn't be too hard to implement if need be.

Best of luck,
Ted

--
* Ted Pedersen                                 pedersen@seas.smu.edu *
* http://www.seas.smu.edu/~pedersen/                                  *
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-3712 *
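Since the reply notes the method would not be hard to implement, here is a minimal sketch of one way to do it for categorical codes. It rests on two assumptions that are not from the post: records are compared with a simple-matching dissimilarity (the fraction of fields whose codes differ), and clusters are merged with the averaging update that SAS documents for PROC CLUSTER METHOD=MCQUITTY, where the distance from a merged cluster to any other cluster is the plain mean of the two old distances. For simplicity it merges the single closest pair at each step rather than reproducing the paper's reciprocal-pairs bookkeeping, so treat it as an approximation of the method, not a transcription of McQuitty's procedure. The function names and the sample codes are hypothetical.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Simple-matching dissimilarity (assumed): fraction of fields whose codes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mcquitty_cluster(records, n_clusters):
    """Hypothetical helper: agglomerate records until n_clusters remain."""
    # Each record starts as its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(len(records))}
    # Distances between current clusters, keyed by (smaller id, larger id).
    dist = {(i, j): mismatch_distance(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}
    next_id = len(records)
    while len(clusters) > n_clusters:
        # Merge the closest pair; ties are broken by the lowest pair of ids.
        (k, l), _ = min(dist.items(), key=lambda item: (item[1], item[0]))
        merged = clusters.pop(k) + clusters.pop(l)
        # McQuitty-style update: the new cluster's distance to every other
        # cluster m is the unweighted mean of d(k, m) and d(l, m).
        for m in clusters:
            d_km = dist.pop((min(k, m), max(k, m)))
            d_lm = dist.pop((min(l, m), max(l, m)))
            dist[(m, next_id)] = (d_km + d_lm) / 2
        del dist[(k, l)]
        clusters[next_id] = merged
        next_id += 1
    # Return member indices in a stable, sorted form.
    return sorted(sorted(members) for members in clusters.values())


records = [
    ("A12", "X9", "Q1"),
    ("A12", "X9", "Q2"),
    ("B07", "Y3", "Q7"),
    ("B07", "Y3", "Q8"),
    ("A12", "X4", "Q1"),
]
print(mcquitty_cluster(records, 2))  # [[0, 1, 4], [2, 3]]
```

Because the matching distance only tests codes for equality, it does not matter whether a field has 10 or 1000 possible codes. The sketch rescans every remaining pair at each merge, which is fine for a few thousand records but roughly cubic in the number of records, so a larger dataset would call for a smarter implementation or PROC CLUSTER itself.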