Re: DM: Clustering and categorical attributes
From: Ted Pedersen
Date: Mon, 12 Jan 1998 09:30:40 -0500 (EST)
>
> Hello-
>
> I have a question about non-numeric data - what are good algorithms to
> use when doing cluster analysis with attributes that have many
> categories? For instance, I have a dataset in which several fields are
> alpha-numeric codes, and there are at least 1000 possible codes.
>
> I have read of a method that is based on k-means clustering. Are there
> others?
>
I have used McQuitty's similarity analysis for non-numeric data.
The algorithm is described in:
@article{Mcquitty66,
author = {McQuitty, L.},
title = {Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data},
journal = {Educational and Psychological Measurement},
volume = {26},
year = {1966},
pages = {825--831}}
The algorithm isn't too hard to follow, and there are some very
clear examples in the paper.
SAS supports it in PROC CLUSTER (METHOD=MCQUITTY), and it is simple
enough that it wouldn't be hard to implement yourself if need be.
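To give a rough idea of what such an implementation might look like, here is a minimal Python sketch. It uses the weighted-average linkage update (WPGMA), which is what SAS documents for METHOD=MCQUITTY; it is not a transcription of the procedure in the 1966 paper. The simple-matching dissimilarity for categorical codes, the function names `mismatch` and `wpgma`, and the toy records are all assumptions made for illustration.

```python
from itertools import combinations


def mismatch(a, b):
    """Simple-matching dissimilarity: fraction of attributes whose codes differ.

    This choice of dissimilarity is an assumption; any measure suited to
    categorical data could be substituted.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def wpgma(records, n_clusters):
    """Agglomerative clustering with the weighted-average (WPGMA) update."""
    # Each cluster id maps to the indices of its member records.
    clusters = {i: [i] for i in range(len(records))}
    # Dissimilarities between current clusters, keyed by (smaller id, larger id).
    dist = {(i, j): mismatch(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}
    next_id = len(records)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        a, b = min(dist, key=dist.get)
        merged = clusters.pop(a) + clusters.pop(b)

        # WPGMA: the distance from the merged cluster to any other cluster k
        # is the plain average of the two old distances, regardless of size.
        new_dist = {}
        for k in clusters:
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            new_dist[(k, next_id)] = (d_ak + d_bk) / 2

        # Drop every entry that involved one of the two merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if a not in pair and b not in pair}
        dist.update(new_dist)
        clusters[next_id] = merged
        next_id += 1

    return sorted(sorted(members) for members in clusters.values())


# Toy records with made-up alphanumeric codes (hypothetical data).
records = [
    ("A12", "X", "red"),
    ("A12", "X", "blue"),
    ("B07", "Y", "green"),
    ("B07", "Y", "red"),
    ("A12", "Z", "red"),
]
print(wpgma(records, 2))  # [[0, 1, 4], [2, 3]]
```

With on the order of 1000 possible codes per field, exact matches between records become rare, so the choice of dissimilarity probably matters more than the linkage rule. The sketch also recomputes the closest pair on every pass, which is fine for small data but would need a smarter structure for large datasets.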
Best of luck,
Ted
--
* Ted Pedersen pedersen@seas.smu.edu
*
* http://www.seas.smu.edu/~pedersen/
*
* Department of Computer Science and Engineering,
*
* Southern Methodist University, Dallas, TX 75275 (214) 768-3712
*