Re: DM: equal-size clustering
From: Warren Sarle
Date: Thu, 4 Sep 1997 13:17:21 -0400 (EDT)

David Dowe writes:
> > From owner-datamine-l@nessie.crosslink.net Thu Sep 4 13:51:21 1997
> > Date: Thu, 04 Sep 1997 11:05:09 +0800
> > From: Hukan <hukan@cs.hku.hk>
> > To: datamine-l@nautilus-sys.com
> > Subject: DM: equal-size clustering
> > ...
> > I have a special clustering problem. Given a set of points in the
> > multidimensional space, we want to cluster these points under the
> > limitation that the sizes of clusters are (almost) equal. Could anyone
> > give me some suggestions?
> ...
> I would do this by MML (Minimum Message Length), and would use Snob
> http://www.cs.monash.edu.au/~dld/Snob.html
> modified so that the relative class abundances had to be (almost) equal,
> and I would try to quantify "(almost) equal" with the best Bayesian priors
> I could.
>
> No doubt, others will come up with alternative suggestions.

It is impossible to say what the best method of analysis is without
knowing what the purpose of the analysis is.

You can have a Bayesian prior that says that the population mixing
probabilities are exactly equal, but that will not force the sample
mixing proportions to be approximately equal. If the distribution is
more or less uniform, the sample mixing proportions will be nearly
equal, but if the population contains well-separated clusters with
radically different mixing probabilities, the prior will have little
effect.

K-means and numerous similar methods implicitly assume that the
population mixing probabilities are exactly equal. One popular method
for forcing the sample mixing proportions to be more nearly equal is
given in:

   DeSieno, D. (1988), "Adding a conscience to competitive learning,"
   Proc. Int. Conf. on Neural Networks, I, 117-124, IEEE Press.

Rather than try to force a false model on the data, it might be better
to transform the data to have an approximately uniform distribution.
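One simple reading of "transform the data to have an approximately
uniform distribution" is a per-coordinate rank transform, sketched
below. This is an assumption for illustration, not a method named in
this post: it makes each coordinate's marginal distribution roughly
uniform on (0, 1) but does nothing about dependence between
coordinates. The function names rank_to_uniform and transform_points
are hypothetical.

```python
def rank_to_uniform(values):
    """Map each value to (average rank - 0.5) / n, a score in (0, 1)."""
    n = len(values)
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values so they share one average rank.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            scores[order[k]] = (avg_rank - 0.5) / n
        i = j + 1
    return scores


def transform_points(points):
    """Apply the rank transform separately to each coordinate."""
    columns = list(zip(*points))
    new_columns = [rank_to_uniform(list(col)) for col in columns]
    return [list(row) for row in zip(*new_columns)]
```

The transformed points could then be passed to K-means or a similar
method in place of the raw coordinates.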
But, as I said, it is impossible to know whether this is appropriate
without knowing the purpose of the analysis.

-- 
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
 * Do not send me unsolicited commercial, political, or religious email *