Re: DM: Clustering algorithm for high-dimensional Boolean space

From: Murray Jorgensen
Date: Sun, 7 Dec 1997 17:47:32 -0500 (EST)

Thanks to David Dowe for the mention. I have not tried Multimix with
data like this, but it might be feasible if you have a lot of memory
and choose your array bounds with care.

In a completely categorical data set like this Multimix reduces to
Latent Class Analysis, and you may have access to other software for
doing this.

Some LCA references:

Aitkin, Murray; Anderson, Dorothy; Hinde, John
  Statistical modelling of data on teaching styles (with discussion)
  JRSS-A 144 (1981), 419-461
  Keywords: Cluster analysis; Latent class analysis; EM algorithm

Pickering, R. M.; Forbes, J. F.
  A classification of Scottish infants using latent class analysis
  StatMed 3 (1984), 249-259
  Keywords: EM algorithm

Uebersax, John S.; Grove, William M.
  Latent class analysis of diagnostic agreement
  StatMed 9 (1990), 559-572

Formann, Anton K.
  Linear logistic latent class analysis for polytomous data
  JASA 87 (1992), 476-486
  Keywords: Maximum likelihood

Mooijaart, Ab; van der Heijden, Peter G. M.
  The EM algorithm for latent class analysis with equality constraints
  Psymtrka 57 (1992), 261-269

A link to Multimix can be found in my home page (see .sig below)

Murray Jorgensen

PS I recommend examining solutions with small numbers of clusters
first, k=2,3,4,...

At 17:41 5/12/97 +1100, David L Dowe wrote:
>> From owner-datamine-l@nessie.crosslink.net Fri Dec 5 08:58:55 1997
>> From: "Rao, Bharat" <bharat@scr.siemens.com>
>> To: datamine-l@nautilus-sys.com
>> Subject: DM: Clustering algorithm for high-dimensional Boolean space
>> Date: Thu, 4 Dec 1997 15:25:06 -0500
>>
>> Hello,
>
> Bharat et al, Hi.
>
>>
>> I'm looking to cluster a dataset where the
>> a) data has high-dimensionality (50<n<1000)
>> b) relatively few samples ( M=O(n), and occasionally M < n)
>> c) and is completely Boolean (all variables are 0/1).
>>
>> [Obviously clustering will be hard, and quite possibly
>> I will end up with a bunch of singleton clusters. But
>> I'd like to try running some existing algorithms on this
>> data, at least for benchmarking purposes, before trying
>> to develop new algorithms.]
>>
>> Can anyone point me to some existing implemented algorithms that
>> cluster Boolean data. (I have already requested a copy of COBWEB
>> from Doug Fisher, and realize that AutoClass is not suited for
>> Boolean data.)
>
>Snob (using MML, by Chris Wallace and me)
>http://www.cs.monash.edu.au/~dld/Snob.html
>deals with boolean data and should have no problems with the above.
>
>You might also want to look at Hunt and Jorgensen's MULTIMIX.
>
>Snob WWW page link and MULTIMIX link are below.
>
>>
>> Also, any pointers to work on constructive induction that may be
>> relevant for constructing new features to help clustering would be
>> appreciated.
>>
>> Thanks for any help,
>>
>> Bharat
>
>(Dr.) David Dowe, Dept of Computer Science, Monash University,
>Clayton, Victoria 3168, Australia
>dld@cs.monash.edu.au Fax:+61 3 9905-5146
>http://www.cs.monash.edu.au/~dld/
>http://www.cs.monash.edu.au/~dld/Snob.html
>http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html
>

Dr Murray Jorgensen   http://www.cs.waikato.ac.nz/stats/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
*Applications Editor, Australian and New Zealand Journal of Statistics*
maj@waikato.ac.nz   Phone +64-7 838 4773   home phone 856 6705   Fax 838 4666
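
For readers without LCA software to hand, the latent class model
mentioned above can be sketched as a mixture of independent Bernoulli
variables fitted by EM. This is only a minimal illustration and is not
code from Multimix or Snob. The function names fit_latent_class and
hard_labels, the random starting values, and the clamping constant eps
are all assumptions made for the example.

```python
import math
import random


def fit_latent_class(data, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a k-class latent class (Bernoulli mixture) model by EM.

    data: list of rows, each a list of 0/1 values of equal length.
    Returns (weights, probs, resp, loglik), where resp and loglik are
    computed in the final E-step (before the last parameter update).
    """
    rng = random.Random(seed)
    m, n = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Hypothetical initialisation: random item probabilities per class.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n)] for _ in range(k)]
    resp, loglik = [], float("-inf")

    for _ in range(n_iter):
        # E-step: posterior class probabilities for each row,
        # computed in log space to avoid underflow when n is large.
        resp, loglik = [], 0.0
        for row in data:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for x, p in zip(row, probs[c]):
                    lp += math.log(p if x else 1.0 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
            loglik += top + math.log(total)

        # M-step: re-estimate class weights and item probabilities.
        masses = [sum(r[c] for r in resp) for c in range(k)]
        raw = [max(mass / m, eps) for mass in masses]
        norm = sum(raw)
        weights = [w / norm for w in raw]
        for c in range(k):
            for j in range(n):
                hits = sum(r[c] * row[j] for r, row in zip(resp, data))
                p = hits / masses[c] if masses[c] > 0 else 0.5
                # Clamp away from 0 and 1 so the logs stay finite.
                probs[c][j] = min(max(p, eps), 1.0 - eps)

    return weights, probs, resp, loglik


def hard_labels(resp):
    """Assign each row to its most probable class."""
    return [max(range(len(r)), key=lambda c: r[c]) for r in resp]


if __name__ == "__main__":
    # Toy data: two obvious groups of 8-bit Boolean records.
    toy = [[1, 1, 1, 1, 0, 0, 0, 0]] * 6 + [[0, 0, 0, 0, 1, 1, 1, 1]] * 6
    # Following the PS: look at small numbers of clusters first.
    for k in (2, 3, 4):
        _, _, resp, ll = fit_latent_class(toy, k)
        print(k, round(ll, 3), hard_labels(resp))
```

With M close to or below n, as in the original question, a model like
this has many parameters per class relative to the data, which is one
more reason to start with small k and compare solutions before going
further.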