
Re: DM: Clustering algorithm for high-dimensional Boolean space


From: Murray Jorgensen
Date: Sun, 7 Dec 1997 17:47:32 -0500 (EST)
Thanks to David Dowe for the mention. I have not tried Multimix with
data like this, but it might be feasible if you have a lot of memory and
choose your array bounds with care. In a completely categorical data set
like this Multimix reduces to Latent Class Analysis, and you may have
access to other software for doing this.  Some LCA references:

Aitkin, Murray; Anderson, Dorothy; Hinde, John (1981)
Statistical modelling of data on teaching styles (with discussion)
Journal of the Royal Statistical Society, Series A 144, 419-461
Keywords: Cluster analysis; Latent class analysis; EM algorithm

Pickering, R. M.; Forbes, J. F. (1984)
A classification of Scottish infants using latent class analysis
Statistics in Medicine 3, 249-259
Keywords: EM algorithm

Uebersax, John S.; Grove, William M. (1990)
Latent class analysis of diagnostic agreement
Statistics in Medicine 9, 559-572

Formann, Anton K. (1992)
Linear logistic latent class analysis for polytomous data
Journal of the American Statistical Association 87, 476-486
Keywords: Maximum likelihood

Mooijaart, Ab; van der Heijden, Peter G. M. (1992)
The EM algorithm for latent class analysis with equality constraints
Psychometrika 57, 261-269

A link to Multimix can be found on my home page (see .sig below).

Murray Jorgensen

PS I recommend examining solutions with small numbers of clusters first,
k=2,3,4,...
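
As a rough illustration of what latent class analysis does with all-0/1
data, here is a minimal sketch of EM for a mixture of independent
Bernoulli variables, which is the form an LCA model takes when every
variable is binary. It is not Multimix or Snob code. The function name
fit_lca, the toy data, and the small pseudo-count eps (added only to keep
probabilities away from 0 and 1) are assumptions made for the example.

```python
import math
import random


def fit_lca(data, k, n_iter=100, seed=0, eps=1e-6):
    """EM for a k-class latent class model on 0/1 data.

    data is a list of equal-length lists of 0/1 values.
    Returns (weights, theta, loglik), where theta[c][j] estimates
    P(x_j = 1 | class c). eps is a hypothetical smoothing pseudo-count,
    so the fit is very slightly regularised rather than pure ML.
    """
    rng = random.Random(seed)
    m, n = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Random start away from 0 and 1 so the classes are not identical.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n)] for _ in range(k)]
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class probabilities per row, in log space.
        resp = []
        loglik = 0.0
        for x in data:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(n):
                    p = theta[c][j]
                    lp += math.log(p if x[j] else 1.0 - p)
                logp.append(lp)
            top = max(logp)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
            loglik += norm  # log-likelihood at the current parameters
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate class weights and per-variable probabilities.
        for c in range(k):
            total = sum(r[c] for r in resp)
            weights[c] = (total + eps) / (m + k * eps)
            for j in range(n):
                ones = sum(r[c] for r, x in zip(resp, data) if x[j])
                theta[c][j] = (ones + eps) / (total + 2 * eps)
    # loglik was evaluated at the parameters entering the last iteration.
    return weights, theta, loglik


# Toy data (made up): two groups of rows with opposite patterns.
toy = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20

# Small numbers of clusters first, as suggested above.
for k in (2, 3, 4):
    weights, theta, loglik = fit_lca(toy, k)
    n_params = k * len(toy[0]) + (k - 1)  # free parameters in the model
    print(k, n_params, round(loglik, 3))
```

With n binary variables and k classes the model has k*n + (k-1) free
parameters, so with M = O(n) samples the parameter count passes the
sample count very quickly as k grows. That may be one practical reason
to look at small k first. Different seeds can give different local
maxima, so several restarts per k would be sensible.
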
At 17:41 5/12/97 +1100, David L Dowe wrote:
>> From owner-datamine-l@nessie.crosslink.net Fri Dec  5 08:58:55 1997
>> From: "Rao, Bharat" <bharat@scr.siemens.com>
>> To: datamine-l@nautilus-sys.com
>> Subject: DM: Clustering algorithm for high-dimensional Boolean space
>> Date: Thu, 4 Dec 1997 15:25:06 -0500
>> 
>> Hello,
>
>   Bharat et al, Hi.
>
>
>> 
>> I'm looking to cluster a dataset where the
>> a) data has high-dimensionality (50<n<1000)
>> b) relatively few samples ( M=O(n), and occasionally M < n)
>> c) and is completely Boolean (all variables are 0/1).
>> 
>>      [Obviously clustering will be hard, and quite possibly
>>       I will end up with a bunch of singleton clusters.  But
>>       I'd like to try running some existing algorithms on this
>>       data, at least for benchmarking purposes, before trying
>>       to develop new algorithms.]
>> 
>> Can anyone point me to some existing implemented algorithms that
>> cluster Boolean data.  (I have already requested a copy of COBWEB
>> from Doug Fisher, and realize that AutoClass is not suited for
>> Boolean data.)
>
>Snob (using MML, by Chris Wallace and me)
>http://www.cs.monash.edu.au/~dld/Snob.html
>deals with boolean data and should have no problems with the above.
>
>You might also want to look at Hunt and Jorgensen's MULTIMIX.
>
>Snob WWW page link and MULTIMIX link are below.
>
>
>> 
>> Also, any pointers to work on constructive induction that may be
>> relevant for constructing new features to help clustering would be
>> appreciated.
>> 
>> Thanks for any help,
>> 
>> Bharat
>
>(Dr.) David Dowe, Dept of Computer Science, Monash University, Clayton,
>Victoria 3168, Australia  dld@cs.monash.edu.au     Fax:+61 3 9905-5146
>http://www.cs.monash.edu.au/~dld/
>http://www.cs.monash.edu.au/~dld/Snob.html
>http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html 
>
>
Dr Murray Jorgensen    http://www.cs.waikato.ac.nz/stats/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
*Applications Editor, Australian and New Zealand Journal of Statistics*
maj@waikato.ac.nz Phone +64-7 838 4773 home phone 856 6705 Fax 838 4666


