DM: Multimix clustering program (free!)
From: Murray Jorgensen
Date: Thu, 13 Nov 1997 21:13:21 -0500 (EST)

Multimix was written by Lyn Hunt to fit mixture models to multivariate data sets, as an alternative to other approaches to cluster analysis (unsupervised learning). Lyn developed this program as part of her doctoral research under my supervision; she now has a faculty position here at Waikato. Lyn and I are pleased to announce that Multimix can now be downloaded from ftp://ftp.math.waikato.ac.nz/pub/maj/ .

Multimix generalises two common types of finite mixture model: mixtures of multivariate normals and latent class models. For mixtures of multivariate normals it is possible to specify a block diagonal covariance structure, which reduces the number of parameters that need to be estimated. Details are given in the paper talk.ps (or talk.dvi), which may also be downloaded from the above ftp site.

We have decided to make the Fortran 77 source code available so that you can customise Multimix to your own data and platform. The sizes of the multidimensional arrays used in Multimix are set by parameter statements, which you may need to change from the supplied values to suit your needs.

For those who are not accustomed to a statistical modelling approach, I should make clear that in specifying the model it is important to keep the number of estimated parameters as low as is consistent with a good fit to the data. Unlike some other approaches, Multimix does not attempt to determine an optimal number of clusters. We recommend that you first explore solutions with 2, 3, 4, ... clusters before attempting to go any further. (I say this because, when I asked for information about array parameter settings, it emerged from several replies that a number of respondents were seeking what we would regard as quite a large number of clusters.)
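To see why a block diagonal structure helps, count free parameters: a d-variate normal block carries d means and d(d+1)/2 distinct covariance entries, and a K-component mixture adds K - 1 free mixing proportions. The following Python sketch is an illustration only, not part of Multimix; the function names and the partition of 6 variables into blocks of sizes 3, 1, 1, 1 are hypothetical, and it assumes each component has its own unrestricted means and within-block covariances.

```python
from typing import Sequence


def normal_block_params(d: int) -> int:
    """Free parameters of one d-variate normal block: d means plus the
    d*(d+1)/2 distinct entries of a full symmetric covariance matrix."""
    return d + d * (d + 1) // 2


def mixture_params(block_dims: Sequence[int], k: int) -> int:
    """Free parameters of a k-component normal mixture whose covariance is
    block diagonal with the given block sizes (hypothetical helper).
    Each component has its own means and covariances; the k mixing
    proportions sum to one, so only k - 1 of them are free."""
    per_component = sum(normal_block_params(d) for d in block_dims)
    return k * per_component + (k - 1)


if __name__ == "__main__":
    full = [6]              # one 6-variate block: unrestricted covariance
    blocked = [3, 1, 1, 1]  # hypothetical partition of the same 6 variables
    for k in (2, 3, 4):
        print(k, mixture_params(full, k), mixture_params(blocked, k))
```

Run as a script, it prints K followed by the full-covariance and block-diagonal counts: 55 and 31 for K = 2, 83 and 47 for K = 3, 111 and 63 for K = 4. The gap widens with every extra cluster, which is one reason to work up from small K.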
Before attempting to fit your own data we recommend that you try to reproduce the output for the Cancer example data and model supplied. The file README.TXT describes the files available in this distribution and I will paste it into this email below as well. Read the paper TALK.DVI/TALK.PS before getting started, then read NOTES.DVI or NOTES.PS for some program documentation.

Happy mixture modelling!

Multimix.for    contains the program code for fitting a finite mixture of K groups to the data.

[Missing.for]   contains a version of Multimix.for which can handle missing values in the variables. [Currently unavailable while minor changes are being made.]

Talk.dvi        Dvi and Postscript versions of a paper presented on 23 August 1996
Talk.ps         to the conference ISIS96, Information, Statistics and Induction in Science, held in Melbourne, Australia. [Published in the proceedings of the Conference, edited by D. L. Dowe, K. B. Korb and J. J. Oliver, World Scientific: Singapore.]

Notes.ps        is a Postscript file giving information about the input required to run Multimix. Please read this file.

Read3.for       contains program code for setting up a parameter input file for program Multimix. This is useful when setting up the first few runs with a data set. Later it is easier to modify existing files with a text editor.

Flexi           This subdirectory contains a Bayesian smoothing program written by Martin Upsdell. It is not connected with Multimix in any way. Read about Flexi in Flexi/Info.txt. Martin's email address is upsdellm@agresearch.cri.nz.

EXAMPLE OF DATA FILE, INPUT FILE, AND OUTPUT FILES

Cancer11.dat    contains the cancer data file.

Cancerdesc.txt  A description of the data in Cancer11.dat.

2band.dat       contains a parameter input file for the cancer data. A two-component mixture model is to be fitted. The variables are partitioned into blocks. Each block or 'cell' is assumed independent of the others within each component.
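Because the blocks are independent within a component, the density of an observation in component k is the product of its block densities, so the log densities simply add. The posterior probability of group k is then pi_k f_k(x) / sum_j pi_j f_j(x). The following Python sketch illustrates both steps under stated assumptions: it is not Multimix's Fortran code, the class and function names are hypothetical, discrete categories are assumed to be coded 0, 1, ..., c-1, and assigning each observation to the group with the largest posterior probability is our assumption about how a group assignment such as the one in Groups.out is made.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class NormalBlock:
    """Hypothetical d-variate normal cell: mean vector and full covariance."""
    mean: List[float]
    cov: List[List[float]]


@dataclass
class DiscreteBlock:
    """Hypothetical categorical cell; categories assumed coded 0..c-1."""
    probs: List[float]


Block = Union[NormalBlock, DiscreteBlock]


def normal_logpdf(x: Sequence[float], mean: Sequence[float],
                  cov: Sequence[Sequence[float]]) -> float:
    """Multivariate normal log density via a Cholesky factor L (cov = L L^T)."""
    d = len(mean)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # Forward substitution: z = L^{-1} (x - mean), so z.z is the Mahalanobis term
    z: List[float] = []
    for i in range(d):
        r = (x[i] - mean[i]) - sum(low[i][m] * z[m] for m in range(i))
        z.append(r / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def component_logpdf(values: Sequence[Sequence[float]],
                     blocks: Sequence[Block]) -> float:
    """log f_k(x): blocks are independent within a component, so their log
    densities add. values[b] holds the contiguous variables of block b."""
    total = 0.0
    for x, block in zip(values, blocks):
        if isinstance(block, NormalBlock):
            total += normal_logpdf(x, block.mean, block.cov)
        else:
            total += math.log(block.probs[int(x[0])])
    return total


def posterior(values: Sequence[Sequence[float]],
              components: Sequence[Sequence[Block]],
              weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Posterior probabilities pi_k f_k(x) / sum_j pi_j f_j(x) and the
    (0-based) group with the largest posterior probability."""
    terms = [math.log(w) + component_logpdf(values, blocks)
             for w, blocks in zip(weights, components)]
    top = max(terms)  # log-sum-exp guards against underflow
    log_norm = top + math.log(sum(math.exp(t - top) for t in terms))
    post = [math.exp(t - log_norm) for t in terms]
    return max(range(len(post)), key=post.__getitem__), post
```

The log-sum-exp step matters in practice: with many blocks the individual component densities can be too small to represent in floating point, while their ratios remain well defined.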
In the model fitted by 2band.dat the distributions of the variables in each block are:

   1  Univariate Normal
   2  3-category Discrete
   3  2-category Discrete
   4  Trivariate Normal
   5  7-category Discrete
   6  Univariate Normal
   7  Univariate Normal
   8  Univariate Normal
   9  Univariate Normal
  10  2-category Discrete

Some variables have been re-ordered so that the variables in each block are contiguous. An initial grouping of the observations into two clusters is specified; alternatively, initial parameter values could have been given.

General.out     is the output file generated when using the parameter file 2band.dat.

Groups.out      contains the group assignment and the posterior probabilities of assignment to the two groups when using the parameter file 2band.dat.

Queries to Murray Jorgensen <maj@waikato.ac.nz>.

Murray Jorgensen, Department of Statistics, U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
Doubt everything or believe everything: these are two equally convenient strategies. With either we dispense with the need to think.
- Henri Poincaré