Subject: DM: MU CS Seminar, Minimum Message Length (Dr. David Dowe)
From: David L Dowe
Date: Tue, 14 Oct 1997 01:28:31 -0400 (EDT)

> From dong@cs.mu.oz.au Tue Oct 14 10:22:19 1997
> Date: Tue, 14 Oct 97 10:05:39 +1200
> Subject: MU CS Seminar, Minimum Message Length (Dr. David Dowe)

The University of Melbourne, Computer Science Seminar

2:15pm, Tuesday, 21 October 1997
Theatre 2, Basement, SEECS Building
221 Bouverie Street, Carlton

Minimum Message Length theory and some machine learning applications

Dr. David Dowe
Dept of Computer Science
Monash University
dld@cs.monash.edu.au

Abstract:

Many statistical, machine learning and ``data mining'' techniques for modelling data are based on Maximum Likelihood (ML), which means that they typically suffer bias in small samples and, for more difficult problems, they can actually converge to the wrong answer (a problem known in statistics as inconsistency). Wallace's Minimum Message Length (MML) principle (1968) gives a universal objective function which is statistically consistent (i.e. always converges to the right answer) and invariant, and suffers at worst very little (or no) small sample bias.

We give a brief derivation of the MML principle using Bayes's theorem. For data D, the MML theory is the hypothesis H which does all of the following equivalent things:

(i)   maximises the posterior probability of H, Pr(H|D)
(ii)  maximises the product of the prior and the likelihood, Pr(H) . Pr(D|H)
(iii) minimises -log_2(Pr(H)) - log_2(Pr(D|H))

By elementary information theory, (iii) above is the length of a two-part message transmitting first the hypothesis and then the data in light of the hypothesis. The objective of MML is to find the theory, H, which does the three equivalent things above.

We outline in some detail how MML is used for statistical parameter estimation and how it uses the Fisher information to determine the size of the uncertainty region for its parameter estimates. We then show the fairly pronounced empirical success of MML versus classical rivals in estimating the directions and concentration of (circular) angular data.

In being universally applicable, MML offers a paradigm for comparing models of differing degrees of complexity, as arises in clustering and mixture modelling. We show how to derive a message length for mixture models, both for Normal distributions and for circular data. This is used in the Snob program: http://www.cs.monash.edu.au/~dld/Snob.html .

We also discuss results from a probabilistic football prediction competition developed by the presenter and others at Monash in 1995, which has recently been receiving media coverage. Issues of optimally combining predictors are not only relevant to general prediction and forecasting, but are also relevant to (e.g.) DNA compression and the optimal combination of stocks in a market portfolio. The principles behind MML and Kolmogorov-Solomonoff complexity (Solomonoff, 1964; Kolmogorov, 1965; Chaitin, 1966) also tell us that markets are in general inefficient (although the act of finding arbitrage opportunities might be computationally very intensive).

Depending upon time constraints and subject to audience preference, options at this point will include some or all of: an application of Snob to clustering of protein dihedral angles, MML inference of decision trees and (with disjunctions) decision graphs, MML inference of Probabilistic Finite State Automata (PFSAs), etc.
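As a small, self-contained illustration of the two-part message length in (iii) above (not part of the talk itself: the coin-tossing data, the priors and the two fixed candidate hypotheses below are invented purely for illustration, and a full MML treatment would instead state an estimated parameter to a finite precision chosen via the Fisher information), one might compute and compare message lengths in Python as follows:

    # Two-part message length for a candidate hypothesis H given data D:
    #   length(H, D) = -log2(Pr(H)) - log2(Pr(D | H))
    # The hypothesis giving the shorter total message is preferred,
    # which by (i)-(iii) is the same as maximising Pr(H) . Pr(D|H).
    from math import log2, comb

    def two_part_length(prior_h, likelihood):
        """Bits needed to state H, then the data D in light of H."""
        return -log2(prior_h) - log2(likelihood)

    # Invented data: 60 heads observed in 100 tosses of a coin.
    n, k = 100, 60

    # Two fixed candidate values for the coin's bias p, each given
    # prior probability 1/2 (purely illustrative numbers).
    candidates = {"p = 0.5": 0.5, "p = 0.6": 0.6}

    for name, p in candidates.items():
        likelihood = comb(n, k) * p**k * (1 - p)**(n - k)
        print(f"{name}: {two_part_length(0.5, likelihood):.2f} bits")

With these invented numbers the p = 0.6 hypothesis yields the shorter message, since with equal priors the comparison reduces to comparing likelihoods, as in (ii).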
Web pages at <URL:http://www.cs.mu.oz.au/info/seminars/seminars.html>

Enquiries to Guozhu Dong, Seminar Coordinator
dong@cs.mu.oz.au
Phone: +61-3-9344-9101, -9166
<URL:http://www.cs.mu.oz.au/~dong>
Fax: +61-3-9348-1184