Re: DM: FYI: preliminary results of missing values comparison


From: David R. Lovell
Date: Thu, 5 Nov 1998 19:19:03 -0500 (EST)
Organization: CSIRO Division of Mathematics and Statistics

Tjen-Sien Lim wrote:
> 
> The following is the preliminary ranking of methods based on 16
> datasets. Details for each dataset is available at
> http://www.stat.wisc.edu/~limt/mv.html .

Hi,
I'm writing because I think this "league table" is potentially
very misleading as presented in your email.

First, it is not clear how you have arrived at the mean
misclassification rates. Did you average the misclassification
rates of each classifier across experiments? If so, then that
ignores the increased variance of performance estimates in
experiments with fewer data points. Thus, performance
on smaller data sets is given equal weight to performance on
larger data sets.
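
To make the point concrete, here is a rough Python sketch (the error
counts and test-set sizes are invented purely for illustration):

    # Unweighted vs. size-weighted averaging of misclassification rates.
    # The figures below are invented for illustration.
    errors = [12, 3]       # misclassified cases on each dataset
    n_test = [1000, 20]    # test-set sizes: one large, one small

    rates = [e / n for e, n in zip(errors, n_test)]

    unweighted = sum(rates) / len(rates)    # both datasets count equally
    pooled     = sum(errors) / sum(n_test)  # the noisy 20-case estimate counts less

    print(unweighted)   # 0.081 -- dominated by the small, noisy dataset
    print(pooled)       # about 0.015

A simple average treats the 20-case figure as if it were as reliable as
the 1000-case one.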

Second, you give no indication of the variance of your estimates.
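
For instance, treating each test case as an independent Bernoulli trial,
the standard error of an estimated rate is roughly sqrt(p(1-p)/n); a
quick sketch (the figures are made up):

    # Rough standard error of an estimated misclassification rate,
    # treating each test case as an independent Bernoulli trial.
    from math import sqrt

    def std_error(p_hat, n):
        return sqrt(p_hat * (1.0 - p_hat) / n)

    print(std_error(0.15, 20))    # about 0.08 -- a 20-case estimate is very noisy
    print(std_error(0.15, 1000))  # about 0.011

Without such figures, readers cannot tell whether adjacent entries in
the table are distinguishable at all.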

Third, there is considerable uncertainty associated with league tables.
Minor perturbations in the data or the algorithms can completely
reshuffle the rankings. See Goldstein and Spiegelhalter (1996) for more
on this.
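
The instability is easy to demonstrate: bootstrap-resample the test
cases and watch the ordering change. A small sketch (the per-case losses
are simulated, not real results):

    # How easily a league table reshuffles under resampling of the test set.
    # The 0/1 losses for three hypothetical classifiers are simulated here.
    import random

    random.seed(0)
    n = 100
    losses = {name: [1 if random.random() < p else 0 for _ in range(n)]
              for name, p in [("A", 0.20), ("B", 0.22), ("C", 0.24)]}

    def ranking(idx):
        rates = {m: sum(l[i] for i in idx) / len(idx) for m, l in losses.items()}
        return tuple(sorted(rates, key=rates.get))

    tables = set()
    for _ in range(200):                        # 200 bootstrap resamples
        idx = [random.randrange(n) for _ in range(n)]
        tables.add(ranking(idx))

    print(tables)   # typically several distinct orderings, not just ('A', 'B', 'C')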

Fourth, averaging the predictions of multiple models is well
known to improve classification performance. Is it meaningful
to compare single classifiers with boosted ensembles? Surely
it would be fairer to compare individual classifiers and
ensemble methods separately?
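
(As an aside, the arithmetic behind that claim is simple; a sketch
assuming three independent classifiers, each with an illustrative 30%
error rate:

    # Majority vote errs only when at least two of the three members err.
    # The 0.3 error rate and the independence assumption are illustrative only.
    p = 0.3
    vote_error = 3 * p**2 * (1 - p) + p**3
    print(vote_error)   # 0.216, better than any single member's 0.3

which is why ensembles carry a built-in advantage in such a comparison.)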

Fifth, it is not clear whether all methods implement the
same class of decision functions. I know that CART® implements
axis-aligned splits, for example, but am not sure about many
of the other methods. The point is this: certain types of
decision functions suit some data better than others.
If all methods listed implement the same kind of decision function,
the ranking tells us how effectively each algorithm goes about
fitting that function. If not, it becomes unclear whether you are
ranking the fitting algorithm OR ranking how well different decision
functions fit the data.
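
A toy illustration of why the class of decision function matters (the
data and splits are invented for the purpose):

    # The same data look easy or hard depending on the decision function.
    # Points lie either side of the diagonal y = x, so one oblique split is
    # perfect, while a single axis-aligned (CART-style) threshold is not.
    import random

    random.seed(1)
    data = [(random.random(), random.random()) for _ in range(1000)]
    labels = [1 if x > y else 0 for x, y in data]

    def accuracy(pred):
        return sum(p == t for p, t in zip(pred, labels)) / len(labels)

    oblique = [1 if x > y else 0 for x, y in data]
    axis    = [1 if x > 0.5 else 0 for x, y in data]   # about the best single threshold

    print(accuracy(oblique))  # 1.0
    print(accuracy(axis))     # around 0.75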

Sixth, it is unclear what you are trying to establish with
this table. Maybe I've got the wrong impression, in which case,
ignore this email. I think the ranking aims to suggest which
classifier might be best to use, in general. No doubt, if
this were the intent, certain specious results would make certain
marketing departments jolly pleased with themselves.

I think comparisons of classifier performances can be very
useful. I also think it's an extremely difficult exercise and
recommend Cohen's (1995) text as a starting point to address this issue.

Ciao,
David

Goldstein, H. and Spiegelhalter, D. J. (1996). Statistical aspects of
institutional performance: league tables and their limitations
(with discussion). Journal of the Royal Statistical Society, Series A,
159, 385-444. 

Cohen, P. R. (1995). Empirical Methods for Artificial Intelligence.
MIT Press.

-- 
David R. Lovell
Analysis of Large and Complex Datasets
CSIRO, Locked Bag 17, North Ryde, NSW 1670, Australia.
Phone: +61 2 9325 3217. Fax: +61 2 9325 3200.
Email: David.Lovell@cmis.csiro.au


