Re: DM: FYI: preliminary results of missing values comparison
From: David R. Lovell
Date: Thu, 5 Nov 1998 19:19:03 -0500 (EST)
Tjen-Sien Lim wrote:
>
> The following is the preliminary ranking of methods based on 16
> datasets. Details for each dataset is available at
> http://www.stat.wisc.edu/~limt/mv.html .

Hi,

I'm writing because I think this "league table" is potentially very
misleading as presented in your email.

First, it is not clear how you arrived at the mean misclassification
rates. Did you average the misclassification rates of each classifier
across experiments? If so, that ignores the increased variance of
performance estimates in experiments with fewer data points: performance
on smaller data sets is given the same weight as performance on larger
data sets.

Second, you give no indication of the variance of your estimates. (A
small numerical sketch of these first two points follows at the end of
this message.)

Third, there is considerable uncertainty associated with league tables.
Minor perturbations in the data or the algorithms can completely
reshuffle the rankings; a second sketch at the end of the message
illustrates this. See Goldstein and Spiegelhalter for more.

Fourth, averaging the predictions of multiple models is well known to
improve classification performance. Is it meaningful to compare single
classifiers with boosted ensembles? Surely it would be fairer to compare
individual classifiers and ensemble methods separately?

Fifth, it is not clear whether all methods implement the same class of
decision functions. I know that CART® implements axis-aligned splits,
for example, but am not sure about many of the other methods. The point
is this: certain types of decision functions suit some data better than
others. If all the methods listed implement the same kind of decision
function, the ranking tells us how effectively each algorithm goes about
fitting that function. If not, it becomes unclear whether you are
ranking the fitting algorithms or ranking how well different decision
functions fit the data.

Sixth, it is unclear what you are trying to establish with this table.
Maybe I've got the wrong impression, in which case, ignore this email. I
think the ranking aims to suggest which classifier might be best to use
in general. No doubt, if that was the intent, certain specious results
will make certain marketing departments jolly pleased with themselves.

I think comparisons of classifier performance can be very useful. I also
think it's an extremely difficult exercise, and I recommend Cohen's text
as a starting point for addressing the issue.

Ciao,

David

Goldstein, H. and Spiegelhalter, D. J. (1996). Statistical aspects of
institutional performance: league tables and their limitations (with
discussion). Journal of the Royal Statistical Society, Series A, 159,
385-444.

P. R. Cohen (1995). Empirical Methods for Artificial Intelligence.
MIT Press.

--
David R. Lovell                  Analysis of Large and Complex Datasets
CSIRO, Locked Bag 17, North Ryde, NSW 1670, Australia.
Phone: +61 2 9325 3217.  Fax: +61 2 9325 3200.
Email: David.Lovell@cmis.csiro.au
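P.S. Here is the numerical sketch of the first two points promised
above. It is a rough Python illustration with invented dataset sizes and
error rates (none of these numbers come from the actual comparison); it
shows how the standard error of an estimated misclassification rate
depends on the number of test cases, and how an unweighted mean across
datasets ignores that.

# Sketch of points one and two: the uncertainty of an estimated
# misclassification rate depends strongly on how many test cases it is
# based on, so an unweighted average across datasets of very different
# sizes mixes precise and imprecise estimates on equal terms.
# All dataset sizes and rates below are invented for illustration.

import math

datasets = [
    # (name, number of test cases, observed misclassification rate)
    ("small",    50, 0.20),
    ("medium",  500, 0.20),
    ("large",  5000, 0.20),
]

for name, n, p in datasets:
    # Treating each test case as a Bernoulli trial, the standard error
    # of the observed error rate is sqrt(p * (1 - p) / n).
    se = math.sqrt(p * (1.0 - p) / n)
    print(f"{name:>6}: n={n:5d}  rate={p:.3f}  standard error={se:.3f}")

# An unweighted mean gives each dataset the same influence, however
# precisely its error rate is known.
unweighted = sum(p for _, _, p in datasets) / len(datasets)

# One alternative (not the only one): pool the misclassified counts,
# which weights each dataset by its number of test cases.
pooled = sum(n * p for _, n, p in datasets) / sum(n for _, n, _ in datasets)

print(f"unweighted mean rate:        {unweighted:.3f}")
print(f"pooled (case-weighted) rate: {pooled:.3f}")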
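P.P.S. And the sketch of the third point: three hypothetical classifiers
whose true error rates differ only slightly, with test sets resampled as
binomial draws. The resulting league table reshuffles from replicate to
replicate; again, every number here is invented.

# Sketch of the third point: classifiers whose true error rates differ
# only slightly can swap places in a league table from one resampling of
# the test data to the next.  All numbers are invented for illustration.

import random

random.seed(0)

true_rates = {"A": 0.200, "B": 0.205, "C": 0.210}   # nearly identical
n_test = 200            # test cases per replicate
n_replicates = 1000

times_ranked_first = {name: 0 for name in true_rates}

for _ in range(n_replicates):
    observed = {}
    for name, p in true_rates.items():
        # Simulate the misclassified count as a binomial draw.
        errors = sum(1 for _ in range(n_test) if random.random() < p)
        observed[name] = errors / n_test
    winner = min(observed, key=observed.get)
    times_ranked_first[winner] += 1

for name, count in sorted(times_ranked_first.items()):
    share = count / n_replicates
    print(f"classifier {name} ranked first in {share:.0%} of replicates")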