Subject: Re: DM: CHAID vs. CART®
From:    Ronny Kohavi
Date:    Wed, 24 Sep 1997 20:41:21 -0400 (EDT)

Torpy> Hi all, I joined this list about a week ago, and I have to say
Torpy> that I'm pretty impressed with the level of discussion here.
Torpy> I'm a technical marketing specialist in the marketing
Torpy> department at SPSS Inc.  Lately, we've been talking about the
Torpy> differences between CHAID algorithms and CART® algorithms, and
Torpy> I thought I'd see what you people think.

Torpy> 1) What are the pros and cons of CHAID and CART®?
Torpy> 2) What is your preference and why?
Torpy> 3) Would you always use one over the other, or would you use
Torpy>    one in some situations and use the other in different
Torpy>    situations?

Since the theory says that it's impossible for one algorithm to
uniformly beat any other on generalization accuracy in classification
tasks (where "uniformly" means for all possible target concepts), the
question is *when* (under what conditions) one algorithm is better
than another, not *whether*.

Decision-tree induction is too hard to analyze analytically (i.e., to
arrive at conditions under which it outperforms other algorithms), so
there are no hard rules for when to apply one algorithm versus
another.  While some techniques have proven useful in practice (e.g.,
bagging, boosting), the differences between CHAID and CART® are
relatively small compared to the differences between algorithms with
completely different hypothesis spaces (e.g., nearest neighbors,
Bayesian classifiers).

My common answer to the above question is as follows: since in
practice the customer usually has specific datasets, the performance
on THESE datasets is what matters.  Hence, just run several
algorithms and pick the one with the better test-set accuracy (a
sketch of this procedure follows the signature below).  This isn't
perfect (and obviously the no-free-lunch theorem implies that doing
this over too many algorithms can't always be successful), but it
seems to work well in practice.

Projects such as Statlog, and the study we did,

   Kohavi, R., Sommerfield, D., and Dougherty, J., "Data Mining Using
   MLC++, a Machine Learning Library in C++."  Tools with AI '96.

which is available off

   http://robotics.stanford.edu/users/ronnyk/ronnyk-bib.html

show that different decision-tree algorithms (C4.5 and CART® in the
above study) do about the same on average.  More important
differences come up between different algorithm types.  As an
example, in the recent KDD-CUP, all the decision-tree-based products
did poorly, while Naive-Bayes, a relatively simple algorithm based on
conditional-independence assumptions, did well (two out of the top
three used it; a second sketch after the signature shows the
assumption at work):

   http://www.epsilon.com/KDDCUP/index.htm

Much depends on the dataset at hand!

Torpy> Please note that I'm not asking about specific products, only
Torpy> the algorithms.  Although, if you have any strong feelings
Torpy> about certain products, I'd be interested in hearing those as
Torpy> well.

A point that many miss is that it's not just the algorithm itself but
the presentation of the results and the environment in which the
algorithm is integrated.  Visualizing the resulting model, for
example, is very important.  That's the great advantage of tools such
as MineSet

   http://www.sgi.com/Products/software/MineSet

and others (Angoss's Knowledge Seeker, IBM's Intelligent Miner, etc.).

--
Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)
Engineering Manager, Analytical Data Mining.
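
To make the "run several algorithms and pick the winner on held-out
data" advice above concrete, here is a minimal sketch in Python.  It
uses today's scikit-learn library for brevity; the breast-cancer
dataset, the three candidate learners, and ten-fold cross-validation
are illustrative assumptions, not part of the original advice or the
tools available in 1997.

# Minimal sketch: compare several learners on THE dataset at hand and
# keep whichever scores best on data it was not trained on.
# (Library, dataset, and learners are illustrative assumptions.)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "decision tree":     DecisionTreeClassifier(random_state=0),
    "naive Bayes":       GaussianNB(),
    "nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Cross-validation estimates test-set accuracy while reusing the one
# dataset the customer actually has.
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in candidates.items()}

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {acc:.3f}")
# Pick the best scorer on THESE data; per no-free-lunch, no single
# algorithm wins uniformly across all possible target concepts.

The ranking this prints is specific to the dataset, which is exactly
the point: on a different dataset the order can flip.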
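A second sketch, showing the conditional-independence assumption that
makes Naive-Bayes so simple: each class c is scored as
P(c) * prod_i P(x_i | c), with every factor estimated by counting.
The helper names (train_nb, predict_nb), the Laplace smoothing, and
the toy weather data are all hypothetical illustrations, not any
particular product's implementation.

# From-scratch Naive-Bayes sketch over categorical attributes.
# Each attribute is treated as independent of the others given the
# class, so training reduces to counting.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(c) and P(x_i | c) by counting (hypothetical helper)."""
    class_count = Counter(labels)
    attr_count = defaultdict(Counter)   # (attr index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            attr_count[(i, c)][v] += 1
    return class_count, attr_count

def predict_nb(row, class_count, attr_count):
    """Pick the class maximizing P(c) * prod_i P(x_i | c)."""
    total = sum(class_count.values())
    best, best_score = None, -1.0
    for c, n_c in class_count.items():
        score = n_c / total             # prior P(c)
        for i, v in enumerate(row):
            counts = attr_count[(i, c)]
            # Laplace-smoothed estimate of P(x_i | c)
            score *= (counts[v] + 1) / (n_c + len(counts) + 1)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage: attributes are (outlook, windy); the label is the class.
rows   = [("sunny", "no"), ("sunny", "yes"), ("rain", "yes"), ("rain", "no")]
labels = ["play", "play", "stay", "play"]
cc, ac = train_nb(rows, labels)
print(predict_nb(("rain", "yes"), cc, ac))   # -> "stay"

Because each attribute contributes an independent factor, very few
parameters must be estimated per class, which is one reason such a
simple method can beat decision trees on some datasets.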