Re: DM: CHAID vs. CART®

From: Ismail Parsa
Date: Thu, 25 Sep 1997 23:32:39 -0400 (EDT)

> From: "Torpy, Edward" <etorpy@spss.com>
> Subject: DM: CHAID vs. CART®
>
> [...] talking about the differences between CHAID algorithms
> and CART® algorithms, and I thought I'd see what you people think.
>
> 1) What are the pros and cons of CHAID and CART®?
>
> 2) What is your preference and why?
>
> 3) Would you always use one over the other, or would you use one in
> some situations and use the other in different situations?
>
> Please note that I'm not asking about specific products, only the
> algorithms. Although, if you have any strong feelings about certain
> products, I'd be interested in hearing those as well.

I agree with Ronny's comment that the issue is not whether one algorithm is better than the other, but what makes each algorithm suitable for the problem at hand.

There are three families of decision tree algorithms:

1) The CART® family (CART®, IND CART®, S-Plus CART®, etc.)
2) The ML family (ID3, C4.5, C5, and other derivatives)
3) The AID family (THAID, CHAID, XAID, TREEDISC, etc.)

Without going into details, one or more of the following can be used to distinguish the three families:

o motivation behind the algorithm
o splitting criteria
o stopping criteria
o scale type of the dependent/criterion variable
o scale type of the independent/input variables

I am aware of two studies published in the Journal of Direct Marketing comparing CHAID and CART®:

1) "Direct Marketing Modeling with CART® and CHAID" by Dominique Haughton and Samer Oulabi: JDM, vol. 7, no. 3, summer 1993.

2) "CART®: A Recent Advance in Tree-Structured List Segmentation Methodology" by Rosana Thrasher: JDM, vol. 5, no. 1, winter 1991.
The former study concludes that there are no notable differences in the performance of the two algorithms, while the latter leans more favorably towards CART® because of the hard limit on the number of independent variables that could be input to the CHAID software available at the time. This is no longer an issue. My experience with the two algorithms also suggests that there are no notable differences in their performance.

I personally do not use these algorithms for predictive modeling tasks (classification and/or prediction). As with all recursive partitioning algorithms, they do not generalize as well as predictive modeling techniques that do not partition the sample during learning. (I'll admit that bootstrapping or k-fold cross-validation helps improve their results!) The tree methods are better suited to descriptive modeling tasks.

Regards,

*-----------------------------*
| Ismail Parsa                |
| Epsilon Data Management     |
| 50 Cambridge Street         |
| Burlington MA 01803 USA     |
|                             |
| E-MAIL: iparsa@epsilon.com  |
| V-MAIL: (617) 273-0250*6734 |
| FAX: (617) 272-8604         |
*-----------------------------*
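[Editor's note: for concreteness, here is a minimal, self-contained Python sketch, not taken from any of the products discussed, contrasting the splitting criteria that distinguish the CART® and AID families: CART® chooses the split that most reduces Gini impurity, while CHAID scores candidate splits with a Pearson chi-squared test of independence between the split and the class label (CHAID additionally applies Bonferroni-adjusted p-values and merges categories, which is omitted here). The toy data and function names are illustrative assumptions.]

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (CART-style criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(left, right):
    """Impurity reduction of a binary split relative to the parent node."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

def chi_squared(left, right):
    """Pearson chi-squared statistic for the branch-by-class contingency
    table (CHAID-style criterion, without the Bonferroni adjustment)."""
    classes = sorted(set(left) | set(right))
    total = Counter(left + right)
    n = len(left) + len(right)
    stat = 0.0
    for branch in (left, right):
        observed = Counter(branch)
        for cls in classes:
            expected = len(branch) * total[cls] / n
            stat += (observed[cls] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: responders (1) vs. non-responders (0)
# under one candidate binary split of a mailing list.
left = [1, 1, 1, 0]   # e.g. customers with tenure > 2 years
right = [0, 0, 1, 0]  # remaining customers

print(round(gini_gain(left, right), 3))   # -> 0.125
print(round(chi_squared(left, right), 3)) # -> 2.0
```

Either criterion ranks candidate splits; on the same data they usually, but not always, prefer the same split, which is consistent with the finding above that the two algorithms perform similarly in practice.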