Re: DM: CHAID vs. CART®

From: Ismail Parsa
Date: Thu, 25 Sep 1997 23:32:39 -0400 (EDT)

> From: "Torpy, Edward" <etorpy@spss.com>
> Subject: DM: CHAID vs. CART®
>
> [...] talking about the differences between CHAID algorithms
> and CART® algorithms, and I thought I'd see what you people think.
>
> 1) What are the pros and cons of CHAID and CART®?
>
> 2) What is your preference and why?
>
> 3) Would you always use one over the other, or would you use one in
> some situations and use the other in different situations?
>
> Please note that I'm not asking about specific products, only the
> algorithms. Although, if you have any strong feelings about certain
> products, I'd be interested in hearing those as well.

I agree with Ronny's comment that the issue is not whether one algorithm is better than the other, but what makes each algorithm suitable for the problem at hand.

There are three families of decision tree algorithms:

1) The CART® family (CART®, IND CART®, S-Plus CART®, etc.)
2) The ML family (ID3, C4.5, C5, and other derivatives)
3) The AID family (THAID, CHAID, XAID, TREEDISC, etc.)

Without going into details, one or more of the following can be used to distinguish the three families:

o motivation behind the algorithm
o splitting criteria
o stopping criteria
o scale type of the dependent/criterion variable
o scale type of the independent/input variables

I am aware of two studies published in the Journal of Direct Marketing comparing CHAID and CART®:

1) "Direct Marketing Modeling with CART® and CHAID" by Dominique Haughton and Samer Oulabi: JDM, vol. 7, no. 3, summer 1993.

2) "CART®: A Recent Advance in Tree-Structured List Segmentation Methodology" by Rosana Thrasher: JDM, vol. 5, no. 1, winter 1991.
The former study concludes that there are no notable differences in the performance of the two algorithms, while the latter leans more favorably towards CART® because of the hard limit on the number of independent variables that could be input to the CHAID software available at the time. This is no longer an issue. My experience with the two algorithms also suggests that there are no notable differences in their performance.

I personally do not use these algorithms for predictive modeling tasks (classification and/or prediction). As with all recursive partitioning algorithms, they do not generalize as well as predictive modeling techniques that do not partition the sample during learning. (I'll admit that bootstrapping or k-fold cross-validation helps improve their results!) The tree methods are better suited to descriptive modeling tasks.

Regards,

*-----------------------------*
| Ismail Parsa                |
| Epsilon Data Management     |
| 50 Cambridge Street         |
| Burlington MA 01803 USA     |
|                             |
| E-MAIL: iparsa@epsilon.com  |
| V-MAIL: (617) 273-0250*6734 |
| FAX: (617) 272-8604         |
*-----------------------------*
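[Editor's note: for concreteness, here is a minimal, self-contained Python sketch, not taken from any of the products discussed, contrasting the splitting criteria that distinguish the CART® and AID families: CART® chooses the split that most reduces Gini impurity, while CHAID scores candidate splits with a Pearson chi-squared test of independence between the split and the class label (CHAID additionally applies Bonferroni-adjusted p-values and merges categories, which is omitted here). The toy data and function names are illustrative assumptions.]

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (CART-style criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(left, right):
    """Impurity reduction of a binary split relative to the parent node."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

def chi_squared(left, right):
    """Pearson chi-squared statistic for the branch-by-class contingency
    table (CHAID-style criterion, without the Bonferroni adjustment)."""
    classes = sorted(set(left) | set(right))
    total = Counter(left + right)
    n = len(left) + len(right)
    stat = 0.0
    for branch in (left, right):
        observed = Counter(branch)
        for cls in classes:
            expected = len(branch) * total[cls] / n
            stat += (observed[cls] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: responders (1) vs. non-responders (0)
# under one candidate binary split of a mailing list.
left = [1, 1, 1, 0]   # e.g. customers with tenure > 2 years
right = [0, 0, 1, 0]  # remaining customers

print(round(gini_gain(left, right), 3))   # -> 0.125
print(round(chi_squared(left, right), 3)) # -> 2.0
```

Either criterion ranks candidate splits; on the same data they usually, but not always, prefer the same split, which is consistent with the finding above that the two algorithms perform similarly in practice.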