DM: effect of minimal sample size in a node (classification trees)

From: Tjen-Sien Lim
Date: Thu, 14 Oct 1999 11:28:11 -0400 (EDT)

I've found a strange result with Shelby Haberman's breast cancer
survival data (available from the UCI Machine Learning Repository).
The QUEST algorithm yields only the root node (one terminal node)
when I set the minimum node sample size to 1. However, when I change
the minimum node sample size to 5, I get a tree with 18 terminal
nodes. Both trees are obtained with N-fold cross-validation (or
jackknife), the 0-SE rule, proportional priors, and equal costs.

On the other hand, CART(r), C5.0/See5, and my experimental
classification tree algorithm all produce the same tree with 3
terminal nodes regardless of the minimum node sample size.

Has anyone observed the same problem with other tree variants or with
other data sets? If the minimum node sample size has this large an
impact, then interpreting the tree diagram becomes much more difficult.
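
For readers unfamiliar with the 0-SE rule mentioned above, here is a
minimal sketch of k-SE subtree selection as it is usually described for
CART-style pruning: among the pruned subtrees, take the smallest one
whose cross-validated error is within k standard errors of the minimum.
The names (Subtree, select_subtree) are hypothetical, and the error
values are invented for illustration; they are not results from the
Haberman data or from QUEST.

```python
from typing import List, NamedTuple


class Subtree(NamedTuple):
    terminal_nodes: int  # number of terminal nodes in the pruned subtree
    cv_error: float      # cross-validated misclassification estimate
    se: float            # standard error of that estimate


def select_subtree(candidates: List[Subtree], k: float = 0.0) -> Subtree:
    """Return the smallest subtree within k SEs of the minimum CV error."""
    best = min(candidates, key=lambda t: t.cv_error)
    threshold = best.cv_error + k * best.se
    eligible = [t for t in candidates if t.cv_error <= threshold]
    return min(eligible, key=lambda t: t.terminal_nodes)


# Invented values: the CV errors differ by far less than one SE.
candidates = [
    Subtree(terminal_nodes=1, cv_error=0.265, se=0.025),
    Subtree(terminal_nodes=3, cv_error=0.258, se=0.025),
    Subtree(terminal_nodes=18, cv_error=0.255, se=0.025),
]

zero_se = select_subtree(candidates, k=0.0)  # picks the minimum-error subtree
one_se = select_subtree(candidates, k=1.0)   # picks the smallest eligible one

print(zero_se.terminal_nodes, one_se.terminal_nodes)
```

With k = 0 the choice depends only on which subtree has the lowest
estimated error, so when the estimates are nearly tied, a small change
in them can move the selected tree between very different sizes.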

Thanks.

-- 
Tjen-Sien Lim                (608) 262-8181 (Voice)
Dept. of Statistics          (209) 882-7914 (Fax)
Univ. of Wisconsin-Madison   limt@stat.wisc.edu
1210 West Dayton Street      http://www.stat.wisc.edu/~limt
Madison, WI 53706
