Subject: DM: effect of minimal sample size in a node (classification trees)
From: Tjen-Sien Lim
Date: Thu, 14 Oct 1999 11:28:11 -0400 (EDT)

I've found a weird phenomenon with Shelby Haberman's breast cancer survival data (can be downloaded from the UCI Machine Learning Repository). The QUEST algorithm yields only the root node (number of terminal nodes = 1) when I set the minimum sample size at 1. However, when I change the minimum sample size to 5, I get a tree with 18 terminal nodes. Both trees are obtained with N-fold cross-validation (or jackknife), 0-SE rules, proportional priors, and equal costs.

On the other hand, CART(r), C5.0/See5, and my experimental classification tree all produce the same tree with 3 terminal nodes regardless of the minimum sample size.

Has anyone observed the same problem with other tree variants or with other data sets? If minimum sample size has a big impact, then interpretation of the tree diagram would be much more difficult.

Thanks.

--
Tjen-Sien Lim                 (608) 262-8181 (Voice)
Dept. of Statistics           (209) 882-7914 (Fax)
Univ. of Wisconsin-Madison    limt@stat.wisc.edu
1210 West Dayton Street       http://www.stat.wisc.edu/~limt
Madison, WI 53706
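
For readers who want to see how a minimum node size enters tree growing at all, here is a minimal sketch in plain Python. It is not QUEST, CART, or C5.0, and it does no cross-validation or pruning: it assumes a simple greedy Gini splitter that refuses any split leaving a child with fewer than min_size cases. The function names (gini, best_split, count_leaves) and the toy data are made up for illustration and are not the Haberman data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, min_size):
    """Return (feature index, threshold) of the best admissible split, or None.

    rows is a list of (features, label) pairs. A split is admissible only if
    both children keep at least min_size cases (min_size must be >= 1) and
    the weighted Gini impurity strictly decreases.
    """
    n = len(rows)
    best, best_score = None, gini([y for _, y in rows])
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue  # the minimum node size rule rejects this split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def count_leaves(rows, min_size):
    """Grow the tree recursively and return its number of terminal nodes."""
    split = best_split(rows, min_size)
    if split is None:
        return 1
    j, t = split
    left = [r for r in rows if r[0][j] <= t]
    right = [r for r in rows if r[0][j] > t]
    return count_leaves(left, min_size) + count_leaves(right, min_size)

# Hypothetical toy data: one feature, two classes, one "noisy" pair in the middle.
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
data = [((float(i),), y) for i, y in enumerate(labels)]

for m in (1, 5, 7):
    print("min_size =", m, "-> terminal nodes:", count_leaves(data, m))
```

In this unpruned sketch a larger minimum can only block splits, so the tree shrinks as min_size grows. Any non-monotone behaviour such as the one described above would therefore have to come from how the minimum interacts with the pruning and cross-validation steps, which the sketch deliberately leaves out.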