
DM: effect of minimal sample size in a node (classification trees)


From: Tjen-Sien Lim
Date: Thu, 14 Oct 1999 11:28:11 -0400 (EDT)
I've found a weird phenomenon with Shelby Haberman's breast cancer
survival data (can be downloaded from the UCI Machine Learning
Repository). The QUEST algorithm yields only the root node (number of
terminal nodes = 1) when I set the minimum sample size at 1. However,
when I change the minimum sample size to 5, I get a tree with 18
terminal nodes. Both trees were obtained with N-fold cross-validation
(i.e., the jackknife), the 0-SE rule, proportional priors, and equal costs.
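
For readers unfamiliar with the 0-SE rule, here is a minimal sketch of how a
k-SE rule picks one tree from a pruning sequence. It is not QUEST's code. The
function name `select_by_k_se` and the numbers in `sequence` are made up for
illustration and are not results from the Haberman data.

```python
def select_by_k_se(candidates, k=0.0):
    """Pick a tree from a pruning sequence with the k-SE rule.

    candidates: list of (n_terminal_nodes, cv_error, cv_standard_error).
    Returns the smallest tree whose cross-validated error is within
    k standard errors of the minimum cross-validated error.
    """
    best = min(candidates, key=lambda c: c[1])
    threshold = best[1] + k * best[2]
    eligible = [c for c in candidates if c[1] <= threshold]
    return min(eligible, key=lambda c: c[0])


# Hypothetical pruning sequence (illustrative numbers only).
sequence = [
    (1, 0.265, 0.025),
    (3, 0.250, 0.025),
    (6, 0.255, 0.025),
    (12, 0.245, 0.024),
]

print(select_by_k_se(sequence, k=0.0)[0])  # 0-SE rule
print(select_by_k_se(sequence, k=1.0)[0])  # 1-SE rule
```

With these made-up numbers the 0-SE rule keeps the 12-node tree and the 1-SE
rule falls back to the root, so when the cross-validated error curve is flat,
small changes in the estimates can move the selected size a long way.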

On the other hand, CART(r), C5.0/See5, and my experimental
classification tree all produce the same tree with 3 terminal nodes
regardless of the minimum sample size.

Has anyone observed the same problem with other tree variants or with
other data sets? If minimum sample size has a big impact, then
interpretation of the tree diagram would be much more difficult.
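
To make "minimum sample size" concrete, here is a small sketch of a greedy,
unpruned tree grower on one numeric predictor. It assumes the minimum applies
to each child node and uses the Gini index; QUEST and the other programs may
define and apply the minimum differently. All names and the toy data are
hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, min_size):
    """Return the threshold with the largest Gini decrease such that both
    children hold at least min_size cases, or None if no split helps."""
    rows = sorted(rows)
    labels = [y for _, y in rows]
    n = len(rows)
    parent = gini(labels)
    best_gain, best_thr = 0.0, None
    for i in range(min_size, n - min_size + 1):  # i = size of left child
        if rows[i - 1][0] == rows[i][0]:
            continue  # cannot split between equal predictor values
        gain = (parent
                - (i / n) * gini(labels[:i])
                - ((n - i) / n) * gini(labels[i:]))
        if gain > best_gain + 1e-12:
            best_gain = gain
            best_thr = (rows[i - 1][0] + rows[i][0]) / 2
    return best_thr


def count_leaves(rows, min_size):
    """Grow the tree recursively and count its terminal nodes."""
    thr = best_split(rows, min_size)
    if thr is None:
        return 1
    left = [r for r in rows if r[0] <= thr]
    right = [r for r in rows if r[0] > thr]
    return count_leaves(left, min_size) + count_leaves(right, min_size)


# Toy data: (predictor value, class label).
toy = list(zip(range(1, 11), "aaaababbbb"))

for m in (1, 5, 6):
    print(m, count_leaves(toy, m))
```

On this toy data the unpruned tree has 4, 2, and 1 terminal nodes for minimum
sizes 1, 5, and 6. That is the expected direction, a larger minimum giving a
smaller tree, and the opposite of what I see with QUEST after pruning.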

Thanks.

-- 
Tjen-Sien Lim                (608) 262-8181 (Voice)
Dept. of Statistics          (209) 882-7914 (Fax)
Univ. of Wisconsin-Madison   limt@stat.wisc.edu
1210 West Dayton Street      http://www.stat.wisc.edu/~limt
Madison, WI 53706




