
DM: RE: missing attribute values in classification trees


From: Tom Dinsmore
Date: Fri, 30 Oct 1998 10:14:26 -0500 (EST)
(1) The incidence of cases with missing data depends upon the source of the
data -- it can vary from 0 to 100%.

(2) The model-building problem presented by missing data depends not so much
upon the proportion of missing data as upon its cause.  Missing data caused
by random data entry errors, for example, may not cause a problem even at
20% or 30% of the cases.  On the other hand, missing data caused by a
fundamental bias or instability in the data collection process will lead to
a biased or unstable model.
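
To make point (2) concrete, here is a minimal Python sketch, not taken from
any real project. It masks 30% of a synthetic numeric attribute in two ways:
purely at random, and through a made-up biased mechanism in which only values
above the median can go missing. The function names, the Gaussian test data,
and the "high values go missing" mechanism are all assumptions chosen for
illustration.

```python
import random

def mask_at_random(values, rate, rng):
    """Set each value to None independently with probability `rate`."""
    return [None if rng.random() < rate else v for v in values]

def mask_high_values(values, rate, rng):
    """Hypothetical biased mechanism: only values above the median can go missing."""
    cutoff = sorted(values)[len(values) // 2]
    # Doubling the rate within the upper half keeps the overall
    # missing share close to `rate`.
    return [None if v > cutoff and rng.random() < 2 * rate else v
            for v in values]

def observed_mean(values):
    """Mean of the values that are still present."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)

rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(20000)]  # synthetic attribute

true_mean = observed_mean(data)
random_mean = observed_mean(mask_at_random(data, 0.30, rng))
biased_mean = observed_mean(mask_high_values(data, 0.30, rng))

print(f"complete data:        {true_mean:.2f}")
print(f"30% missing, random:  {random_mean:.2f}")
print(f"30% missing, biased:  {biased_mean:.2f}")
```

With random masking the mean of the remaining values should stay close to
the complete-data mean, while the biased mechanism pulls it downward at the
same overall missing rate. A model fitted to the biased sample would inherit
that distortion.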

=================
Thomas Dinsmore
Exchange Applications
One Lincoln Plaza
89 South Street
Boston MA 02111
617-737-2244 x556
tdinsmore@exapps.com


> -----Original Message-----
> From: Tjen-Sien Lim [SMTP:limt@stat.wisc.edu]
> Sent: Thursday, October 29, 1998 1:22 PM
> To:   datamine-l@nautilus-sys.com; decision_trees@egroups.com
> Subject:      DM: missing attribute values in classification trees
> 
> Hi, I'd like to get some advice from those of you who have analyzed
> datasets with missing attribute values in supervised learning.
> 
> What's the "typical" proportion of cases with missing values? How big
> does the proportion have to be before it presents problems?
> 
> We're conducting a project comparing classification tree classifiers
> on datasets with missing values. We'd like to simulate
> missing-at-random on datasets that contain no missing values to
> increase the number of datasets. Should we simulate 5%, 10%, 20%, or
> 30% missing at random? Any preferred way to induce those missing
> values?
> 
> Thanks in advance for any advice/pointers/suggestions.
> 
> -- 
> Tjen-Sien Lim                (608) 262-8181                        
> Ph.D. candidate              limt@stat.wisc.edu                    
> Dept. of Statistics          http://www.stat.wisc.edu/~limt
> Univ. of Wisconsin-Madison
> 1210 West Dayton Street       
> Madison, WI 53706
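
On the quoted question of how to induce missing values: one possible
approach, sketched below in Python, is to pick a fixed fraction of attribute
cells uniformly at random and blank them, leaving the class label alone.
This is only an illustration under stated assumptions. The helper name
`induce_missing`, its parameters, and the toy dataset are hypothetical, and
because cells are chosen without regard to any values, the mechanism is
strictly "missing completely at random".

```python
import random

def induce_missing(rows, rate, seed=0, label_index=-1):
    """Return a copy of `rows` with a fraction `rate` of attribute cells set to None.

    Cells are drawn uniformly at random without looking at their values;
    the class label column (`label_index`) is never masked.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    label_col = label_index % n_cols
    cells = [(r, c) for r in range(len(rows))
             for c in range(n_cols) if c != label_col]
    chosen = rng.sample(cells, round(rate * len(cells)))
    masked = [list(row) for row in rows]  # leave the input untouched
    for r, c in chosen:
        masked[r][c] = None
    return masked

# Toy dataset: three attributes plus a class label in the last column.
rows = [[i, i * 2.0, i % 3, "yes" if i % 2 else "no"] for i in range(200)]

cell_counts = []
for rate in (0.05, 0.10, 0.20, 0.30):
    masked = induce_missing(rows, rate, seed=1)
    n_cells = sum(v is None for row in masked for v in row[:-1])
    n_cases = sum(any(v is None for v in row[:-1]) for row in masked)
    cell_counts.append(n_cells)
    print(f"rate {rate:.2f}: {n_cells} cells missing, "
          f"{n_cases} of {len(rows)} cases affected")
```

Note that the rate here applies to cells, not cases. With several attributes
per case, the share of cases having at least one missing value will be
higher than the cell rate, so it is worth reporting both.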



