DM: RE: missing attribute values in classification trees

From: Tom Dinsmore
Date: Fri, 30 Oct 1998 10:14:26 -0500 (EST)

(1) The incidence of cases with missing data depends upon the source of the data; it can vary from 0 to 100%.

(2) The model-building problem presented by missing data depends less upon the proportion of missing data than upon its cause. Missing data caused by random data-entry errors, for example, may not cause a problem even at 20% or 30% of the cases. On the other hand, missing data caused by a fundamental bias or instability in the data collection process will lead to a biased or unstable model.

=================
Thomas Dinsmore
Exchange Applications
One Lincoln Plaza
89 South Street
Boston MA 02111
617-737-2244 x556
tdinsmore@exapps.com

> -----Original Message-----
> From: Tjen-Sien Lim [SMTP:limt@stat.wisc.edu]
> Sent: Thursday, October 29, 1998 1:22 PM
> To: datamine-l@nautilus-sys.com; decision_trees@egroups.com
> Subject: DM: missing attribute values in classification trees
>
> Hi, I'd like to get some advice from those of you who have analyzed
> datasets with missing attribute values in supervised learning.
>
> What's the "typical" proportion of cases with missing values? How big
> does the proportion have to be before it presents problems?
>
> We're conducting a project comparing classification tree classifiers
> on datasets with missing values. We'd like to simulate
> missing-at-random values in datasets that contain no missing values, to
> increase the number of datasets. Should we simulate 5%, 10%, 20%, or
> 30% missing at random? Any preferred way to induce those missing
> values?
>
> Thanks in advance for any advice/pointers/suggestions.
>
> --
> Tjen-Sien Lim                    (608) 262-8181
> Ph.D. candidate                  limt@stat.wisc.edu
> Dept. of Statistics              http://www.stat.wisc.edu/~limt
> Univ. of Wisconsin-Madison
> 1210 West Dayton Street
> Madison, WI 53706
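The sketch below addresses the question of how to induce missing values at a fixed rate, and it illustrates point (2) above about cause versus proportion. It is a minimal Python illustration and is not code from either poster. Several parts are assumptions:

- `induce_mcar`, `induce_value_dependent`, and `observed_mean` are hypothetical helpers.
- The data are assumed to be a list of equal-length numeric rows, with the class label stored separately so it is never blanked.
- The "largest values go missing" rule is an assumed stand-in for a biased collection process.

```python
import random
import statistics
from typing import List, Optional, Sequence

# A row of attribute values; None marks a missing value.
Row = List[Optional[float]]


def induce_mcar(rows: Sequence[Sequence[float]], rate: float, seed: int = 0) -> List[Row]:
    """Missing completely at random: blank exactly round(rate * n_cells)
    attribute cells, chosen uniformly without regard to their values.

    Assumes all rows have the same length and that the class label is kept
    outside `rows`, so it is never masked."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    n_cells = n_rows * n_cols
    # Pick the flat indices (row * n_cols + col) of the cells to blank.
    chosen = set(rng.sample(range(n_cells), round(rate * n_cells)))
    return [
        [None if r * n_cols + c in chosen else rows[r][c] for c in range(n_cols)]
        for r in range(n_rows)
    ]


def induce_value_dependent(rows: Sequence[Sequence[float]], col: int, rate: float) -> List[Row]:
    """Hypothetical biased mechanism, for contrast: in column `col`, the
    largest round(rate * n_rows) values are the ones that go missing."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    order = sorted(range(len(rows)), key=lambda r: rows[r][col], reverse=True)
    dropped = set(order[: round(rate * len(rows))])
    return [
        [None if (r in dropped and c == col) else v for c, v in enumerate(row)]
        for r, row in enumerate(rows)
    ]


def observed_mean(rows: Sequence[Sequence[Optional[float]]], col: int) -> float:
    """Mean of the non-missing values in column `col`."""
    return statistics.fmean([row[col] for row in rows if row[col] is not None])


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: column 0 ~ N(50, 10), column 1 ~ N(0, 1).
    data = [[rng.gauss(50, 10), rng.gauss(0, 1)] for _ in range(10_000)]
    full = observed_mean(data, 0)
    for rate in (0.05, 0.10, 0.20, 0.30):
        mcar = observed_mean(induce_mcar(data, rate, seed=1), 0)
        biased = observed_mean(induce_value_dependent(data, 0, rate), 0)
        print(f"{rate:.0%} missing: full {full:.2f}  MCAR {mcar:.2f}  biased {biased:.2f}")
```

Under the random mask, the observed mean of column 0 should stay close to the full-data mean at every rate. The value-dependent mask should pull it further down as the rate grows. This is the distinction point (2) draws between how much data is missing and why.

Drawing an exact count with `random.Random.sample` is a design choice. Blanking each cell independently with probability `rate` would only approximate the target, whereas the exact count makes the 5%, 10%, 20%, and 30% settings hit their targets exactly.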