RE: DM: RE: Data Forms for Mining

From: Dan Steinberg
Date: Tue, 30 May 2000 10:57:07 -0700 (PDT)

On Fri, 26 May 2000, Collier, Ken wrote:

 > In your filtering suggestion are you saying that you generate multiple C5.0
 > results from the entire database using different parameter settings, and
 > then use the filtering node to isolate key features? Sounds like a variation
 > on bundling. I'd like to know more. We are doing a lot with bundling,
 > bagging, and boosting to improve our predictive accuracy.
 >
 > Ken Collier
 > Senior Manager, Business Intelligence
 > KPMG Consulting

One method we have used for variable selection over the past 5 years is to grow
a large number of CART(R) trees using various settings for priors, splitting
rule, and test method, and then provisionally eliminate variables that have
zero importance in all trees grown.  Zero importance means that the variable did
not appear as either a primary splitter or a surrogate splitter at any node in
any tree.  Bootstrap resampling (done automatically under the bagging
option) can generate quite a bit of variation in tree structure; so can changes
in priors and costs.  If a variable cannot play a useful role in any tree under
a broad range of tree-growing strategies, there is little risk in eliminating
it.  This variable elimination method is easily automated via scripts and is
quite effective in radically reducing the number of candidate predictors.
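
The elimination step itself can be sketched in a few lines of Python.  This
is only an illustration of the bookkeeping, not CART's scripting language:
it assumes the importance scores of each tree have already been exported as
a name-to-score mapping.  The function name, the variable names, and the
scores below are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def never_used_variables(
    candidates: Iterable[str],
    importance_per_tree: List[Dict[str, float]],
) -> Set[str]:
    """Return the candidates with zero (or missing) importance in every tree."""
    used: Set[str] = set()
    for scores in importance_per_tree:
        # A variable counts as "used" if any tree gives it a positive score,
        # i.e. it acted as a primary or surrogate splitter somewhere.
        used.update(name for name, score in scores.items() if score > 0.0)
    return set(candidates) - used


# Hypothetical importance scores from three runs grown under different
# settings (illustrative values only, not real results).
runs = [
    {"age": 0.61, "income": 0.39, "zip": 0.0},
    {"age": 0.48, "tenure": 0.52},
    {"income": 1.0, "zip": 0.0, "noise": 0.0},
]
candidates = ["age", "income", "tenure", "zip", "noise"]

to_drop = never_used_variables(candidates, runs)
print(sorted(to_drop))  # ['noise', 'zip']
```

A variable is kept as soon as a single tree uses it, which matches the
provisional, conservative spirit of the method: only variables that are
useless under every setting tried are dropped.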

  *---------------------------+---------------------------------*
  | Dan Steinberg             | FAX (619) 543 8888              |
  | Salford Systems           | VOICE (619) 543-8880            |
  | 8880 Rio San Diego Dr     |                                 |
  | San Diego, CA 92108       | http://www.salford-systems.com  |
  *-------------------------------------------------------------*
