RE: DM: RE: Data Forms for Mining
From: Dan Steinberg
Date: Tue, 30 May 2000 10:57:07 -0700 (PDT)

On Fri, 26 May 2000, Collier, Ken wrote:

> In your filtering suggestion are you saying that you generate multiple C5.0
> results from the entire database using different parameter settings, and
> then use the filtering node to isolate key features? Sounds like a variation
> on bundling. I'd like to know more. We are doing a lot with bundling,
> bagging, and boosting to improve our predictive accuracy.
>
> Ken Collier
> Senior Manager, Business Intelligence
> KPMG Consulting

One method we have used for variable selection over the past 5 years is to grow a large number of CART(R) trees using various settings for priors, splitting rule, and test methods, and then provisionally eliminate variables that have zero importance in all trees grown. Zero importance means that the variable did not appear as either a primary splitter or a surrogate splitter at any node in any tree. Bootstrap resampling (done automatically under the bagging option) can generate quite a bit of variation in tree structure; so can changes in priors and costs. If a variable cannot play a useful role in any tree under a broad range of tree-growing strategies, there is little risk in eliminating it. This variable elimination method is easily automated via scripts and is quite effective in radically reducing the number of candidate predictors.

*---------------------------+---------------------------------*
| Dan Steinberg             | FAX (619) 543 8888              |
| Salford Systems           | VOICE (619) 543-8880            |
| 8880 Rio San Diego Dr     |                                 |
| San Diego, CA 92108       | http://www.salford-systems.com  |
*-------------------------------------------------------------*
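
For readers who want to try the elimination idea outside CART, here is a minimal sketch in plain Python. It is not the CART implementation and not the scripts mentioned above. It assumes a toy binary-split classification tree, varies only the bootstrap sample, the impurity measure (Gini or entropy) and the depth limit, and does not model surrogate splitters, priors, or costs. All function names (`best_split`, `used_features`, `never_used`) are hypothetical.

```python
import random
from collections import Counter
from math import log2

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, impurity):
    """Return (gain, feature, threshold) of the best split, or None if none helps."""
    parent, n, best = impurity(labels), len(rows), None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Splitting at "<= t" for every distinct value except the largest
        # guarantees that both children are non-empty.
        for t in values[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            gain = parent - child
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, j, t)
    return best

def used_features(rows, labels, impurity, max_depth):
    """Grow one tree; return the set of feature indices used as primary splitters."""
    if max_depth == 0 or len(set(labels)) < 2:
        return set()
    split = best_split(rows, labels, impurity)
    if split is None:
        return set()
    _, j, t = split
    used = {j}
    for go_left in (True, False):
        idx = [i for i, r in enumerate(rows) if (r[j] <= t) == go_left]
        used |= used_features([rows[i] for i in idx], [labels[i] for i in idx],
                              impurity, max_depth - 1)
    return used

def never_used(rows, labels, n_trees=20, seed=0):
    """Indices of features that split no node in any of the trees grown."""
    rng = random.Random(seed)
    n, used = len(rows), set()
    for k in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap resample
        impurity = gini if k % 2 == 0 else entropy   # vary the splitting rule
        depth = rng.choice([2, 3, 5])                # vary the tree size
        used |= used_features([rows[i] for i in idx], [labels[i] for i in idx],
                              impurity, depth)
    return sorted(set(range(len(rows[0]))) - used)

# Feature 0 determines the class; feature 1 is constant and can never split.
rows = [(i, 7) for i in range(40)]
labels = [int(i >= 20) for i in range(40)]
print(never_used(rows, labels))  # [1]
```

The returned indices are only candidates for removal, in the spirit of "provisionally eliminate": a real run would grow many more trees under a wider range of settings before dropping anything.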