RE: DM: RE: Data Forms for Mining

From: Khabaza, Tom
Date: Fri, 26 May 2000 14:53:07 +0100


Hi Ken,

Re:
 > I'm wondering if a data set like DTOX is as varied in data types,
 > formats, codes, quality problems, etc. as a typical business data set
 > containing 5-10 years of historical, often manually entered data. May
 > not matter, but even when our tools have successfully handled more
 > than 1000 features, it hasn't been simple.

Quite right of course.  The DTOX data was relatively clean
and homogeneous.  If your problems were really questions of
data quality and consistency then I can see that wide datasets
would be a problem no matter what tools you are using.

 > In your filtering suggestion are you saying that you generate multiple
 > C5.0 results from the entire database using different parameter
 > settings, and then use the filtering node to isolate key features?
 > Sounds like a variation on bundling. I'd like to know more. We are
 > doing a lot with bundling, bagging, and boosting to improve our
 > predictive accuracy. However, as far as I know SAS EM is the only tool
 > that has built in this capability. I'd love to be able to combine
 > models in Clementine. Is this possible?

The technique I'm referring to is where you generate a filter
(field selection) node from a C5.0 model.  If you have several
such models you can combine the field selections.
I'm not sure what "bundling" refers to.
(C5.0 contains boosting but not bagging, of course.)
Is that any help?
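
To make "combine the field selections" concrete, here is a minimal
sketch in Python. It assumes each model's field selection can be
treated as a plain set of the field names that model uses. The field
names and the combine_field_selections helper are hypothetical, and
this only illustrates the idea; it is not how Clementine's filter node
or C5.0 is implemented.

```python
from collections import Counter


def combine_field_selections(selections, min_votes=1):
    """Return, sorted, the fields selected by at least `min_votes` models.

    min_votes=1 gives the union of all selections; raising it keeps only
    fields that several models agree on.
    """
    # Count each field once per model, however often it appears there.
    votes = Counter(field for fields in selections for field in set(fields))
    return sorted(field for field, n in votes.items() if n >= min_votes)


# Hypothetical field selections taken from three separate models.
model_fields = [
    {"age", "income", "region"},
    {"age", "tenure"},
    {"income", "age", "channel"},
]

union = combine_field_selections(model_fields)
majority = combine_field_selections(model_fields, min_votes=2)
```

Here union keeps every field used by any model, while majority keeps
only the fields that at least two of the three models use. Which rule
suits depends on how aggressively you want to narrow a wide dataset.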

Is this a bit low-level for a public list?
Happy to tell you more of course - and if there are other people
on this list who'd like more details, let me know and I'm happy
to cut you in.

All the best,
tom
--
Tom Khabaza
Programme Manager, Data Mining
SPSS Services
+44 1483 719304
tomk@spss.com
