RE: DM: RE: Data Forms for Mining

From: Khabaza, Tom
Date: Fri, 26 May 2000 14:53:07 +0100

Hi Ken,

Re:

> I'm wondering if a data set like DTOX is as varied in data types,
> formats, codes, quality problems, etc. as a typical business data set
> containing 5-10 years of historical, often manually entered data. May
> not matter, but even when our tools have successfully handled more
> than 1000 features, it hasn't been simple.

Quite right of course. The DTOX data was relatively clean and homogenous. If your problems were really questions of data quality and consistency then I can see that wide datasets would be a problem no matter what tools you are using.

> In your filtering suggestion are you saying that you generate multiple
> C5.0 results from the entire database using different parameter
> settings, and then use the filtering node to isolate key features?
> Sounds like a variation on bundling. I'd like to know more. We are
> doing a lot with bundling, bagging, and boosting to improve our
> predictive accuracy. However, as far as I know SAS EM is the only tool
> that has built in this capability. I'd love to be able to combine
> models in Clementine. Is this possible?

The technique I'm referring to is where you generate a filter (field selection) node from a C5.0 model. If you have several such models you can combine the field selections. I'm not sure what "bundling" refers to. (C5.0 contains boosting but not bagging, of course.)

Is that any help? Is this a bit low-level for a public list? Happy to tell you more of course - and if there are other people on this list who'd like more details, let me know and I'm happy to cut you in.

All the best,

tom

--
Tom Khabaza
Programme Manager, Data Mining
SPSS Services
+44 1483 719304
tomk@spss.com
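
The idea of combining field selections from several models can be sketched roughly as follows. This is only an illustration in plain Python, not Clementine or C5.0 code: the function name `combine_field_selections`, the `min_votes` parameter, and the field names are all hypothetical, and each inner list simply stands in for the set of fields one tree model happened to use.

```python
from collections import Counter


def combine_field_selections(selections, min_votes=1):
    """Return the fields kept by at least `min_votes` of the selections.

    `selections` is a list of field-name lists, one per model.
    min_votes=1 gives the union of all selections; a higher value keeps
    only fields that several models agree on.
    """
    # Count each field at most once per model.
    votes = Counter(field for fields in selections for field in set(fields))
    return sorted(field for field, n in votes.items() if n >= min_votes)


# Hypothetical field lists, one per tree built with different settings.
selections = [
    ["age", "income", "region"],
    ["income", "tenure"],
    ["income", "age", "balance"],
]

union_fields = combine_field_selections(selections)                # used by any model
stable_fields = combine_field_selections(selections, min_votes=2)  # used by 2+ models
```

Taking the union keeps every field any model found useful, while requiring agreement between models gives a narrower filter; which is preferable would depend on the data.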