Re: DM: RE: Data Forms for Mining
From: Eric Bloedorn
Date: Mon, 22 May 2000 13:17:37 -0400
Ken:
I am curious - what commercial tools choke and die on tables wider than 700 chars?!

-Eric Bloedorn, MITRE Corporation

"Collier, Ken" wrote:
>
> This question hit a hot button for me. While most OLAP and DSS technology
> requires a great deal of structure in its data (e.g., star schema), data
> mining tools expect the data to be denormalized into a single 2D table.
> Furthermore, aside from market basket analysis, most data mining algorithms
> assume that each observation in a data set represents a unique entity (e.g.,
> each record is a different customer).
>
> What this implies is that there is substantial data preprocessing required
> in most cases to transform data from a relational, star, or other structured
> model, into the mineable denormalized structure required. In our experience
> with retailers, telcos, manufacturers, insurance companies, banks, and
> others, this preprocessing generally consumes about 80% of the total effort
> compared to the actual data mining, validation, verification, and
> deployment, which consume the remaining 20%. Your mileage may vary.
>
> Now, here's the rub: We recently had a manufacturing client with ~1000
> quality control parameters for each component within a single widget. In
> this scenario a widget is made up of 2-6 major sub-widgets, and each
> sub-widget is made up of 3 components. The same set of QC parameters is
> collected on each component. So, even when we denormalize the data into a
> single table, there can be as many as 18 (6 x 3) records for a single
> widget. Our objective in this analysis was to identify root causes of widget
> failure in order to reduce the defect rate.
>
> Now, we want the data mining algorithms to "see" all 18 records associated
> w/ a single widget as a single "pattern". Unfortunately commercial tools
> don't tune their algorithms to do this even though it is technically
> possible. One exception is time series and sequence analysis algorithms, but
> these methods are really intended for a different purpose. Another kludgy
> solution to this problem is to string out all 18 records into a single WIDE
> record per widget. Many commercial tools choke and die on tables that are
> wider than 700 vars.
>
> We finally wound up solving this problem using SAS Enterprise Miner and SGI
> Mineset, but not without a lot of data transformations, preprocessing, and
> preliminary variable reduction. To my thinking, the next generation of data
> mining tools should provide the flexibility to "see" data in a wide variety
> of structures. The price we may pay for this flexibility is the speed of
> data sourcing prior to analysis.
> ---
> Ken Collier
> Senior Manager, Business Intelligence
> KPMG Consulting
> Corporate Sponsor of the Center for Data Insight http://insight.cse.nau.edu
>
> -----Original Message-----
> From: greg.della-croce@marchfirst.com
> [mailto:greg.della-croce@marchfirst.com]
> Sent: Thursday, May 18, 2000 6:09 AM
> To: datamine-l@nautilus-sys.com
> Subject: DM: Data Forms for Mining
>
> I have worked in and around Data Warehouse/Marts with their star schema for
> a while now. However, I am interested in what form the data takes when it is
> being optimized for Mining. I am speaking of structured data, not
> unstructured data such as large bodies of text. What are the architectures
> of a Data Mining DB? Is the form dependent on the algorithms that you are
> going to employ against it? Or is it more general in nature?
>
> Thank you for your replies!
>
> Greg Della-Croce
> marchFirst
> BI/KM
>
> *****************************************************************************
> The information in this email is confidential and may be legally privileged.
> It is intended solely for the addressee. Access to this email by anyone else
> is unauthorized.
>
> If you are not the intended recipient, any disclosure, copying, distribution
> or any action taken or omitted to be taken in reliance on it, is prohibited
> and may be unlawful. When addressed to our clients any opinions or advice
> contained in this email are subject to the terms and conditions expressed in
> the governing KPMG client engagement letter.
> *****************************************************************************
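
For readers following the thread, the "single WIDE record per widget" workaround Ken describes can be sketched roughly as below. This is only an illustration in plain Python, not what was actually done with SAS Enterprise Miner or Mineset. The function name `widen`, the field names (`widget_id`, `sub_widget`, `component`, `params`), and the two toy QC parameters are all hypothetical stand-ins for the ~1000 real ones.

```python
from collections import defaultdict


def widen(records, max_sub_widgets=6, components_per_sub=3):
    """Pivot one-row-per-component records into one wide row per widget.

    Hypothetical input shape: each record is a dict with keys
    'widget_id', 'sub_widget', 'component' and 'params' (name -> value).
    """
    wide = defaultdict(dict)
    param_names = set()
    for rec in records:
        param_names.update(rec["params"])
        for name, value in rec["params"].items():
            # One column per (sub-widget, component, parameter) slot.
            col = f"s{rec['sub_widget']}_c{rec['component']}_{name}"
            wide[rec["widget_id"]][col] = value

    # Fixed schema for every widget: slots a widget does not have
    # (e.g. sub-widgets 3-6 of a 2-sub-widget widget) become None.
    columns = [
        f"s{s}_c{c}_{p}"
        for s in range(1, max_sub_widgets + 1)
        for c in range(1, components_per_sub + 1)
        for p in sorted(param_names)
    ]
    rows = {wid: [row.get(col) for col in columns] for wid, row in wide.items()}
    return columns, rows


# Toy data: one widget with 2 sub-widgets x 3 components x 2 QC parameters.
records = [
    {"widget_id": "W1", "sub_widget": s, "component": c,
     "params": {"temp": 20.0 + s + c, "torque": 5.0 * c}}
    for s in (1, 2) for c in (1, 2, 3)
]
columns, rows = widen(records)
print(len(columns))  # 6 sub-widgets x 3 components x 2 parameters = 36
```

With ~1000 parameters per component, the same layout would give 6 x 3 x 1000 = 18,000 columns per widget, far beyond the 700-variable limit mentioned above, which is presumably why preliminary variable reduction was needed before the wide table became usable.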