DM: Small data sets
From: Frank Lemke
Date: Fri, 28 Apr 2000 03:59:24 +0200

> Hi!
>
> I've been doing some research on Data Mining and have come into the
> twilight zone: why is everybody talking only about "large" databases?
> What about "small" databases - don't they have anything valuable inside?
> Don't they hide nuggets, useful patterns?
>
> And, nobody (best to my knowledge) has come up with a definition of
> "small" and "large" - not in terms of bits and bytes, but something more
> persistent to the change.

Good question. In my experience, data mining is increasingly associated with database marketing, customer behavior modeling, and web mining. There, one naturally has to deal with large numbers of records. But "large" can also be defined this way:

"Commonly, a large data set is one that has many cases or records. With this book, however, 'large' rather refers to the number of variables describing each record. When there are more variables than cases, the most known algorithms are running into some problems (in mathematical statistics, for instance, covariance matrix becomes singular so that inversion is impossible; Neural Networks fail to learn). Even if the data are well-behaved, a large number of variables means that the data are distributed in a high dimensional hypercube, causing the known dimensionality problem." (Mueller/Lemke, Self-Organising Data Mining, ISBN 3-89811-861-4)

It is often more difficult to extract useful knowledge from 'small' data sets. For many economic (e.g. world models), ecological (global warming, water/air pollution), medical and biochemical (diagnosis of diseases, carcinogenicity prediction of aromatic compounds), and other problems, only rather short data sets are available. Extracting knowledge from short and noisy data is the primary application area of self-organising data mining technologies. The KnowledgeMiner software has been used successfully on all of the problems mentioned.
Other examples are included in the downloadable demo (http://www.knowledgeminer.net).

Frank
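
P.S. The point in the quoted passage about the covariance matrix becoming singular when there are more variables than cases can be checked with a small sketch. This is only an illustration under stated assumptions: the toy data and the helper names covariance_matrix and rank are made up for this example and are not taken from the book or from KnowledgeMiner. With n cases, the centered data span at most n - 1 dimensions, so a covariance matrix over p > n - 1 variables cannot have full rank.

```python
from fractions import Fraction


def covariance_matrix(rows):
    """Sample covariance matrix of the variables (columns) over the cases (rows)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    found = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column at or below row `found`.
        pivot = next((r for r in range(found, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for r in range(found + 1, n_rows):
            factor = m[r][col] / m[found][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[found])]
        found += 1
    return found


# Hypothetical toy data: 3 cases (rows) described by 5 variables (columns).
data = [
    [1, 4, 2, 7, 3],
    [2, 1, 5, 3, 8],
    [6, 2, 1, 9, 4],
]

cov = covariance_matrix(data)   # 5 x 5 matrix
cov_rank = rank(cov)            # bounded by (number of cases - 1)
is_singular = cov_rank < len(cov)
print("variables:", len(cov), "rank:", cov_rank, "singular:", is_singular)
```

Exact fractions are used instead of floating point so that the rank is not blurred by rounding. With real data one would normally use a numerical library and a tolerance instead.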