
Re: DM: Datamining Definition...and Machine Learning Definition.


From: Warren Sarle
Date: Thu, 23 Mar 2000 13:08:33 -0500 (EST)
> From: "Franklin Wayne Poley" <culturex@vcn.bc.ca>
> ...
> If we go with such a broad term then data mining/knowledge extraction
> becomes synonymous with machine learning does it not?

Unfortunately, data mining has become whatever the marketing people
trying to sell expensive software define it to be.

Machine learning is traditionally concerned with small, noise-free
data sets, primarily with categorical variables. In recent years
the ML people have shown more interest in noisy data and continuous
variables, but they still seem to view noisy data as an aberration.

Data mining is traditionally concerned with huge, noisy data sets
with all kinds of messy variables. Primarily, the purpose of data
mining is to create a predictive model for a specific target
variable (such as customer purchasing, credit card fraud, etc.) or
to see if there are any predictive relationships among a large
number of variables (e.g., "associations and sequences", market
basket analysis).  Predictive models for noisy data were called
"statistical models" before some marketing person came up with the
term "data mining", which, by the way, used to be a derogatory term
in the statistical literature.  Secondarily, data mining is
concerned with detecting outliers (anomalies, novelties), which is
another application of statistical models.  Ultimately, data mining
is used to make decisions--usually business decisions but perhaps
medical decisions or various other kinds of decisions. Making
decisions based on noisy data is the province of statistical
decision theory, which is used in the SAS Enterprise Miner product.
So I choose to define data mining as the application of statistical
decision theory to huge, messy data sets to maximize profits.
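
To make the decision-theory part of that definition concrete, here is a
minimal sketch in Python. It assumes a predictive model has already
produced a probability for each outcome of the target variable, and it
picks the decision with the highest expected profit. The function name,
the fraud/legit outcomes, and every number are hypothetical and chosen
only for illustration; this is not a description of how SAS Enterprise
Miner or any other product works.

```python
def best_decision(posterior, profit):
    """Pick the decision with the highest expected profit.

    posterior: dict mapping each outcome to its estimated probability
    profit:    dict mapping each decision to a dict of profit per outcome
    Returns (best decision, dict of expected profit per decision).
    """
    expected = {
        decision: sum(posterior[outcome] * payoff[outcome] for outcome in posterior)
        for decision, payoff in profit.items()
    }
    return max(expected, key=expected.get), expected


# Hypothetical model output for one credit card transaction.
posterior = {"fraud": 0.05, "legit": 0.95}

# Hypothetical profit matrix: profit of each decision under each outcome.
profit = {
    "approve": {"fraud": -500.0, "legit": 10.0},
    "decline": {"fraud": 0.0, "legit": -10.0},
}

decision, expected = best_decision(posterior, profit)
print(decision, expected)
```

With these made-up numbers the transaction is declined even though fraud
is unlikely, because the rare loss outweighs the common small gain. The
decision comes from the profit matrix combined with the model's
probabilities, not from the most probable outcome alone.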

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.





Copyright © 1999 Nautilus Systems, Inc. All Rights Reserved.