Re(2): DM: Small data sets

From: Frank Lemke
Date: Wed, 17 May 2000 20:13:24 +0200


 >> From: knowledgeminer@iworld.to (Frank Lemke)
 > > ...
 > > "Commonly, a large data set is one that has many cases or records. With
 > > this book, however, 'large' rather refers to the number of variables
 > > describing each record. When there are more variables than cases, the
 > > most known algorithms are running into some problems (in mathematical
 > > statistics, for instance, covariance matrix becomes singular so that
 > > inversion is impossible; Neural Networks fail to learn).

 > From: saswss@unx.sas.com (Warren S. Sarle)
 >
 >No, neither neural nets nor regression will fail to learn if they are
 >programmed correctly. The danger is that they will learn too well and
 >overfit. But as everyone should know by now, there are many ways to
 >control overfitting; e.g., see ftp://ftp.sas.com/oub/neural/FAQ3.html .

thanks for the above link info.

in the theory of self-organizing modeling, overfitting is avoided
systematically by searching for the minimum of an external criterion as a
function of model complexity and noise dispersion. as noise increases, the
complexity of the optimal model decreases. this is a self-controlled
process that does not depend directly on data set length. from systems
theory it is known that noise reaches a system through its inputs and
through external disturbances, which include all true input variables that
are not used for modeling. therefore, using an increasing number of
potential input variables may decrease noise in the data.
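
the idea of an external criterion can be sketched in a few lines. the
example below is only an illustration under stated assumptions, not the
GMDH algorithm and not how KnowledgeMiner is implemented: it uses
polynomials of increasing degree as a stand-in for "model complexity",
fits each one on half of the records, and scores it on the other half
(the external criterion). the cubic test signal, the even/odd split, and
all function names (solve, fit_poly, mse, select_complexity, make_data)
are hypothetical choices made for this sketch.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def mse(coef, xs, ys):
    """Mean squared error of the polynomial on the given records."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(c * x ** i for i, c in enumerate(coef))
        total += (pred - y) ** 2
    return total / len(xs)


def select_complexity(xs, ys, max_degree=6):
    """Return the degree that minimizes the error on records not used for fitting."""
    train_x, train_y = xs[0::2], ys[0::2]  # records used to estimate coefficients
    check_x, check_y = xs[1::2], ys[1::2]  # records used only for the criterion
    errors = []
    for degree in range(max_degree + 1):
        coef = fit_poly(train_x, train_y, degree)
        errors.append(mse(coef, check_x, check_y))
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors


def make_data(noise_sd, n=40, seed=0):
    """Hypothetical test signal: a cubic plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [1 - 2 * x + 3 * x ** 3 + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


if __name__ == "__main__":
    for noise_sd in (0.05, 0.5, 2.0):
        xs, ys = make_data(noise_sd)
        best, _ = select_complexity(xs, ys)
        print("noise sd", noise_sd, "-> selected degree", best)
```

running the script with different noise levels shows which degree the
criterion picks in each case. the selected degree depends on the random
draw, so a single run is an illustration and not evidence for the general
claim.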

also, what about modeling dynamic systems? (both time series and
multi-input systems with single or multiple outputs)

Frank Lemke

 >--------------------------------------------------------------
 >[Software]
 >  KnowledgeMiner
 >    self-organizing modeling   prediction  classification
 >    GMDH NN   Fuzzy Rule Induction   Analog Complexing
 >[Book]
 >  Self-Organizing Data Mining.
 >    ISBN 3-89811-861-4
 >[Internet]
 >    http://www.knowledgeminer.net
 >    frank@knowledgeminer.net             knowledgeminer@iworld.to
 >--------------------------------------------------------------
