
Re: DM: Small data sets


From: Warren Sarle
Date: Tue, 2 May 2000 12:15:57 -0400 (EDT)

 > From: knowledgeminer@iworld.to (Frank Lemke)
 > ...
 > "Commonly, a large data set is one that has many cases or records. With
 > this book, however, 'large' rather refers to the number of variables
 > describing each record. When there are more variables than cases, the most
 > known algorithms are running into some problems (in mathematical
 > statistics, for instance, covariance matrix becomes singular so that
 > inversion is impossible; Neural Networks fail to learn).

No, neither neural nets nor regression will fail to learn if they are
programmed correctly. The danger is that they will learn too well and
overfit. But as everyone should know by now, there are many ways to
control overfitting; e.g., see ftp://ftp.sas.com/pub/neural/FAQ3.html.
The most serious problem is that extrapolation outside the subspace
spanned by the training set may fail miserably.
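[A sketch of both points, my addition rather than Sarle's, again assuming
NumPy. A small weight-decay (ridge) penalty is one standard way to keep the
fit well-posed and control overfitting when p > n, but the fitted model
carries no information about directions orthogonal to the subspace spanned
by the training cases, which is exactly where extrapolation goes wrong.]

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, lam = 20, 50, 1.0
    X = rng.standard_normal((n, p))
    true_w = rng.standard_normal(p)
    y = X @ true_w

    # Ridge solution: (X'X + lam*I)^-1 X'y -- invertible despite p > n.
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # A direction orthogonal to the row space of X (the training subspace).
    _, _, vt = np.linalg.svd(X)
    ortho = vt[n]                      # right singular vector in the null space of X

    x_new = X[0] + 5.0 * ortho         # case shifted outside the training subspace
    print(x_new @ w_ridge, X[0] @ w_ridge)

[The last line prints essentially the same value twice: the shift along the
orthogonal direction is invisible to the fitted model, so any real effect of
the data in that direction would be missed entirely.]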

--

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.



