
Re: DM: Small data sets


From: Warren Sarle
Date: Tue, 2 May 2000 12:15:57 -0400 (EDT)

 > From: knowledgeminer@iworld.to (Frank Lemke)
 > ...
 > "Commonly, a large data set is one that has many cases or records. With
 > this book, however, 'large' rather refers to the number of variables
 > describing each record. When there are more variables than cases, the most
 > known algorithms are running into some problems (in mathematical
 > statistics, for instance, covariance matrix becomes singular so that
 > inversion is impossible; Neural Networks fail to learn).

No, neither neural nets nor regression will fail to learn if they are
programmed correctly. The danger is that they will learn too well and
overfit. But as everyone should know by now, there are many ways to
control overfitting; e.g., see ftp://ftp.sas.com/pub/neural/FAQ3.html.
The most serious problem is that extrapolation outside the subspace
spanned by the training set may fail miserably.
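[A sketch of both points, my addition rather than Sarle's, again assuming
NumPy. A small weight-decay (ridge) penalty is one standard way to keep the
fit well-posed and control overfitting when p > n, but the fitted model
carries no information about directions orthogonal to the subspace spanned
by the training cases, which is exactly where extrapolation goes wrong.]

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, lam = 20, 50, 1.0
    X = rng.standard_normal((n, p))
    true_w = rng.standard_normal(p)
    y = X @ true_w

    # Ridge solution: (X'X + lam*I)^-1 X'y -- invertible despite p > n.
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # A direction orthogonal to the row space of X (the training subspace).
    _, _, vt = np.linalg.svd(X)
    ortho = vt[n]                      # right singular vector in the null space of X

    x_new = X[0] + 5.0 * ortho         # case shifted outside the training subspace
    print(x_new @ w_ridge, X[0] @ w_ridge)

[The last line prints essentially the same value twice: the shift along the
orthogonal direction is invisible to the fitted model, so any real effect of
the data in that direction would be missed entirely.]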

--

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.



