
AW: AW: DM: RE: Data Forms for Mining (Limit on variables)


From: Frank Buckler
Date: Fri, 26 May 2000 08:50:22 +0200

The VC dimension tells you how many examples a specific learning algorithm
can learn/represent. In order to achieve ANY generalisation you need more
examples than the VC dimension. The VC dimension is determined by the number
of inputs. For neural networks it can only be approximated, but for linear
methods, e.g. linear regression, it is quite simple: Dvc = inputs + 1. So if
a linear algorithm that has to represent any linear relationship among 7000
variables needs approximately more than 10,000 examples, then for non-linear
modelling you need MUCH more than 100,000 examples. The number of required
examples depends primarily on the nonlinearities of the problem, not on the
number of variables. The VC dimension is only a bound, but it shows that
some case studies must be producing noise or nonsense.
All this shows how crucial statistically based or manual preselection and
processing are.
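
As a rough illustration of Dvc = inputs + 1 for linear methods, here is a minimal Python sketch. It only checks the lower half of that claim: a linear model with d inputs plus an intercept can reproduce every +/-1 labelling of d + 1 points in general position, so with no more examples than that it fits anything and generalises nothing. The function names (solve, shatters, linear_vc_dimension) and the choice of test points are my own assumptions for illustration, not taken from Vapnik or any library.

```python
import itertools


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("points are not in general position")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def shatters(points):
    """Check that sign(w.x + b) realises every +/-1 labelling of the points.

    Assumes len(points) == d + 1, where d is the number of inputs.
    """
    A = [list(p) + [1.0] for p in points]  # append 1.0 for the intercept
    for labels in itertools.product([-1.0, 1.0], repeat=len(points)):
        w = solve(A, list(labels))  # weights that hit the labels exactly
        preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in A]
        if any(p * y <= 0 for p, y in zip(preds, labels)):
            return False
    return True


def linear_vc_dimension(n_inputs):
    """Dvc = inputs + 1 for a linear model with an intercept."""
    return n_inputs + 1


# Hypothetical example: d = 3 inputs, d + 1 = 4 points (origin + unit vectors).
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(shatters(points))
print(linear_vc_dimension(7000))
```

The sketch does not show the upper half of the claim (that no set of d + 2 points can be shattered), and it says nothing about how many examples beyond Dvc are needed in practice.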

All this is a result of the falsification principle in the philosophy of
science. More on this in Vapnik, V., 1995, "The Nature of Statistical
Learning Theory", or try one of the papers related to Support Vector
Machines at http://svm.first.gmd.de

Frank

Frank Buckler ------------------------------------------------------------
University of Hanover
Dep. Marketing II
buckler@m2.uni-hannover.de



-----Original Message-----
From: owner-datamine-l@nautilus-sys.com
[mailto:owner-datamine-l@nautilus-sys.com] On Behalf Of H. Mark Hubey
Sent: Wednesday, 24 May 2000 21:42
To: datamine-l@nautilus-sys.com
Subject: Re: AW: DM: RE: Data Forms for Mining (Limit on variables)

Frank Buckler wrote:
  >
  > Issue: Number of Inputs
  >
  > I'm surprised to hear that some guys are using from thousands up to
  > millions of inputs.
  >
  > There exists an upper bound on the number of inputs, determined by the
  > sample size. This is advocated due to the VC-Dimension.

I am new to this. What is "VC-dimension"?

--
Mark Hubey, Professor
Department of Computer Science
Montclair State University
http://www.csam.montclair.edu/~hubey



