RE: AW: DM: RE: Data Forms for Mining (Limit on variables)

From: osborn
Date: Thu, 25 May 2000 12:26:08 +1000


 > I am new to this. What is "VC-dimension"?

Vapnik-Chervonenkis dimension. E.g., see "The Nature of Statistical
Learning Theory" by V. N. Vapnik, or "Statistical Learning Theory",
also by Vapnik. Fairly heavy reading...

" The VC dimension of a set of indicator functions Q(z,a), a in L,
is equal to the largest number h of vectors z1..zh that can be
separated into two different classes in all the 2^h possible ways
using this set of functions. " The notion here is being able to
"shatter" vectors into two sets in all possible ways. Further
theorems bound the associated risk of misclassification
(or relate model complexity to the _minimum_ data required
to build a model, given a particular VC-dimension and a given
probability of misclassification).
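
To make "shattering" concrete, here is a minimal sketch (not from
Vapnik's text). It assumes a toy set of indicator functions of my own
choosing: closed intervals [a, b] on the real line, labelling a point 1
if it falls inside the interval and 0 otherwise. The function names are
hypothetical. It enumerates every labelling the intervals can produce
on a set of points and checks whether all 2^h of them appear.

```python
def interval_labelings(points):
    """Return every 0/1 labelling of `points` that some closed
    interval [a, b] can produce (1 = inside, 0 = outside).

    Assumption: the function class is intervals on the real line.
    It is enough to try endpoints taken from the points themselves,
    plus the empty interval, since any other interval gives the same
    labelling as one of these.
    """
    labelings = {tuple(0 for _ in points)}  # empty interval: all zeros
    for a in points:
        for b in points:
            if a <= b:
                labelings.add(tuple(1 if a <= x <= b else 0 for x in points))
    return labelings


def is_shattered(points):
    """True if intervals realise all 2^h labellings of the h points."""
    return len(interval_labelings(points)) == 2 ** len(points)


if __name__ == "__main__":
    print(is_shattered([1.0, 2.0]))       # two points
    print(is_shattered([1.0, 2.0, 3.0]))  # three points
```

Two distinct points can be shattered, but no three points can, because
no single interval picks out the two outer points while excluding the
middle one. So for this toy class h = 2.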

In a practical situation there are issues of excluding classes of
models (weaker approximation), and being able to discriminate
between different classifications on the input data set (weaker
generalisation).

As others pointed out, in practical situations there are usually
a few (50?) variables which can do most of the work of building
a model that a client will find useful. The ART is identifying the
50. [And it's not always the same 50 for the whole of the input
space].

If you know something about the model (hints), the situation
is changed. One thing you can "know" is that the functions
(of many input variables) should be fairly simple...


T.

Dr Tom Osborn
Director of Modelling
NTF
Decision Support Consultants
Level 7, 1 York Street
SYDNEY NSW 2000
AUSTRALIA
phone:	+61 2 9252 0600
fax:	+61 2 9251 9894
