RE: AW: DM: RE: Data Forms for Mining (Limit on variables)
From: osborn
Date: Thu, 25 May 2000 12:26:08 +1000

> I am new to this. What is "VC-dimension"?

Vapnik-Chervonenkis Dimension. E.g., see "The Nature of Statistical Learning Theory" by VN Vapnik, or "Statistical Learning Theory" by Vapnik. Fairly heavy read...

" The VC dimension of a set of indicator functions Q(z,a), a in L, is equal to the largest number h of vectors z1..zl that can be separated into two different classes in all the 2^h possible ways using this set of functions. "

The notion here is being able to "shatter" vectors into two sets in all possible ways. Further theorems determine the associated risk of misclassification (or model complexity vs the necessary _minimum_ data required to build a model, given a particular VC-dimension, for a given probability of misclassification).

In a practical situation there are issues of excluding classes of models (weaker approximation), and of being able to discriminate between different classifications on the input data set (weaker generalisation).

As others pointed out, in practical situations there are usually a few (50?) variables which can do most of the work of building a model that a client will find useful. The ART is identifying the 50. [And it's not always the same 50 for the whole of the input space].

If you know something about the model (hints), the situation is changed. One thing you can "know" is that the functions (of many input variables) should be fairly simple...

T.

Dr Tom Osborn
Director of Modelling
NTF Decision Support Consultants
Level 7, 1 York Street
SYDNEY NSW 2000 AUSTRALIA
phone: +61 2 9252 0600
fax: +61 2 9251 9894
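
[Illustrative note on "shattering", not part of the original message.] The quoted definition can be made concrete with a small brute-force check. This is only a sketch under stated assumptions: the points are numbers on a line, the function sets are finite grids of thresholds and intervals, and the names `shatters` and `vc_dimension_on` are hypothetical helpers, not taken from Vapnik's books. It measures the VC dimension only relative to the given pool of points and the given finite set of functions.

```python
from itertools import combinations


def labelings(points, hypotheses):
    """Distinct 0/1 label tuples that the hypotheses induce on the points."""
    return {tuple(h(z) for z in points) for h in hypotheses}


def shatters(points, hypotheses):
    """True if all 2^h labelings of the h points are realised."""
    return len(labelings(points, hypotheses)) == 2 ** len(points)


def vc_dimension_on(pool, hypotheses):
    """Largest h such that some h-subset of `pool` is shattered (brute force)."""
    best = 0
    for h in range(1, len(pool) + 1):
        if any(shatters(s, hypotheses) for s in combinations(pool, h)):
            best = h
        else:
            # If no set of size h is shattered, no larger set can be either.
            break
    return best


# Assumed toy setting: four points on a line, parameters on a half-step grid.
pool = [1, 2, 3, 4]
grid = [0.5, 1.5, 2.5, 3.5, 4.5]

# Q(z, a) = 1 if z >= a (one-sided thresholds).
thresholds = [lambda z, a=a: int(z >= a) for a in grid]

# Q(z, (a, b)) = 1 if a <= z <= b (closed intervals; a > b gives "all zeros").
intervals = [lambda z, a=a, b=b: int(a <= z <= b) for a in grid for b in grid]

vc_thresholds = vc_dimension_on(pool, thresholds)
vc_intervals = vc_dimension_on(pool, intervals)
print(vc_thresholds, vc_intervals)
```

Thresholds cannot label the left point 1 and the right point 0, so no pair is shattered; intervals can produce every labeling of a pair but never 1,0,1 on three points. The richer function set shatters more points, which is the sense in which VC dimension measures model complexity.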