[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Subscribe]
RE: AW: DM: RE: Data Forms for Mining (Limit on variables)From: osborn Date: Mon, 29 May 2000 11:31:41 +1000 H Mark Hubey wrote about VC dimension and necessary number of [training] exemplars, etc: > I don't get this. The two phrases "the largest number h of vectors > that can be separated into two different classes" and "in > all the 2^h possible ways" is not registering in my brain. How > can a collection be separated into 2 classes in N ways? 2^h is the total number of mappings (ie, all classifications) of type D -> {0,1} (where D is the domain). VC dimension is about the achievability of a modelling framework to capture all of these. Mark's example with {1,2,3,4,5,6} -> {0,1} has 2^6 possible mappings. [VC dimension is also about some other things, too, but let's not be distracted - in the DM forum, this part of VC dimension is what's being talked about]. PAC learning theory [Valiant's "Probably Approximately Correct" stuff] also addresses similar issues of learnability. These both presume that learning would use the "Sherlock Holmesian" notion of considering everything (all possible mappings) and eliminating all those which are inconsistent with an exemplar set. Ie, an unbiassed version space where all contending models live till killed by inconsistency. VC dimension is useful in addressing this (in a bounds sense) for particular model forms, while PAC is useful for addressing is by partitioning a space of {0,1}^N -> {0,1} into 'right' and 'wrong'. BOTH approaches are elegantly probabilistic... As noted by others, in practical situations, the mappings of interest are FAR less numerous than 2^h, since most interesting things in the real world have a regularity, aka, compact representation, aka, a lot of simple consistency with a set of exceptions, many of which can likewise be easily represented. [Hence my reference to Hints]. ...AND when the VC dimension of [generalised] linear forms is being lorded above the non-linear/neural/trees/whatever else you can invent, PLEASE remember that there's an awful lot of mappings which linear forms CAN'T achieve!!! Of course, there are mappings of interest [sic] which don't fall under this umbrella, such as "describing the order of the first K coupling of N drug influenced planerians worms at a (worm) sex orgy" - and maybe there's significant predictability in that situation, awaiting an eager young ethologist data miner to discover... ... :-) T. Dr Tom Osborn Director of Modelling NTF Decision Support Consultants Level 7, 1 York Street SYDNEY NSW 2000 AUSTRALIA phone: +61 2 9252 0600 fax: +61 2 9251 9894
|
MHonArc
2.2.0