DM: RE: Problem of Sample size
From: osborn
Date: Thu, 6 Jul 2000 11:15:21 +1000

eeek@okclub.com, WHOEVER he/she is, wrote:

> what is proper sample size?
> How could we say that this group is too small, and the other is large?
> If there is a proper size of sample for analysis, and the sample that
> we have is too small, please let me know the best way of increasing
> the sample size.

It's hard to tell whether this is a troll, a joke, or a serious question. It would be appropriate for all posters to identify themselves by name and affiliation in future... But to take this one seriously for a minute:

(1) Experimental design is the collection of methods behind sample size and structure (as well as what to measure/query/etc., consistency checking, and so on). It is a technical field - i.e., you have to study it.

(2) It is assumption-driven, but some assumptions can be refined by pilot studies, reviews of prior studies, introducing domain-specific constraints, etc.

(3) It also depends on what you want to determine from your analysis and modelling, and how precisely you want to know it (mainly sensitivity and specificity). This has to be informed by the costs/utility of type 1 and type 2 errors, and by the risk of prediction errors. If other insights are desired (e.g., structure in data sets), there are further issues to do with dimensionality and discriminating power (another, large topic).

(4) For conventional hypothesis testing, or modelling using GLMs or OLS, experimental design is fairly straightforward (as the models and assumptions are extensive over the population space). With other kinds of modelling (non-parametric, neural, hierarchical, Bayes variants, etc.), the issues get clouded, as model components depend on each other. In my experience, if you use the underlying design for GLMs, you will do better (more information extraction, more precision, less risk, etc.) using the appropriate, more elaborate method. This assumes you have some expertise with the more elaborate method.
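As a concrete illustration of point (3), here is a minimal sketch (not from the post) of the standard power calculation for a two-sample comparison of means, using the normal approximation; the values of delta, sigma, alpha and power are illustrative assumptions, not anything from this thread. It also sketches why simply duplicating data adds no information:

```python
# Sample-size calculation for detecting a difference of means between
# two groups, via the normal approximation.  All parameter values are
# illustrative assumptions.
import math
from statistics import NormalDist, mean, stdev

def two_sample_n(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n to detect a true mean difference `delta` (common sd
    `sigma`) with a two-sided level-`alpha` test at the stated power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # controls type 1 error
    z_b = NormalDist().inv_cdf(power)          # controls type 2 error
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

print(two_sample_n(delta=0.5, sigma=1.0))  # -> 63 per group

# Duplicating the data does not help: the mean is unchanged, but the
# nominal standard error shrinks by roughly 1/sqrt(2), so every
# interval and test computed from the duplicated set is overconfident.
sample = [1.2, 0.7, 1.9, 0.3, 1.1, 1.6]   # toy data
doubled = sample * 2

def std_err(xs):
    return stdev(xs) / math.sqrt(len(xs))

print(mean(sample) == mean(doubled))       # True: no new information
print(std_err(doubled) / std_err(sample))  # ~0.67: spurious precision
```

The point of the sketch: a smaller required n comes only from a larger true effect, a smaller true variance, or a relaxed error budget - never from copying rows.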
[This assumption may not be valid if the modelling is done by a "business analyst" or a "cowboy", or if the modeller uses too many drugs.]

Where modelling is done on fairly small data sets, or with high-parameter models, model over-fitting is a big concern, with a large literature advising on options. Read the Neural Net FAQ (for starters).

> I consider two possibilities. One is data duplication and the other is
> inclusion of old data having a little different pattern than I think.

I respectfully suggest that it's time to read a book.

Tom.

Dr Tom Osborn
Director of Modelling
The NTF Group
Decision Support Consultants
Level 7, 1 York Street
SYDNEY NSW 2000 AUSTRALIA
phone: +61 2 9252 0600
fax:   +61 2 9251 9894