DM: Re: problem of sample size

From: Mac Johnstone
Date: Mon, 28 Aug 2000 17:32:34 -0700

Dear Vinnie:

There was quite a bit of discussion on this forum in May concerning a similar problem. Here is a method I have used successfully, based on misclassification costs (see the message from Earl S. Harris).

Get the ratio of A to B. In your case this is 9,300,000/200,000 = 46.5. Round up to 47. Now set the classification costs so that it costs 47 to classify a B as an A and 1 to classify an A as a B. Of course, it costs 0 to classify an A as an A or a B as a B.

This method works with a random sample from the population that contains representative numbers of both A's and B's.

As for selecting a sample size, I recommend the procedure in the book Data Preparation for Data Mining by Dorian Pyle.

Regards,
Mac

----- Original Message -----
From: vinnie <ejan@otech.co.kr>
To: DM MailingList <datamine-l@nautilus-sys.com>
Sent: Friday, August 25, 2000 7:39 PM
Subject: DM: problem of sample size

> Though it is a sort of traditional question, I wonder what method you
> use to deal with this kind of problem.
>
> The population size is about 9,500,000 records. There are two
> groups, A and B. Unfortunately, the size of A is 9,300,000 and that
> of B is 200,000.
>
> Of course, B is large enough to sample or analyze, but we have to
> balance the sizes of the two groups. What is an appropriate sample
> size for the two groups, and what kind of sampling methods could be
> applied?
>
> This problem is similar to the case of 1 bad guy and 99 good guys
> out of 100 guys.
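
The cost-weighting method described in the reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular data mining tool: the function names `misclassification_costs` and `classify` are hypothetical, the ratio is computed from the group counts given in the original question (9,300,000 A's and 200,000 B's), and the classifier is assumed to produce an estimated probability that a record belongs to B. Substitute whatever ratio you settle on.

```python
import math


def misclassification_costs(n_a, n_b):
    """Build a cost table keyed by (true class, predicted class).

    The cost of calling a B an A is the A:B ratio rounded up;
    the cost of calling an A a B is 1; correct calls cost 0.
    """
    ratio = math.ceil(n_a / n_b)
    return {
        ("A", "A"): 0,
        ("A", "B"): 1,
        ("B", "A"): ratio,
        ("B", "B"): 0,
    }


def classify(p_b, costs):
    """Return the label with the lower expected cost.

    p_b is an (assumed) estimate of the probability that the
    record is a B, e.g. from a model fitted on a random sample.
    """
    p_a = 1.0 - p_b
    cost_if_predict_a = p_a * costs[("A", "A")] + p_b * costs[("B", "A")]
    cost_if_predict_b = p_a * costs[("A", "B")] + p_b * costs[("B", "B")]
    return "B" if cost_if_predict_b < cost_if_predict_a else "A"


# Group sizes taken from the original question.
costs = misclassification_costs(9_300_000, 200_000)
```

With costs like these, a record is labelled B as soon as its estimated probability of being a B exceeds 1 / (1 + ratio), rather than 0.5, which is how the rare group gets a fair hearing without resampling.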