Continuation of HANDOUT on Binomial Approximation and Quality of Estimation in the Context of a Political Opinion Poll
============================================================

SAMPLE PROBLEM: for a voting population of N = 10^7 in a certain state, suppose that D = the number who prefer Bush to any likely Democrat is 52% of N, i.e., D = 5.2e6, and suppose we draw a random sample of 400 from the population. The question is: what is the probability that the poll of the 400 people sampled gives exactly the wrong answer, i.e., what is the probability that the number X of the 400 who say they prefer Bush is less than 50% of the sample, i.e., less than 0.5*400 = 200 ?

The first step is to say that sampling WITH or WITHOUT replacement from such a large population (400 << 10^7) makes virtually no difference, so that the probability that X <= 199, which is exactly a Hypergeometric(1.e7, 5.2e6, 400) probability, is equal (to high accuracy) to the corresponding Binom(400, 0.52) probability.

UP TO THIS POINT, THIS IS CLOSE TO THE SAME PROBLEM CONSIDERED IN THE 10/22/03 HANDOUT ON NORMAL APPROXIMATION TO BINOMIAL PROBABILITIES. We found there that the probability that X <= 192 is around 0.06, either exactly or via the normal approximation. Now we can say also that the probability that X < 200 is roughly 0.197, which is uncomfortably large !

It is clear from this calculation that we are solving a problem about the precision of the statistical sampling-based estimate X/n = X/400 for the population parameter D/N = 0.52 . We will now vary the sample size n and discuss how large it should be chosen for the estimator X/n to achieve various levels of precision.

The whole topic is based on treating X/n as a random variable, using X ~ Hypergeometric(10^7, 0.52*10^7, n), which is essentially the same as Binomial(n, 0.52): when N is so much larger than n, a sample drawn with replacement will almost certainly contain no repeated individuals, so it is indistinguishable from a sample drawn without replacement.
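The probabilities quoted above can be checked numerically. The following is only a sketch in Python using the standard library; the function names (hypergeom_cdf, binom_cdf, normal_cdf_cc) are made up for this illustration and are not from the handout, and the normal approximation is assumed to use the usual continuity correction of 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist

# Values from the sample problem.
N, D, n, p = 10**7, 5_200_000, 400, 0.52


def hypergeom_cdf(x, N, D, n):
    """P(X <= x) when n are sampled WITHOUT replacement and D of N are 'successes'."""
    # Exact integer arithmetic; the single division at the end yields a float.
    favorable = sum(comb(D, k) * comb(N - D, n - k) for k in range(x + 1))
    return favorable / comb(N, n)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), i.e. sampling WITH replacement."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_cdf_cc(x, n, p):
    """Normal approximation to P(X <= x), with continuity correction (assumed)."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(x + 0.5)


for x in (192, 199):
    print(
        f"P(X <= {x}):",
        f"hypergeometric {hypergeom_cdf(x, N, D, n):.4f},",
        f"binomial {binom_cdf(x, n, p):.4f},",
        f"normal approx {normal_cdf_cc(x, n, p):.4f}",
    )
```

The three columns should agree closely, which is the point of the first step above: with N this large, the choice between hypergeometric and binomial hardly matters.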
So approximately X ~ Normal(n*0.52, n*0.52*0.48), which means that X/n ~ Normal(0.52, 0.2496/n), where the second argument is the variance. We quote the variability of this estimator by saying

    THE STANDARD ERROR OF X/n IS SQRT(.2496/n)

Another way of giving this information is (approximately, with Phi the standard normal distribution function):

    P( |X/n - 0.52| <= b*sqrt(.2496/n) )  =  Phi(b) - Phi(-b)  =  2*Phi(b) - 1

This tells us: if we want to be able to say that the probability is 1 - alpha or better that |X/n - 0.52| <= b*sqrt(.2496/n), we choose b so that 2*Phi(b) - 1 = 1 - alpha, or: Phi(b) = 1 - alpha/2.

Here is a little table of values for the half-width b*sqrt(.2496/n) of a "Confidence Interval" for the true but unknown population proportion (0.52 here) around the estimated value X/n .

    alpha      b         n      Interval half-width
     .02     2.326      100           .116
                        400           .058
                       1000           .037
     .05     1.960      100           .098
                        400           .049
                       1000           .031
     .10     1.645      100           .082
                        400           .041
                       1000           .026

NOTE that none of these opinion polls with such values of n has very definitive accuracy in declaring the majority view of voters !! Every half-width in the table is larger than 0.02, the gap between the true proportion 0.52 and the 50% mark.
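The table entries follow directly from Phi(b) = 1 - alpha/2 and the standard error formula, so they can be regenerated with a few lines of code. This is a sketch only, in Python with the standard library; the helper names (critical_value, half_width, n_for_half_width) are invented for this illustration, and the last helper simply solves the half-width formula for n, which is an added step not worked out in the handout.

```python
from math import ceil, sqrt
from statistics import NormalDist

P = 0.52  # population proportion, treated as known as in the handout


def critical_value(alpha):
    """The b solving Phi(b) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def half_width(alpha, n, p=P):
    """Interval half-width b * sqrt(p*(1-p)/n)."""
    return critical_value(alpha) * sqrt(p * (1 - p) / n)


def n_for_half_width(alpha, h, p=P):
    """Smallest n whose half-width is at most h (formula solved for n)."""
    return ceil(critical_value(alpha) ** 2 * p * (1 - p) / h**2)


print("alpha       b      n  half-width")
for alpha in (0.02, 0.05, 0.10):
    for n in (100, 400, 1000):
        print(f"{alpha:5.2f} {critical_value(alpha):7.3f} {n:6d} {half_width(alpha, n):10.3f}")
```

For example, n_for_half_width(0.05, 0.02) asks how large n must be before the alpha = .05 half-width shrinks to 0.02; under these assumptions the answer is roughly 2400, far more than any n in the table.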