HANDOUT FOR STAT 400: NORMAL APPROXIMATION TO BINOMIAL
======================================================

SAMPLE PROBLEM: In a certain state with voting population N = 10^7,
suppose that the number D of voters who prefer Bush to any likely
Democrat is 52% of N, i.e., D = 5.2e6, and suppose we draw a random
sample of 400 from the population. What is the probability that the
poll of the 400 people sampled gives exactly the wrong answer (Bush at
48% instead of 52%)? That is, what is the probability that the number
X of the 400 who say they prefer Bush is no more than 48% of the
sample, i.e., no more than 0.48*400 = 192?

The first step is to note that sampling WITH or WITHOUT replacement
from such a large population (400 << 10^7) makes virtually no
difference. So the probability that X <= 192, which is exactly a
Hypergeometric(1e7, 5.2e6, 400) probability, is (to high accuracy)
the same as the corresponding Binom(400, 0.52) probability.

Below, B(k,n,p) denotes the cumulative Binomial probability P(X <= k)
for X ~ Binom(n,p). The exact Binomial probability that X <= 192 is

   B(192, 400, 0.52) = 0.06048.

So it is not large, but it is not negligible either!

Here np = 400*0.52 = 208 and sqrt(npq) = sqrt(400*0.52*0.48)
= sqrt(99.84) = 9.992. The corresponding Normal approximation is

   Phi((192 - 400*0.52)/sqrt(400*0.52*0.48)) = Phi(-1.601) = 0.0547.

The continuity-corrected Normal approximation is

   Phi((192.5 - 400*0.52)/sqrt(400*0.52*0.48)) = Phi(-1.551) = 0.0604.

So you can see why the continuity correction is preferred!

THE NEWS MEDIA WOULD (SOMETIMES, WHEN CAREFUL) REPORT THE RESULT OF
THE POLL BY SAYING THAT A PROPORTION X/400 FAVORED BUSH, AND THAT THIS
RESULT HAS A "STANDARD ERROR" OF PLUS OR MINUS 2.5%, WHERE 2.5% IS THE
STANDARD DEVIATION OF THE RANDOM VARIABLE X/400:

   sqrt( (1/400)^2 * 400*0.52*0.48 ) = sqrt(0.52*0.48/400) = 0.025.

===================================================================
HERE ARE OTHER EXAMPLES OF THE APPROXIMATION & CONTINUITY CORRECTION
(q = 1 - p throughout):

  k    n     p    B(k,n,p)   Phi((k-np)/sqrt(npq))   Phi((k+0.5-np)/sqrt(npq))
                  (exact)    (plain Normal)          (continuity corrected)
----------------------------------------------------------------------
 66   200   0.3   0.8421     0.8227                  0.8421
 45   100   0.4   0.8689     0.8463                  0.8692
 22   150   0.1   0.9744     0.9716                  0.9794
 59   100   0.5   0.9716     0.9641                  0.9713
 42   100   0.5   0.0666     0.0548                  0.0668
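To reproduce the numbers in this handout, here is a minimal Python sketch (standard library only; math.comb needs Python 3.8+). The function names phi, binom_cdf, normal_approx and hypergeom_cdf are our own illustrative choices, not part of any course software. B(k,n,p) is computed by summing the Binomial pmf directly, which is adequate for n in the hundreds but is not meant as a production routine. The sketch also computes the exact Hypergeometric probability, so you can check the claim that sampling with or without replacement makes virtually no difference here.

```python
import math

def phi(z: float) -> float:
    """Standard Normal CDF, written with math.erf (no SciPy needed)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact B(k, n, p) = P(X <= k) for X ~ Binom(n, p), by summing the pmf.
    Direct summation is adequate for n in the hundreds (no underflow here)."""
    q = 1.0 - p
    return sum(math.comb(n, i) * p**i * q**(n - i) for i in range(k + 1))

def normal_approx(k: int, n: int, p: float, continuity: bool = False) -> float:
    """Phi((k - np)/sqrt(npq)); with continuity=True, k is replaced by k + 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    x = k + 0.5 if continuity else k
    return phi((x - mean) / sd)

def hypergeom_cdf(k: int, pop: int, succ: int, n: int) -> float:
    """Exact P(X <= k) for a sample of n drawn WITHOUT replacement from
    pop items of which succ are 'successes' (exact big-integer arithmetic)."""
    num = sum(math.comb(succ, i) * math.comb(pop - succ, n - i)
              for i in range(k + 1))
    return num / math.comb(pop, n)

# --- Sample problem: N = 10^7 voters, D = 5.2e6 prefer Bush, sample of 400 ---
N, D, n = 10**7, 5_200_000, 400
p = D / N          # 0.52
k = 192            # 48% of 400

print("Hypergeometric   :", round(hypergeom_cdf(k, N, D, n), 5))
print("Exact Binomial   :", round(binom_cdf(k, n, p), 5))
print("Normal           :", round(normal_approx(k, n, p), 4))
print("Continuity corr. :", round(normal_approx(k, n, p, continuity=True), 4))
print("SE of X/400      :", round(math.sqrt(p * (1 - p) / n), 4))

# --- Other examples from the table: (k, n, p) ---
TABLE = [(66, 200, 0.3), (45, 100, 0.4), (22, 150, 0.1),
         (59, 100, 0.5), (42, 100, 0.5)]
for kk, nn, pp in TABLE:
    print(f"{kk:4d} {nn:5d} {pp:5.1f}  {binom_cdf(kk, nn, pp):.4f}  "
          f"{normal_approx(kk, nn, pp):.4f}  "
          f"{normal_approx(kk, nn, pp, continuity=True):.4f}")
```

Running it prints the sample-problem values followed by one line per table row (k, n, p, exact, plain Normal, continuity corrected); any disagreement with the handout should be at most rounding in the last digit.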