HANDOUT FOR STAT 400: NORMAL APPROXIMATION TO BINOMIAL
======================================================
SAMPLE PROBLEM: for a voting population of N = 10^7 in a certain state,
suppose that D, the number who prefer Bush to any likely Democrat, is 52%
of N, i.e., D = 5.2e6, and suppose we draw a random sample of 400 from
the population. The question is: what is the probability that the poll
of the 400 people sampled gives exactly the wrong answer,
i.e., what is the probability that the number X of the 400 who say they
prefer Bush is no more than 48% of the sample, i.e., no more than
0.48*400 = 192?
The first step is to say that sampling WITH or WITHOUT replacement from
such a large population (400 << 10^7) makes virtually no difference.
Without replacement, X is exactly Hypergeometric(1.e7, 5.2e6, 400);
with replacement, X is Binom(400, 0.52). So the probability that
X <= 192 under the Hypergeometric is nearly identical (to high accuracy)
to the same probability under Binom(400, 0.52).
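
As a rough check of this first step, the sketch below computes P(X <= 192) both ways using only Python's standard library. The function names hypergeom_cdf and binom_cdf are illustrative choices, not part of the handout.

```python
from math import comb

def hypergeom_cdf(k, N, D, n):
    """P(X <= k) when n are drawn WITHOUT replacement from N, D of which are 'successes'."""
    # Exact integer count of favorable samples, divided by the count of all samples.
    favorable = sum(comb(D, j) * comb(N - D, n - j) for j in range(k + 1))
    return favorable / comb(N, n)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binom(n, p), i.e. sampling WITH replacement."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

N, D, n, k = 10**7, 5_200_000, 400, 192
p_without = hypergeom_cdf(k, N, D, n)
p_with = binom_cdf(k, n, D / N)

print(f"without replacement (Hypergeometric): {p_without:.5f}")
print(f"with replacement (Binomial):          {p_with:.5f}")
print(f"absolute difference:                  {abs(p_without - p_with):.2e}")
```

The two printed probabilities should agree to several decimal places, which is the sense in which the choice of sampling scheme makes virtually no difference here.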
The exact Binomial probability that X <= 192 is B(192,400,0.52) =
0.06048. So it is not large, but it is not negligible either! The
corresponding Normal approximation is:
Phi((192-400*0.52)/sqrt(400*0.52*0.48)) = 0.0547
The continuity-corrected Normal approximation is
Phi((192.5-400*0.52)/sqrt(400*0.52*0.48)) = 0.0604
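The continuity-corrected value 0.0604 is much closer to the exact 0.06048
than the uncorrected 0.0547, so you can see why the continuity correction
is preferred!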
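
The three numbers above can be recomputed with a short sketch like the following. It assumes Phi is the standard Normal distribution function, written here in terms of math.erf. The helper names Phi and binom_cdf are illustrative.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binom(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p, k = 400, 0.52, 192
mu = n * p                       # mean of X, here 208
sigma = sqrt(n * p * (1 - p))    # standard deviation of X, about 9.99

exact = binom_cdf(k, n, p)
plain = Phi((k - mu) / sigma)            # no continuity correction
corrected = Phi((k + 0.5 - mu) / sigma)  # continuity-corrected

print(f"exact binomial:        {exact:.5f}")
print(f"Normal approximation:  {plain:.4f}")
print(f"continuity-corrected:  {corrected:.4f}")
```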
THE NEWS MEDIA WOULD (SOMETIMES, WHEN CAREFUL) REPORT THE RESULT OF
THE POLL BY SAYING THAT A PROPORTION X/400 FAVORED BUSH, AND THAT
THIS RESULT HAS A "STANDARD ERROR" EQUAL TO PLUS OR MINUS 2.5%
(= STANDARD DEVIATION OF THE RANDOM VARIABLE X/400 =
sqrt( (1/400)^2 * 400*0.52*0.48 ) = 0.025).
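
The standard-error arithmetic can be checked with a minimal sketch. The function name proportion_se is a hypothetical helper introduced for illustration. It uses the simplification sqrt((1/n)^2 * n*p*(1-p)) = sqrt(p*(1-p)/n).

```python
from math import sqrt

def proportion_se(p, n):
    """Standard deviation of the sample proportion X/n when X ~ Binom(n, p)."""
    return sqrt(p * (1 - p) / n)

se = proportion_se(0.52, 400)
print(f"standard error of X/400: {se:.4f}")
```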
===================================================================
HERE ARE OTHER EXAMPLES OF THE APPROXIMATION & CONTINUITY CORRECTION:
Here q = 1-p, B(k,n,p) = P(X <= k) for X ~ Binom(n,p), and the
continuity-corrected value is Phi((k+0.5-np)/sqrt(npq)).

  k     n     p    B(k,n,p)   Phi((k-np)/sqrt(npq))   Continuity-
                                                      corrected
----------------------------------------------------------------------
 66    200   0.3    0.8421           0.8227             0.8421
 45    100   0.4    0.8689           0.8463             0.8692
 22    150   0.1    0.9744           0.9716             0.9794
 59    100   0.5    0.9716           0.9641             0.9713
 42    100   0.5    0.0666           0.0548             0.0668
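
The table rows can be regenerated with a sketch along these lines, again using only the standard library. The helper names and the rows list are illustrative; the (k, n, p) triples are copied from the table.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binom(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# (k, n, p) triples taken from the table above.
rows = [(66, 200, 0.3), (45, 100, 0.4), (22, 150, 0.1),
        (59, 100, 0.5), (42, 100, 0.5)]

results = []
print("  k    n    p     exact     Normal    corrected")
for k, n, p in rows:
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    exact = binom_cdf(k, n, p)
    plain = Phi((k - mu) / sigma)
    corrected = Phi((k + 0.5 - mu) / sigma)
    results.append((exact, plain, corrected))
    print(f"{k:3d}  {n:3d}  {p:3.1f}  {exact:8.4f}  {plain:8.4f}  {corrected:8.4f}")
```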