R Log to Compare Coverage for (One-sided) CI's for a Binomial Proportion p
===========================================================================
                                                                     3/3/10

Suppose that X ~ Binom(n,p) is the observed number of successes in n
Bernoulli(p) trials with unknown p.  The three methods we compare are:
the one based on the simplest or "naive" normal-approximation pivot
(X-np)/sqrt(X*(1-X/n)); the one based on solving quadratic inequalities
to find the interval based on the pivot (X-np)/sqrt(np(1-p)); and the
"test-based" method.  For the first two, see Example 4.4.3 in
Bickel-Doksum, specifically (4.4.7) and (4.4.3)-(4.4.4) on pp. 237-8.
For the "test-based" method, see the discussion on pp. 245-6.  The
erratic behavior of these CI's is discussed in a sophisticated way in
the Brown, Cai, and DasGupta (2001) paper referenced on p. 238.

-------- Note ----------
If B(k,n,p) denotes the Binomial(n,p) cdf evaluated at k, and
Bbar(k,n,p) = 1 - B(k-1,n,p) is the probability of k or more successes,
then a straightforward differentiation shows that

     (d/dt) Bbar(k,n,t) = n*b(k-1,n-1,t) > 0 ,

so that

     Bbar(k,n,t) = int_0^t dbeta(s,k,n-k+1) ds = pbeta(t,k,n-k+1)

is obtained (for k>0) as a beta integral.
-----------------------

Here are the computing formulas and computed results relating to the
performance of our confidence intervals.

(1) Normal approx -- naive

      p <= U1 = (X + qnorm(1-alpha)*sqrt(X*(1-X/n)))/n

(2) Normal approx -- solving the quadratic inequality

      za2 = qnorm(1-alpha)^2
      p <= U2 = (X + za2/2 + sqrt((1-X/n)*X*za2 + za2^2/4))/(n+za2)

(3) Test-based

    The upper confidence limit as given in Bickel and Doksum solves
    B(X,n,U) = alpha, or Bbar(X+1,n,U) = 1-alpha, which by the Note above
    means that pbeta(U,X+1,n-X) = 1-alpha, i.e. U = qbeta(1-alpha,X+1,n-X).

      p <= U3 = qbeta(1-alpha,X+1,n-X)

>>>>>>>>>> R coding follows >>>>>>>>>>>>>>

BinUpCI = function(X,n,alpha) {
    ## upper confidence limits for p by the three methods above
    U = numeric(3)
    U[1] = (X + qnorm(1-alpha)*sqrt(X*(1-X/n)))/n
    za2 = qnorm(1-alpha)^2
    U[2] = (X + za2/2 + sqrt((1-X/n)*X*za2 + za2^2/4))/(n+za2)
    U[3] = qbeta(1-alpha,X+1,n-X)
    U }

> BinUpCI(37,120,.05)
[1] 0.3776752 0.3812626 0.3849185
     (CI upper bounds for 37 successes in 120 trials)
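As a cross-check, note that the test-based limit qbeta(1-alpha,X+1,n-X) is
the one-sided Clopper-Pearson "exact" upper limit; assuming R's binom.test
builds its one-sided interval by the same Clopper-Pearson construction, the
following minimal sketch should reproduce the third value above.

## cross-check of the test-based UCL against binom.test's one-sided
## Clopper-Pearson interval (assumed to be qbeta(conf.level, X+1, n-X))
binom.test(37, 120, alternative="less", conf.level=0.95)$conf.int[2]
## expected: 0.3849185, matching U[3] above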
Parr = array(0, c(100,4))
Parr[1:100,1] = 5:104
for(i in 5:104) Parr[i-4,2:4] = BinUpCI(i,110,.05)

### The next calculations check that the defining equations of the
### CI upper limits have been solved correctly.

> summary((Parr[,1]-110*Parr[,2])/sqrt(Parr[,1]*(1-Parr[,1]/110)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.645  -1.645  -1.645  -1.645  -1.645  -1.645
> summary((Parr[,1]-110*Parr[,3])/sqrt(110*Parr[,3]*(1-Parr[,3])))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.645  -1.645  -1.645  -1.645  -1.645  -1.645
> summary(pbinom(Parr[,1],110,Parr[,4]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.05    0.05    0.05    0.05    0.05    0.05

dimnames(Parr) = list(5:104, c("X","Norm.appr","Norm.quad","Test.based"))

### Now display the UCL's for comparison.  Here are the comparative upper
### bounds in the 3 types of CI's with n=110 and alpha=.05, for various
### values of X.

Parr[seq(1,31,by=2),2:4]
    X Norm.appr  Norm.quad Test.based
    5 0.0781222 0.09043396 0.09319384
    7 0.1019194 0.11335598 0.11620209
    9 0.1248035 0.13549345 0.13842981
   11 0.1470492 0.15706467 0.16009232
   13 0.1688104 0.17819761 0.18131639
   15 0.1901839 0.19897501 0.20218420
   17 0.2112352 0.21945401 0.22275262
   19 0.2320110 0.23967604 0.24306294
   21 0.2525463 0.25967223 0.26314621
   23 0.2728676 0.27946665 0.28302646
   25 0.2929958 0.29907829 0.30272267
   27 0.3129476 0.31852246 0.32225010
   29 0.3327366 0.33781161 0.34162121
   31 0.3523740 0.35695600 0.36084626
   33 0.3718688 0.37596419 0.37993378
   35 0.3912289 0.39484330 0.39889088

### The differences across columns are not too large, but that is because
### n=110 is fairly large here.  Now do a similar exhibit for n=77.

Parr2 = array(0, c(65,4))
Parr2[1:65,1] = 5:69
for(i in 5:69) Parr2[i-4,2:4] = BinUpCI(i,77,.05)
dimnames(Parr2) = list(5:69, c("X","Norm.appr","Norm.quad","Test.based"))

Parr2[seq(1,31,by=2),2:4]
    X Norm.appr Norm.quad Test.based
    5 0.1111245 0.1274433  0.1316908
    7 0.1447967 0.1595506  0.1639779
    9 0.1771067 0.1904921  0.1951004
   11 0.2084504 0.2205804  0.2253674
   13 0.2390499 0.2499986  0.2549607
   15 0.2690443 0.2788647  0.2839980
   17 0.2985276 0.3072601  0.3125603
   19 0.3275664 0.3352430  0.3407057
   21 0.3562097 0.3628570  0.3684778
   23 0.3844943 0.3901350  0.3959094
   25 0.4124487 0.4171024  0.4230259
   27 0.4400948 0.4437792  0.4498471
   29 0.4674495 0.4701808  0.4763882
   31 0.4945261 0.4963191  0.5026612
   33 0.5213344 0.5222032  0.5286748
   35 0.5478816 0.5478397  0.5544354

## In both exhibits the three upper limits are (with one small exception at
## X=35, n=77) ordered from smallest to largest: naive, quadratic, test-based.
## The differences among them are smaller for n=110 than for n=77, as one
## would expect since the normal approximations improve with n.

## We next show, for a sequence of sample sizes, the coverage behavior of
## these CI's for various choices of the true p.

TruCovr = function(p, n, alpha) {
    ## exact coverage probabilities of the three UCL's, for each value in p
    Parr = array(0, c(n+1,3), dimnames=list(0:n,
                 c("Norm.appr","Norm.quad","Test.based")))
    for(i in 0:n) Parr[i+1,] = BinUpCI(i,n,alpha)
    CovProb = array(0, c(length(p),3))
    for(j in 1:length(p)) {
        Qarr = t(p[j] <= Parr)
        CovProb[j,] = Qarr %*% dbinom(0:n,n,p[j]) }
    list(parr = Parr, cover = CovProb) }

OutProbs77 = round(TruCovr(seq(.05,.44,by=.01),77,.05)$cover, 4)

### Columns are the coverage probabilities of the nominal-level .95 one-sided
### CI's (Norm.appr, Norm.quad, Test.based); rows correspond to
### p = .05, .06, ..., .44.

> OutProbs77
        [,1]   [,2]   [,3]
 [1,] 0.9027 0.9807 0.9807
 [2,] 0.8479 0.9496 0.9915
 [3,] 0.9125 0.9746 0.9746
 [4,] 0.8731 0.9515 0.9515
 [5,] 0.9242 0.9739 0.9739
 [6,] 0.8944 0.9562 0.9562
 [7,] 0.9355 0.9755 0.9755
 [8,] 0.9123 0.9618 0.9618
 [9,] 0.8875 0.9456 0.9780
[10,] 0.9274 0.9673 0.9673
[11,] 0.9075 0.9546 0.9546
[12,] 0.9401 0.9401 0.9724
[13,] 0.9240 0.9623 0.9623
[14,] 0.9066 0.9508 0.9508
[15,] 0.9378 0.9690 0.9690
[16,] 0.9236 0.9597 0.9597
[17,] 0.9083 0.9493 0.9746
[18,] 0.9378 0.9672 0.9672
[19,] 0.9252 0.9589 0.9589
[20,] 0.9116 0.9495 0.9735
[21,] 0.9392 0.9392 0.9668
[22,] 0.9280 0.9593 0.9593
[23,] 0.9159 0.9509 0.9509
[24,] 0.9417 0.9417 0.9674
[25,] 0.9317 0.9606 0.9606
[26,] 0.9209 0.9531 0.9531
[27,] 0.9449 0.9449 0.9686
[28,] 0.9360 0.9626 0.9626
[29,] 0.9264 0.9559 0.9559
[30,] 0.9487 0.9487 0.9704
[31,] 0.9408 0.9408 0.9651
[32,] 0.9323 0.9592 0.9592
[33,] 0.9527 0.9527 0.9527
[34,] 0.9458 0.9458 0.9679
[35,] 0.9382 0.9627 0.9627
[36,] 0.9301 0.9570 0.9570
[37,] 0.9509 0.9509 0.9509
[38,] 0.9443 0.9443 0.9663
[39,] 0.9371 0.9614 0.9614
[40,] 0.9561 0.9561 0.9561

### NOTE that it very often happens that the quadratic-inequality and
### test-based methods produce the same coverage; but in other cases the
### test-based method is clearly conservative (its UCL is too large).
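The oscillation in these coverage columns is easier to see in a plot than in
the printed matrix.  A minimal sketch, using only the OutProbs77 matrix
computed above, overlays the three coverage curves against the nominal
level .95:

## plot the three coverage curves for n=77 against the nominal level
pgrid = seq(.05,.44,by=.01)
matplot(pgrid, OutProbs77, type="l", lty=1:3, col=1:3,
        xlab="true p", ylab="coverage probability",
        main="Coverage of one-sided 95% UCL's, n=77")
abline(h=0.95, lty=2)
legend("bottomright", legend=c("Norm.appr","Norm.quad","Test.based"),
       lty=1:3, col=1:3)

The sawtooth pattern in such a plot is the erratic behavior referred to in
the Brown, Cai, and DasGupta (2001) paper cited above.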
OutProbs100 = round(TruCovr(seq(.05,.44,by=.01),100,.1)$cover, 4)

> summary(OutProbs100)
   Norm.appr        Norm.quad        Test.based
 Min.   :0.8201   Min.   :0.8817   Min.   :0.9005
 1st Qu.:0.8715   1st Qu.:0.8942   1st Qu.:0.9094
 Median :0.8820   Median :0.9076   Median :0.9176
 Mean   :0.8808   Mean   :0.9055   Mean   :0.9206
 3rd Qu.:0.8960   3rd Qu.:0.9152   3rd Qu.:0.9290
 Max.   :0.9098   Max.   :0.9434   Max.   :0.9629

### So according to these calculations, the coverage probability of the
### "test-based" approach is systematically too large (although it often
### coincides with that of the quadratic-inequality method), and the
### quadratic-inequality method is by far the best!  But even for it, many
### of the coverage probabilities that should be 0.90 are off by .01 or
### more (in either direction)!

## One last exhibit is the sequence of coverage probabilities for the same
## p over a succession of sample sizes:

Covprobs = function(p, n, alpha) {
    ## coverage of the three UCL's at a single p, for each sample size in n
    Covers = array(0, c(length(n),3), dimnames=list(n,
                   c("Norm.appr","Norm.quad","Test.based")))
    for(i in 1:length(n)) Covers[i,] = TruCovr(p,n[i],alpha)$cover
    Covers }

### All of these coverage probabilities should be .90.  Look at how they
### jump around when p=.12 is fixed and n ranges over 45,...,54, with
### alpha=.1.

> Covprobs(.12, 45:54, .1)
   Norm.appr Norm.quad Test.based
45 0.8046628 0.9188976  0.9188976
46 0.8183710 0.9259111  0.9259111
47 0.8312758 0.9323635  0.9323635
48 0.8434063 0.9382939  0.9382939
49 0.8547928 0.9437397  0.9437397
50 0.8654664 0.8654664  0.9487358
51 0.8754588 0.8754588  0.9533156
52 0.8848016 0.8848016  0.9575104
53 0.8935266 0.8935266  0.9613494
54 0.7917925 0.9016654  0.9016654
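The same jumpiness can be traced over a wider span of sample sizes.  A
minimal sketch, reusing TruCovr with p=.12 and alpha=.1 as above and
plotting only the quadratic-inequality column (the range 30:150 is an
arbitrary choice for illustration):

## coverage of the quadratic-inequality UCL for p=.12, nominal level .90,
## over a wider range of sample sizes
nseq = 30:150
cov.quad = sapply(nseq, function(n) TruCovr(.12, n, .1)$cover[2])
plot(nseq, cov.quad, type="b", pch=16, cex=.6,
     xlab="n", ylab="coverage probability",
     main="Norm.quad coverage, p=.12, nominal .90")
abline(h=0.9, lty=2)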