Log Related to Score and Profile Likelihood Confidence Intervals ================================================== 3/3/03 ff === This log concerns the Score and Profile-likelihood confidence intervals for association parameters in 2x2 tables. The parameters theta , expressed in terms of the row-binomial success probabilities p1, p2 (with corresponding row-totals n1, n2), are: Risk Difference = p1-p2 Risk-Ratio = p1/p2 (also called relative risk) Odds-Ratio = (p1(1-p2))/((1-p1)p2) We usually think of the sampling framework as being indepedent binomial within rows, and this is the assumption made in the following Splus code, but in fact the same asymptotic justifications of the resulting Confidence Intervals apply when the sampling is multinomial, with only the table-total n nonrandom. A particular example we use for illustration is the Sweidsh Aspirin/MI study which had the 2x2 table data > tabnums [,1] [,2] [1,] 28 656 Aspirin [2,] 18 658 No aspirin MI no MI In this Log, we contrast 3 confidence intervals: (i) The Wald interval, based directly on approximate-normal sampling distribution for the MLE of theta. (ii) The Score interval, obtained by inverting the Score hypothesis test for the parameter to have a specified point value. (iii) The Profile Likelihood based interval, obtained by inverting the Wilks Generalized Likelihood Ratio test. The Splus (version 3.4) functions for calculating these intervals, along with the restricted MLE's for fixed values of theta, are: NuisBinD NuisBinRR NuisBinOR wlistings of which are given at the end of this Log. The Splus objects themselves are available on MathNet at /usr/local/StatData/stat770/.Data In addition, the function ScorProfW to pull these CI calculations together for any of the three parameter choices is available (and listed below. We now illustrate on the Aspirin/MI data above. First, for the Difference parameter: > round(NuisBinD(tabnums,.01),4) ## Stat values for theta=.01 rmle Scorstat LRstat logLik 0.0385 0.442 0.195 -199.923 Now we list these outputs for enough values in order to see where the Test-based CI's lie. 95% Interval endpoints for Score are the theta values for which the second column has values 1.96 and -1.96, and for Profile are theta values where the third column crosses the level 3.84. > for(i in 0:25) cat(round(NuisBinD(tabnums,-.01 + i*.002),4), "for theta=",-.01 + i*.002,"\n") 0.0305 2.4221 6.0742 -199.923 for theta= -0.01 0.0311 2.2358 5.1432 -199.923 for theta= -0.008 0.0317 2.0462 4.2825 -199.923 for theta= -0.006 0.0323 1.8534 3.4947 -199.923 for theta= -0.004 0.0331 1.6577 2.7823 -199.923 for theta= -0.002 0.0338 1.4594 2.1474 -199.923 for theta= 0 0.0346 1.2589 1.592 -199.923 for theta= 0.002 0.0355 1.0564 1.1179 -199.923 for theta= 0.004 0.0364 0.8524 0.7264 -199.923 for theta= 0.006 0.0374 0.6474 0.4186 -199.923 for theta= 0.008 0.0385 0.442 0.195 -199.923 for theta= 0.01 0.0396 0.2365 0.0559 -199.923 for theta= 0.012 0.0408 0.0315 0.001 -199.923 for theta= 0.014 0.042 -0.1725 0.0298 -199.923 for theta= 0.016 0.0432 -0.3751 0.1413 -199.923 for theta= 0.018 0.0446 -0.5758 0.3341 -199.923 for theta= 0.02 0.0459 -0.7745 0.6068 -199.923 for theta= 0.022 0.0473 -0.9707 0.9574 -199.923 for theta= 0.024 0.0488 -1.1643 1.3839 -199.923 for theta= 0.026 0.0503 -1.355 1.8842 -199.923 for theta= 0.028 0.0518 -1.5428 2.456 -199.923 for theta= 0.03 0.0534 -1.7275 3.0969 -199.923 for theta= 0.032 0.055 -1.909 3.8046 -199.923 for theta= 0.034 0.0566 -2.0875 4.5768 -199.923 for theta= 0.036 0.0583 -2.2628 5.411 -199.923 for theta= 0.038 0.06 -2.435 6.305 -199.923 for theta= 0.04 > for(i in 0:25) cat(round(NuisBinRR(tabnums, .85+i*.08),4), "for theta=",.85+i*.08,"\n") 0.0311 2.0255 4.0979 -199.923 for theta= 0.85 0.0326 1.7107 2.9408 -199.923 for theta= 0.93 0.034 1.4251 2.0483 -199.923 for theta= 1.01 0.0353 1.1636 1.3679 -199.923 for theta= 1.09 0.0365 0.9221 0.8594 -199.923 for theta= 1.17 0.0376 0.6978 0.4917 -199.923 for theta= 1.25 0.0386 0.4881 0.2401 -199.923 for theta= 1.33 0.0395 0.2911 0.0852 -199.923 for theta= 1.41 0.0404 0.1053 0.0111 -199.923 for theta= 1.49 0.0413 -0.0707 0.005 -199.923 for theta= 1.57 0.042 -0.2379 0.0563 -199.923 for theta= 1.65 0.0428 -0.3974 0.1563 -199.923 for theta= 1.73 0.0435 -0.5498 0.2976 -199.923 for theta= 1.81 0.0441 -0.6958 0.4743 -199.923 for theta= 1.89 0.0447 -0.8361 0.6812 -199.923 for theta= 1.97 0.0453 -0.9711 0.9138 -199.923 for theta= 2.05 0.0459 -1.1013 1.1686 -199.923 for theta= 2.13 0.0464 -1.227 1.4423 -199.923 for theta= 2.21 0.0469 -1.3487 1.7322 -199.923 for theta= 2.29 0.0473 -1.4665 2.0359 -199.923 for theta= 2.37 0.0478 -1.5809 2.3515 -199.923 for theta= 2.45 0.0482 -1.692 2.6771 -199.923 for theta= 2.53 0.0486 -1.8 3.0113 -199.923 for theta= 2.61 0.049 -1.9052 3.3528 -199.923 for theta= 2.69 0.0494 -2.0077 3.7002 -199.923 for theta= 2.77 0.0497 -2.1077 4.0527 -199.923 for theta= 2.85 > for(i in 0:25) cat(round(NuisBinOR(tabnums, .85+i*.08),4), "for theta=",.85+i*.08,"\n") 0.0312 2.007 4.0218 -199.923 for theta= 0.85 0.0326 1.7025 2.9119 -199.923 for theta= 0.93 0.034 1.4262 2.0516 -199.923 for theta= 1.01 0.0352 1.1734 1.3914 -199.923 for theta= 1.09 0.0364 0.94 0.8933 -199.923 for theta= 1.17 0.0374 0.7231 0.5283 -199.923 for theta= 1.25 0.0384 0.5203 0.2731 -199.923 for theta= 1.33 0.0394 0.3299 0.1095 -199.923 for theta= 1.41 0.0402 0.1501 0.0226 -199.923 for theta= 1.49 0.041 -0.0202 0.0004 -199.923 for theta= 1.57 0.0418 -0.1821 0.033 -199.923 for theta= 1.65 0.0425 -0.3365 0.1122 -199.923 for theta= 1.73 0.0432 -0.4841 0.2311 -199.923 for theta= 1.81 0.0438 -0.6257 0.3841 -199.923 for theta= 1.89 0.0445 -0.7617 0.5663 -199.923 for theta= 1.97 0.045 -0.8926 0.7735 -199.923 for theta= 2.05 0.0456 -1.019 1.0024 -199.923 for theta= 2.13 0.0461 -1.141 1.2498 -199.923 for theta= 2.21 0.0466 -1.2592 1.5132 -199.923 for theta= 2.29 0.0471 -1.3737 1.7903 -199.923 for theta= 2.37 0.0475 -1.4849 2.0794 -199.923 for theta= 2.45 0.048 -1.5929 2.3785 -199.923 for theta= 2.53 0.0484 -1.698 2.6864 -199.923 for theta= 2.61 0.0488 -1.8004 3.0017 -199.923 for theta= 2.69 0.0491 -1.9002 3.3234 -199.923 for theta= 2.77 0.0495 -1.9975 3.6503 -199.923 for theta= 2.85 ================================================================= To see the intervals themselves, for alpha=.05, .01: > lapply(ScorProfW(tabnums,alpha=c(.05,.1), thetarange=c(-.015,.05), npts=30)[2:4], function(lelt) round(lelt,5)) $WaldInts: [1,] -0.00487 0.03349 95% interval in "Difference" case [2,] -0.00179 0.03040 90% $ScorInts: [1,] -0.00510 0.03457 95% [2,] -0.00187 0.03110 90% $LRInts: [1,] -0.00490 0.0341 95% [2,] -0.00177 0.0308 90% > lapply(ScorProfW(tabnums,alpha=c(.05,.1), thetarange=c(.5,4), npts=30, type="RR")[2:4], function(lelt) round(lelt,5)) $WaldInts: [1,] 0.85467 2.84850 95% "RR" = risk ratio [2,] 0.94151 2.58576 90% $ScorInts: [1,] 0.86598 2.73250 95% [2,] 0.94780 2.49578 90% $LRInts: [1,] 0.86582 2.80218 95% [2,] 0.94883 2.53688 90% > lapply(ScorProfW(tabnums,alpha=c(.05,.1), thetarange=c(.5,4), npts=30, type="OR")[2:4], function(lelt) round(lelt,5)) $WaldInts: [1,] 0.85861 2.75267 95% "OR" = odds ratio [2,] 0.94291 2.50658 90% $ScorInts: [1,] 0.86182 2.81890 95% [2,] 0.94608 2.56927 90% $LRInts: [1,] 0.86159 2.89626 95% [2,] 0.94718 2.61491 90% TRY THESE FUNCTIONS OUT IN OTHER EXAMPLES ! Thus, for a 2x2 table with entries 60 40 55 45 find 95% intervals: > round(unlist(ScorProfW(matrix(c(60,55,40,45), ncol=2),alpha=.05, thetarange=c(-.1,.2), npts=20, type="D")[2:4]),4) WaldInts1 WaldInts2 ScorInts1 ScorInts2 LRInts1 LRInts2 for "Diff" -0.0868 0.1868 -0.0866 0.1848 -0.0868 0.1856 > round(unlist(ScorProfW(matrix(c(60,55,40,45), ncol=2),alpha=.05, thetarange=c(.5,1.5), npts=20, type="RR")[2:4]),4) WaldInts1 WaldInts2 ScorInts1 ScorInts2 LRInts1 LRInts2 0.7 2.1518 0.8587 1.3909 0.857 1.3923 > round(unlist(ScorProfW(matrix(c(60,55,40,45), ncol=2),alpha=.05, thetarange=c(.5,1.5), npts=20, type="OR")[2:4]),4) WaldInts1 WaldInts2 ScorInts1 ScorInts2 LRInts1 LRInts2 0.8591 1.3852 0.7059 2.0258 0.7003 2.5777 Only in large-sample settings are the Wald-Intervals to be trusted! ======================= Listings follow ========================== > NuisBinD function(tabdat, thet, tol1 = 0.1, tol2 = 0.001) { ### Coding of restricted ML estimation, along with score, Wald, ### and profile-likelihood confidence interval for parameter ### theta which is risk-difference if type=suffix=D, risk-ratio ### if type=RR, and odds-ratio if type=OR. ### 2x2 table entries are in tabdat. ### In present set-up, we assume that sampling for types 1 or 2 is ### row-binomial , and for type 3 is multinomial. mle0 <- sum(tabdat[, 1])/sum(tabdat) optlst <- nlminb(mle0, function(x, tabl, theta) - (tabl[1, 1] * log(x) + tabl[1, 2] * log(1 - x) + tabl[2, 1] * log(x - theta) + tabl[2, 2] * log(1 + theta - x)), lower = pmax(mle0 - tol1, thet) + tol2, upper = pmin(mle0 + tol1, 1) - tol2, tabl = tabdat, theta = thet) rmle <- optlst$param Scorstat <- (tabdat[1, 1]/sum(tabdat[1, ]) - tabdat[2, 1]/sum(tabdat[2, ]) - thet)/sqrt((rmle * (1 - rmle))/sum(tabdat[1, ]) + ((rmle - thet) * (1 + thet - rmle))/sum(tabdat[2, ])) mlepr <- c(tabdat[1, 1]/sum(tabdat[1, ]), tabdat[2, 1]/sum(tabdat[2, ])) loglik <- sum(tabdat[, 1] * log(mlepr)) + sum(tabdat[, 2] * log(1 - mlepr)) LRstat <- 2 * (loglik + optlst$obj) c(rmle = rmle, Scorstat = Scorstat, LRstat = LRstat, logLik = loglik) } > NuisBinRR function(tabdat, thet, tol1 = 0.1, tol2 = 0.001) { ### Coding of restricted ML estimation, along with score, Wald, ### and profile-likelihood confidence interval for parameter ### theta which is risk-difference if type=suffix=D, risk-ratio ### if type=RR, and odds-ratio if type=OR. ### 2x2 table entries are in tabdat. ### In present set-up, we assume that sampling for types 1 or 2 is ### row-binomial , and for type 3 is multinomial. mle0 <- sum(tabdat[, 1])/sum(tabdat) optlst <- nlminb(mle0, function(x, tabl, theta) - (tabl[1, 1] * log(x) + tabl[1, 2] * log(1 - x) + tabl[2, 1] * log(x/theta) + tabl[2, 2] * log(1 - x/theta)), lower = pmax( mle0 - tol1, 0) + tol2, upper = pmin(mle0 + tol1, thet) - tol2, tabl = tabdat, theta = thet) rmle <- optlst$param Scorstat <- (tabdat[1, 1]/sum(tabdat[1, ]) - thet * (tabdat[2, 1]/sum(tabdat[2, ])))/sqrt(rmle * ((1 - rmle)/sum(tabdat[1, ]) + (thet - rmle)/sum(tabdat[2, ]))) mlepr <- c(tabdat[1, 1]/sum(tabdat[1, ]), tabdat[2, 1]/sum(tabdat[2, ])) loglik <- sum(tabdat[, 1] * log(mlepr)) + sum(tabdat[, 2] * log(1 - mlepr)) LRstat <- 2 * (loglik + optlst$obj) c(rmle = rmle, Scorstat = Scorstat, LRstat = LRstat, logLik = loglik) } > NuisBinOR function(tabdat, thet, tol2 = 0.001) { ### Coding of restricted ML estimation, along with score, Wald, ### and profile-likelihood confidence interval for parameter ### theta which is risk-difference if type=suffix=D, risk-ratio ### if type=RR, and odds-ratio if type=OR. ### 2x2 table entries are in tabdat. ### In present set-up, we assume that sampling for types 1 or 2 is ### row-binomial , and for type 3 is multinomial. mle0 <- sum(tabdat[, 1])/sum(tabdat) optlst <- nlminb(mle0, function(x, tabl, theta) - (tabl[1, 1] * log(x) + tabl[1, 2] * log(1 - x) + tabl[2, 1] * log(x/(theta * (1 - x) + x)) + tabl[2, 2] * log((theta * (1 - x))/(theta * (1 - x) + x))), lower = tol2, upper = 1 - tol2, tabl = tabdat, theta = thet) rmle <- optlst$param rmle2 <- rmle/(rmle + thet * (1 - rmle)) mlepr <- c(tabdat[1, 1]/sum(tabdat[1, ]), tabdat[2, 1]/ sum(tabdat[2, ])) Scorstat <- (tabdat[1, 1] * tabdat[2, 2] - thet * tabdat[1, 2] * tabdat[2, 1])/(sum(tabdat[1, ]) * sum(tabdat[2, ]) * sqrt((rmle * (1 - rmle) * (thet * rmle2 + 1 - rmle2)^2)/ sum(tabdat[1, ]) + (rmle2 * (1 - rmle2) * (rmle + thet * (1 - rmle))^2)/sum(tabdat[2, ]))) loglik <- sum(tabdat[, 1] * log(mlepr)) + sum(tabdat[, 2] * log(1 - mlepr)) LRstat <- 2 * (loglik + optlst$obj) c(rmle = rmle, Scorstat = Scorstat, LRstat = LRstat, logLik = loglik) } > ScorProfW function(tabdat, alpha, thetarange, type = "D", npts = 50, tol = 0.001) { ## function to call appropriate function NuisBinxx ## to find confidence intervals using Wald, Score, and LR ## (profile) methods. Note that we need an input interval ## containing a suitable range of theta values to use in ## interpolating statistics as a function of theta. ## NB. alpha may be a vector num <- npts + 1 zpts <- qnorm(1 - alpha/2) chisqpts <- qchisq(1 - alpha, 1) xt <- seq(min(thetarange), max(thetarange), length = num) SLRmat <- array(0, dim = c(num, 2)) Statfcn <- get(paste("NuisBin", type, sep = "")) for(i in 1:num) SLRmat[i, ] <- Statfcn(tabdat, xt[i])[2:3] WaldInts <- if(type == "D") (tabdat[1, 1]/sum(tabdat[1, ])) - (tabdat[ 2, 1]/sum(tabdat[2, ])) + outer(zpts, c(-1, 1)) * sqrt( (tabdat[1, 1] * tabdat[1, 2])/sum(tabdat[1, ])^3 + ( tabdat[2, 1] * tabdat[2, 2])/sum(tabdat[2, ])^3) else { if(type == "OR") ((tabdat[1, 1] * sum(tabdat[2, ]))/(tabdat[2, 1] * sum( tabdat[1, ]))) * exp(outer(zpts, c(-1, 1)) * sqrt(tabdat[1, 2]/(sum(tabdat[1, ]) * tabdat[1, 1]) + tabdat[2, 2]/(sum(tabdat[2, ]) * tabdat[ 2, 1]))) else ((tabdat[1, 1] * tabdat[2, 2])/(tabdat[2, 1] * tabdat[1, 2 ])) * exp(outer(zpts, c(-1, 1)) * sqrt(sum(1/ tabdat))) } LRints <- replace(WaldInts, T, 0) tmpspl1 <- smooth.spline(SLRmat[, 1], xt, spar = 1e-06, all.knots = T) minpt <- (1:num)[SLRmat[, 2] == min(SLRmat[, 2])] tmpspl2A <- smooth.spline(SLRmat[1:minpt, 2], xt[1:minpt], spar = 1e-06, all.knots = T) tmpspl2B <- smooth.spline(SLRmat[minpt:num, 2], xt[minpt:num], spar = 1e-06, all.knots = T) ScorInts <- cbind(predict.smooth.spline(tmpspl1, zpts)$y, predict.smooth.spline(tmpspl1, - zpts)$y) LRInts <- cbind(predict.smooth.spline(tmpspl2A, chisqpts)$y, predict.smooth.spline(tmpspl2B, chisqpts)$y) list(minpt = minpt, WaldInts = WaldInts, ScorInts = ScorInts, LRInts = LRInts, thetavec = xt, Scorvec = SLRmat[, 1], LRvec = SLRmat[, 2]) }