Log Related to Maximization, Root-finding, & Vectorization ================================================== 2/13/08 We illustrate some calculations related to maximization, root-finding, and splines in an example on test-based confidence intervals in a small contingency table. > tabnums [,1] [,2] [1,] 28 656 [2,] 18 658 These data actually came from a Swedish study of aspirin use (the first row) and Myocardial Infarction (the first column) referenced in the Agresti categorical data analysis book. Here model the data as a Binomial in row i=1,2 with respective sample sizes n_1=684, n_2=675 and success probabilities p_1, p_2, with p_1 - p_2 = theta, and we want a confidence interval for it. For this, we must first get the `restricted MLE' for p_1 and p_2 subject to a fixed value of theta. Here is a little function to do this, and then calculate a Score statistic and (generalized) Likelihood Ratio statistic > NuisBinD function(tabdat, thet, tol1 = 0.1, tol2 = 0.001) { ### Coding of restricted ML estimation, along with score, Wald, ### and profile-likelihood confidence interval for parameter ### theta which is risk-difference. ### 2x2 table entries are in tabdat. mle0 = sum(tabdat[, 1])/sum(tabdat) ## ML for the case theta=0 used as starting value optlst = nlminb(mle0, function(x, tabl, theta) - (tabl[1, 1] * log(x) + tabl[1, 2] * log(1 - x) + tabl[2, 1] * log(x - theta) + tabl[2, 2] * log(1 + theta - x)), lower = pmax(mle0 - tol1, thet) + tol2, upper = pmin(mle0 + tol1, 1) - tol2, tabl = tabdat, theta = thet) ### up to this point, we have the maximum of the likelihood function ## with x substituted for p_1 and x-theta for p_2 rmle = optlst$par ## score statistic is the partial of the logLik with respect to x ## with the fixed value of theta substituted as though known Scorstat = (tabdat[1, 1]/sum(tabdat[1, ]) - tabdat[2, 1]/sum(tabdat[2, ]) - thet)/sqrt((rmle * (1 - rmle))/sum(tabdat[1, ]) + ((rmle - thet) * (1 + thet - rmle))/sum(tabdat[2, ])) ## next comes unconstrained MLE for p_1 and p_2 mlepr = c(tabdat[1,1]/sum(tabdat[1,]), tabdat[2,1]/sum(tabdat[2,])) loglik = sum(tabdat[, 1] * log(mlepr)) + sum(tabdat[, 2] * log(1 - mlepr)) LRstat = 2 * (loglik + optlst$obj) ## This is the Likelihood Ratio statistic for testing whether ## the fixed value of theta is correct c(rmle = rmle, Scorstat = Scorstat, LRstat = LRstat, logLik = loglik) } > round(NuisBinD(tabnums,0.01),5) rmle Scorstat LRstat logLik 0.03848 0.44197 0.19497 -199.923 > round(NuisBinD(tabnums,0.02),5) rmle Scorstat LRstat logLik 0.04456 -0.57585 0.33415 -199.923 > round(NuisBinD(tabnums,0.005),5) rmle Scorstat LRstat logLik 0.03597 0.95453 0.91173 -199.923 ### Now, to get CI's: > for(i in 0:25) cat(round(NuisBinD(tabnums, -.02 + i*.003),4), "for p1=",-.02 + i*.003,"\n") 0.0074 4.8426 12.9507 -199.923 for p1= -0.02 0.0089 4.4166 10.8494 -199.923 for p1= -0.017 0.0104 3.9917 8.8868 -199.923 for p1= -0.014 0.0119 3.5676 7.0793 -199.923 for p1= -0.011 0.0134 3.1442 5.4464 -199.923 for p1= -0.008 0.0149 2.7213 4.0121 -199.923 for p1= -0.005 0.0164 2.2987 2.8062 -199.923 for p1= -0.002 0.0342 1.3594 1.8596 -199.923 for p1= 0.001 0.0355 1.0564 1.1179 -199.923 for p1= 0.004 0.0369 0.75 0.562 -199.923 for p1= 0.007 0.0385 0.442 0.195 -199.923 for p1= 0.01 0.0402 0.1339 0.0179 -199.923 for p1= 0.013 0.042 -0.1725 0.0298 -199.923 for p1= 0.016 0.0439 -0.4757 0.2276 -199.923 for p1= 0.019 0.0459 -0.7745 0.6068 -199.923 for p1= 0.022 0.0481 -1.0679 1.1613 -199.923 for p1= 0.025 0.0503 -1.355 1.8842 -199.923 for p1= 0.028 0.0526 -1.6355 2.768 -199.923 for p1= 0.031 0.055 -1.909 3.8046 -199.923 for p1= 0.034 0.0574 -2.1755 4.9863 -199.923 for p1= 0.037 0.06 -2.435 6.305 -199.923 for p1= 0.04 0.0625 -2.6877 7.7533 -199.923 for p1= 0.043 0.0651 -2.9337 9.3239 -199.923 for p1= 0.046 0.0677 -3.1734 11.01 -199.923 for p1= 0.049 0.0704 -3.4069 12.8052 -199.923 for p1= 0.052 0.0731 -3.6347 14.7035 -199.923 for p1= 0.055 So the Score and LR CI's are approximately (-.002, .034) and (-.005, .034) > (tabnums[1,1]/sum(tabnums[1,]))-(tabnums[2,1]/ sum(tabnums[2,]))+qnorm(.975)*sqrt(tabnums[1,1]* tabnums[1,2]/sum(tabnums[1,])^3 + tabnums[2,1]* tabnums[2,2]/sum(tabnums[2,])^3)*c(-1,1) [1] -0.004868983 0.033485890 ### OK, this is the Wald CI !!! --------------------------------------------------------------- If we want an exact answer for interval boundaries, then we must invert the Score and LR components regarded as functions of the parameter theta . > uniroot(function(diff,tabl) NuisBinD(tabl,diff)[2] - 1.96, lower=-.1, upper=.1, tabl=tabnums)$root ### -0.001048 > uniroot(function(diff,tabl) NuisBinD(tabl,diff)[2] + 1.96, lower=-.1, upper=.1, tabl=tabnums)$root ### 0.034570 ------------------------------------------------------------- We could also do this `graphically' : > xt = -.05 + (0:100)/1000 Svec = LRvec <- numeric(101) > for(i in 1:101) { aux = NuisBinD(tabnums,xt[i]) Svec[i] = aux[2] LRvec[i] = aux[3] } > plot(xt, Svec, type="l"); abline(h=1.96, lty=5) abline(h=-1.96, lty=9) > plot(xt, LRvec, type="l") ------------------------------------------------------------- We proceed to give a slightly sophisticated (spline-smoothed) version of "table look-up" to invert the CI. > tmpspl = smooth.spline(xt,Svec, spar = 1.e-6, all.knots=T) xt2 = runif(100, -.04, .04) > plot(xt2, predict(tmpspl,xt2)$y) > tmpspl2 = smooth.spline(Svec, xt, spar = 1.e-6, all.knots=T) predict(tmpspl2,1.96*(c(1,-1))) $x [1] 1.96 -1.96 $y [1] -0.0006013194 0.0345674987