Log Related to Maximization, Root-finding, & Vectorization
================================================== 2/13/08 

We illustrate some calculations related to maximization, root-finding,
and splines in an example on test-based confidence intervals in 
a small contingency table.

> tabnums
     [,1] [,2] 
[1,]   28  656
[2,]   18  658

These data actually came from a Swedish study of aspirin use 
(the first row) and Myocardial Infarction (the first column) 
referenced in the Agresti categorical data analysis book. Here
model the data as a Binomial in row i=1,2 with respective sample 
sizes n_1=684, n_2=675 and success probabilities p_1, p_2, with
            p_1 - p_2 = theta, 
and we want a confidence interval for it. For this, we must first 
get the `restricted MLE' for p_1 and p_2  subject to a fixed value 
of theta. Here is a little function to do this, and then
calculate a Score statistic and (generalized) Likelihood Ratio
statistic

> NuisBinD
function(tabdat, thet, tol1 = 0.1, tol2 = 0.001)
{
### Coding of restricted ML estimation, along with score, Wald, 
###   and profile-likelihood confidence interval for parameter 
###   theta which is risk-difference. 
### 2x2 table entries are in  tabdat.
   mle0 = sum(tabdat[, 1])/sum(tabdat) 
  ## ML for the case theta=0 used as starting value
   optlst = nlminb(mle0, function(x, tabl, theta)
     - (tabl[1, 1] * log(x) + tabl[1, 2] * log(1 - x) + tabl[2, 1] * 
      log(x - theta) + tabl[2, 2] * log(1 + theta - x)), 
      lower = pmax(mle0 - tol1, thet) + tol2, upper = pmin(mle0 + 
      tol1, 1) - tol2, tabl = tabdat, theta = thet)
  ### up to this point, we have the maximum of the likelihood function
  ## with x substituted for p_1 and x-theta for p_2
   rmle = optlst$par
  ## score statistic is the partial of the logLik with respect to x
  ## with the fixed value of theta substituted as though known
   Scorstat = (tabdat[1, 1]/sum(tabdat[1,  ]) - tabdat[2, 1]/sum(tabdat[2,
      ]) - thet)/sqrt((rmle * (1 - rmle))/sum(tabdat[1,  ]) + ((rmle - 
      thet) * (1 + thet - rmle))/sum(tabdat[2,  ]))
   ## next comes unconstrained MLE for p_1 and p_2
   mlepr = c(tabdat[1,1]/sum(tabdat[1,]), tabdat[2,1]/sum(tabdat[2,]))
   loglik = sum(tabdat[, 1] * log(mlepr)) + sum(tabdat[, 2] * log(1 - 
      mlepr))
   LRstat = 2 * (loglik + optlst$obj)
   ## This is the Likelihood Ratio statistic for testing whether 
   ##   the fixed value of theta is correct
   c(rmle = rmle, Scorstat = Scorstat, LRstat = LRstat, logLik = loglik)
}


> round(NuisBinD(tabnums,0.01),5)
    rmle Scorstat  LRstat   logLik 
 0.03848  0.44197 0.19497 -199.923
> round(NuisBinD(tabnums,0.02),5)
    rmle Scorstat  LRstat   logLik 
 0.04456 -0.57585 0.33415 -199.923
> round(NuisBinD(tabnums,0.005),5)
    rmle Scorstat  LRstat   logLik 
 0.03597  0.95453 0.91173 -199.923

### Now, to get CI's:

> for(i in 0:25) cat(round(NuisBinD(tabnums, -.02 + i*.003),4),
         "for p1=",-.02 + i*.003,"\n")
0.0074 4.8426 12.9507 -199.923 for p1= -0.02
0.0089 4.4166 10.8494 -199.923 for p1= -0.017
0.0104 3.9917 8.8868 -199.923 for p1= -0.014
0.0119 3.5676 7.0793 -199.923 for p1= -0.011
0.0134 3.1442 5.4464 -199.923 for p1= -0.008
0.0149 2.7213 4.0121 -199.923 for p1= -0.005
0.0164 2.2987 2.8062 -199.923 for p1= -0.002
0.0342 1.3594 1.8596 -199.923 for p1= 0.001
0.0355 1.0564 1.1179 -199.923 for p1= 0.004
0.0369 0.75 0.562 -199.923 for p1= 0.007
0.0385 0.442 0.195 -199.923 for p1= 0.01
0.0402 0.1339 0.0179 -199.923 for p1= 0.013
0.042 -0.1725 0.0298 -199.923 for p1= 0.016
0.0439 -0.4757 0.2276 -199.923 for p1= 0.019
0.0459 -0.7745 0.6068 -199.923 for p1= 0.022
0.0481 -1.0679 1.1613 -199.923 for p1= 0.025
0.0503 -1.355 1.8842 -199.923 for p1= 0.028
0.0526 -1.6355 2.768 -199.923 for p1= 0.031
0.055 -1.909 3.8046 -199.923 for p1= 0.034
0.0574 -2.1755 4.9863 -199.923 for p1= 0.037
0.06 -2.435 6.305 -199.923 for p1= 0.04
0.0625 -2.6877 7.7533 -199.923 for p1= 0.043
0.0651 -2.9337 9.3239 -199.923 for p1= 0.046
0.0677 -3.1734 11.01 -199.923 for p1= 0.049
0.0704 -3.4069 12.8052 -199.923 for p1= 0.052
0.0731 -3.6347 14.7035 -199.923 for p1= 0.055


So the Score and LR CI's are approximately 
(-.002, .034)  and  (-.005, .034)

> (tabnums[1,1]/sum(tabnums[1,]))-(tabnums[2,1]/
sum(tabnums[2,]))+qnorm(.975)*sqrt(tabnums[1,1]*
tabnums[1,2]/sum(tabnums[1,])^3 + tabnums[2,1]*
tabnums[2,2]/sum(tabnums[2,])^3)*c(-1,1)
[1] -0.004868983  0.033485890   
       ### OK, this is the Wald CI !!!


---------------------------------------------------------------

If we want an exact answer for interval boundaries, then we must
invert the Score and LR components regarded as functions of the 
parameter  theta  .


> uniroot(function(diff,tabl) NuisBinD(tabl,diff)[2] - 1.96,
      lower=-.1, upper=.1, tabl=tabnums)$root ### -0.001048
> uniroot(function(diff,tabl) NuisBinD(tabl,diff)[2] + 1.96,
      lower=-.1, upper=.1, tabl=tabnums)$root  ### 0.034570

-------------------------------------------------------------

We could also do this `graphically' :

> xt =  -.05 + (0:100)/1000
  Svec = LRvec <- numeric(101)
> for(i in 1:101) {
     aux = NuisBinD(tabnums,xt[i])
     Svec[i] = aux[2]
     LRvec[i] = aux[3] }
> plot(xt, Svec, type="l"); abline(h=1.96, lty=5)
  abline(h=-1.96, lty=9)
> plot(xt, LRvec, type="l")

-------------------------------------------------------------

We proceed to give a slightly sophisticated (spline-smoothed) 
   version of "table look-up" to invert the CI.

> tmpspl = smooth.spline(xt,Svec, spar = 1.e-6, all.knots=T)
  xt2 = runif(100, -.04, .04)
> plot(xt2, predict(tmpspl,xt2)$y)
> tmpspl2 = smooth.spline(Svec, xt, spar = 1.e-6, all.knots=T)
  predict(tmpspl2,1.96*(c(1,-1)))
$x
[1]  1.96 -1.96

$y
[1] -0.0006013194  0.0345674987