Script for LogLinear Models, Common Odds Ratio, Mantel Haenszel Statistic etc.
=============================================================================
Eric Slud, STAT 770                                              10/31/2018

Multiple-Center Clinical Trial Dataset Table 6.9

## First create data-frame as flexible data structure

Tab6.9 = cbind.data.frame(Center=factor(rep(1:8,rep(2,8))), Trt = 
   rep(c("Drug","Control"),8), succ=c(11,10,16,22,14,7,2,1,6,0,1,0,1,1,4,6),
   fail=c(25,27,4,10,5,12,14,16,11,12,10,10,4,8,2,1))
   Center     Trt succ fail
1       1    Drug   11   25
2       1 Control   10   27
3       2    Drug   16    4
4       2 Control   22   10
5       3    Drug   14    5
6       3 Control    7   12
7       4    Drug    2   14
8       4 Control    1   16
9       5    Drug    6   11
10      5 Control    0   12
11      6    Drug    1   10
12      6 Control    0   10
13      7    Drug    1    4
14      7 Control    1    8
15      8    Drug    4    2
16      8 Control    6    1

> tabl6.9 = array(c(Tab6.9$succ,Tab6.9$fail), c(2,8,2), 
   dimnames=list(c("Drug","Contr"),paste0("Ctr",1:8),c("succ","fail")))

> tabl6.9
, , succ
      Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8
Drug    11   16   14    2    6    1    1    4
Contr   10   22    7    1    0    0    1    6

, , fail
      Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8
Drug    25    4    5   14   11   10    4    2
Contr   27   10   12   16   12   10    8    1


## First want to assess whether success and failure rates depend on Ctr, Trt

> fit0 = glm(cbind(succ,fail) ~ Trt*Center , family=binomial, data=Tab6.9)
  fit1 = glm(cbind(succ,fail) ~ Center+Trt , family=binomial, data=Tab6.9)
## Note that because "Contr" is alphabetically before "Drug", it becomes the reference
##   category for the factor "Trt"

> fit1
Call:  glm(formula = cbind(succ, fail) ~ Center + Trt, family = binomial, 
    data = Tab6.9)

Coefficients:
(Intercept)      Center2      Center3      Center4      Center5      Center6  
    -1.3220       2.0554       1.1529      -1.4185      -0.5199      -2.1469  
    Center7      Center8      TrtDrug  
    -0.7977       2.2079       0.7769  

Degrees of Freedom: 15 Total (i.e. Null);  7 Residual
Null Deviance:      93.55 
Residual Deviance: 9.746        AIC: 66.14

> anova(fit1)
Analysis of Deviance Table

Model: binomial, link: logit
Response: cbind(succ, fail)
Terms added sequentially (first to last)
       Df Deviance Resid. Df Resid. Dev
NULL                      15     93.555
Center  7   77.139         8     16.415
Trt     1    6.669         7      9.746  ### shows significance of both Center and Trt
                                         
> c(1-pchisq(77.14,7), 1-pchisq(6.669,1))
[1] 5.262457e-14 9.810422e-03      ### big center differences

### Residual deviance of 9.7 on 7 df is adequate, p-value = 1-pchisq(9.7,7) = 0.21

### log odds ratio of success = fit0 or fit1  $linear.pred
### To find  Odds Ratio for Drug vs Contr, by Center

> round(apply(tabl6.9,2, function(twb2) twb2[1]*twb2[4]/(twb2[2]*twb2[3])),3)
 Ctr1  Ctr2  Ctr3  Ctr4  Ctr5  Ctr6  Ctr7  Ctr8 
1.188 1.818 4.800 2.286   Inf   Inf 2.000 0.333     ### as in Table 6.9

### Saturated table log odds ratio:
> exp(fit0$lin[seq(1,15,by=2)]-fit0$lin[seq(2,16,by=2)])
           1            3            5            7            9           11 
1.188000e+00 1.818182e+00 4.800000e+00 2.285714e+00 5.202411e+10 8.105392e+09 
          13           15 
2.000000e+00 3.333333e-01 

### We can see here that the fitted odds ratios are very different.
The fit1 (unsaturated model) says log odds of success 
      = Int + beta*I[Trt="Drug"] + gamma_{Ctr}
which says odds ratio by center = beta  does not vary with Ctr=k !!!

The adequacy of common odds ratio holds despite the difference among Ctr coeffs  
seen below:

> summary(fit1)$coef
              Estimate Std. Error    z value     Pr(>|z|)
(Intercept) -1.3220147  0.3164608 -4.1774991 2.947318e-05
Center2      2.0554344  0.4200889  4.8928561 9.938305e-07
Center3      1.1529027  0.4245669  2.7154798 6.617982e-03
Center4     -1.4184537  0.6635815 -2.1375726 3.255145e-02
Center5     -0.5198903  0.5337854 -0.9739687 3.300721e-01
Center6     -2.1469201  1.0613677 -2.0227863 4.309519e-02
Center7     -0.7977076  0.8149166 -0.9788825 3.276380e-01
Center8      2.2079143  0.7195059  3.0686534 2.150259e-03
TrtDrug      0.7769203  0.3066870  2.5332680 1.130045e-02

### For a formal test of Center differences, look at deviance difference between 
          fit1 and a model without "Center"

> anova( update(fit1, formula=.~Trt+Center) )  
       Df Deviance Resid. Df Resid. Dev
NULL                      15     93.555
Trt     1    2.594        14     90.960
Center  7   81.214         7      9.746   ### 1-pchisq(81.2,7) = 8e-15

##---------------------------------------------------------------------
### To look at standardized logistic-regression residuals:

> dimnames(model.matrix(fit1))[[2]]
[1] "(Intercept)" "Center2"     "Center3"     "Center4"     "Center5"    
[6] "Center6"     "Center7"     "Center8"     "TrtDrug" 

> M = model.matrix(fit1)
  sgsq = dlogis(fit1$lin)
  tmp = array(0,c(9,9))
  for(i in 1:16) tmp = tmp+sgsq[i]*(M[i,] %o% M[i,])
  hat  = 1 - sgsq*diag(M %*% solve(tmp) %*% t(M))
  ntri = Tab6.9$succ+Tab6.9$fail
> array((Tab6.9$succ-fit1$fit*ntri)/sqrt(ntri*sgsq*(1-hat)), c(2,8), dimnames(tabl6.9)[1:2])
            Ctr1       Ctr2       Ctr3        Ctr4      Ctr5       Ctr6        Ctr7
Drug  -0.9464911 -0.3156109  1.0649234  0.02835894  1.117662  0.4527830 -0.04751390
Contr  1.2429872  0.1778463 -0.9934539 -0.05004252 -2.031881 -0.9393181  0.05761151
           Ctr8
Drug  -1.682083
Contr  1.066151          #### look at Contr, Ctr5 and Drug, Ctr8 cells !


###----------------------------------------------------------------------
### Let's look at this more closely by fitting a loglinear model:

> fit2 = loglin(tabl6.9, margin=list(c(1,2),c(1,3),c(2,3)), param=TRUE,
+       fit=TRUE)
6 iterations: deviation 0.05524648 

> unlist(fit2[1:3])
     lrt  pearson       df 
9.746413 8.020305 7.000000   ### NOTE: lrt for adequacy of  loglinear model with 
       ### no 3-way interaction is the same as the residual deviance  test for fit1

> fit2$fit
, , succ

           Ctr1     Ctr2      Ctr3     Ctr4     Ctr5      Ctr6      Ctr7
Drug  13.203691 16.39167 12.300536 1.967488 4.355376 0.6972879 1.0340759
Contr  7.796309 21.60833  8.699464 1.032512 1.644624 0.3027121 0.9659241
          Ctr8
Drug  5.046457
Contr 4.953543

, , fail

         Ctr1      Ctr2      Ctr3     Ctr4     Ctr5      Ctr6     Ctr7
Drug  22.7898  3.625506  6.705549 14.02421 12.63962 10.296738 3.963482
Contr 29.2102 10.374494 10.294451 15.97579 10.36038  9.703262 8.036518
           Ctr8
Drug  0.9582409
Contr 2.0417591

> tabl6.9        ### Comparison of these fitted values with observed looks good!
, , succ

      Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8
Drug    11   16   14    2    6    1    1    4
Contr   10   22    7    1    0    0    1    6

, , fail

      Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8
Drug    25    4    5   14   11   10    4    2
Contr   27   10   12   16   12   10    8    1

### Chi-square test statistic is

> sum(c(fit2$fit-tabl6.9)^2/c(fit2$fit))
[1] 8.020305              ### agrees with "pearson" 7df statistic above

##----------------------------------------------------------------
#### We saw above that the common odds ratio should be 
> exp(fit1$coef[9]) 
 TrtDrug 
2.174764 

## compare this with 4 times the fit2$param$`1.3` parameter
> 4*fit2$param$`1.3`
           succ      fail
Drug   0.775051 -0.775051          ### upper-left is the desired value
Contr -0.775051  0.775051

### Let's compare this with the estimate of common odds ratio obtained 
from Mantel Haenszel statistic 

sum(tabl6.9[1,,1]*tabl6.9[2,,2]/apply(tabl6.9,2,sum))/
sum(tabl6.9[1,,2]*tabl6.9[2,,1]/apply(tabl6.9,2,sum))
[1] 2.134549

### How would we obtain standard error for this estimate ?

## Method 1: delta method with SE of TrtDrug coeff
> exp(fit1$coef[9])* sqrt(summary(fit1)$cov.scaled[9,9])
  TrtDrug 
0.6669719 

## Confidence interval for common OR: 
> exp(fit1$coef[9] + 1.96*c(-1,1)*sqrt(summary(fit1)$cov.scaled[9,9]))
[1] 1.192216 3.967066            #### Very broad CI !!!


## Method 2: alternate form of loglin model in library MASS
> Tab6.9B = cbind.data.frame(Center=rep(Tab6.9$Center,rep(2,16)),
           Trt=rep(Tab6.9$Trt,rep(2,16)), Succ=rep(c(T,F),16),
           Freq = c(t(data.matrix(Tab6.9[,3:4]))))

> fit2B = loglm(Freq ~Center*Trt+Center*Succ+Trt*Succ, data=Tab6.9B)
> fit2B[1:3]
$lrt
[1] 9.746366

$pearson
[1] 8.021007

$df 
[1] 7                 ### Same output as before but still no SEs !!?

### To get standard errors: must do the coding for contrasts differently, 
 as described in  http://www.uni-kiel.de/psychologie/rexrepos/posts/logLinear.html

> fit2B = glm(Freq ~Center*Trt+Center*Succ+Trt*Succ, , family=poisson,
              contrasts=list(Trt=contr.sum, Center=contr.sum, Succ=contr.sum), 
              data=Tab6.9B)
 
> fit2B$coef        ### length 25
  (Intercept)       Center1       Center2       Center3       Center4       Center5 
  1.653006737   1.130622235   0.720909783   0.573757225  -0.123053057   0.057663457 
      Center6       Center7          Trt1         Succ1  Center1:Trt1  Center2:Trt1 
 -0.891115491  -0.787997254  -0.002446206   0.433447299  -0.067606496   0.335335406 
 Center3:Trt1  Center4:Trt1  Center5:Trt1  Center6:Trt1  Center7:Trt1 Center1:Succ1 
  0.023403436  -0.127071202  -0.291409006  -0.222002231   0.161413455   0.033329992 
Center2:Succ1 Center3:Succ1 Center4:Succ1 Center5:Succ1 Center6:Succ1 Center7:Succ1 
 -0.994387227  -0.543121377   0.742556818   0.293275145   1.106790042   0.432183789 
   Trt1:Succ1 
  0.194230070 

summary(fit2B)$coef[25,2]
> summary(fit2B)$coef[25,2]
[1] 0.07667188
> .Last.value*4
[1] 0.3066875

> tmp = factor(1:10)
> contrasts(tmp)
   2 3 4 5 6 7 8 9 10
1  0 0 0 0 0 0 0 0  0
2  1 0 0 0 0 0 0 0  0
3  0 1 0 0 0 0 0 0  0
4  0 0 1 0 0 0 0 0  0
5  0 0 0 1 0 0 0 0  0
6  0 0 0 0 1 0 0 0  0
7  0 0 0 0 0 1 0 0  0
8  0 0 0 0 0 0 1 0  0
9  0 0 0 0 0 0 0 1  0
10 0 0 0 0 0 0 0 0  1
> contrasts(tmp)="contr.sum"
> contrasts(tmp)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
1     1    0    0    0    0    0    0    0    0
2     0    1    0    0    0    0    0    0    0
3     0    0    1    0    0    0    0    0    0
4     0    0    0    1    0    0    0    0    0
5     0    0    0    0    1    0    0    0    0
6     0    0    0    0    0    1    0    0    0
7     0    0    0    0    0    0    1    0    0
8     0    0    0    0    0    0    0    1    0
9     0    0    0    0    0    0    0    0    1
10   -1   -1   -1   -1   -1   -1   -1   -1   -1


## Method 3: see bottom of p.229 !!

##----------------------------------------------------------------