Script for LogLinear Models, Common Odds Ratio, Mantel Haenszel Statistic etc. ============================================================================= Eric Slud, STAT 770 10/31/2018 Multiple-Center Clinical Trial Dataset Table 6.9 ## First create data-frame as flexible data structure Tab6.9 = cbind.data.frame(Center=factor(rep(1:8,rep(2,8))), Trt = rep(c("Drug","Control"),8), succ=c(11,10,16,22,14,7,2,1,6,0,1,0,1,1,4,6), fail=c(25,27,4,10,5,12,14,16,11,12,10,10,4,8,2,1)) Center Trt succ fail 1 1 Drug 11 25 2 1 Control 10 27 3 2 Drug 16 4 4 2 Control 22 10 5 3 Drug 14 5 6 3 Control 7 12 7 4 Drug 2 14 8 4 Control 1 16 9 5 Drug 6 11 10 5 Control 0 12 11 6 Drug 1 10 12 6 Control 0 10 13 7 Drug 1 4 14 7 Control 1 8 15 8 Drug 4 2 16 8 Control 6 1 > tabl6.9 = array(c(Tab6.9$succ,Tab6.9$fail), c(2,8,2), dimnames=list(c("Drug","Contr"),paste0("Ctr",1:8),c("succ","fail"))) > tabl6.9 , , succ Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8 Drug 11 16 14 2 6 1 1 4 Contr 10 22 7 1 0 0 1 6 , , fail Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8 Drug 25 4 5 14 11 10 4 2 Contr 27 10 12 16 12 10 8 1 ## First want to assess whether success and failure rates depend on Ctr, Trt > fit0 = glm(cbind(succ,fail) ~ Trt*Center , family=binomial, data=Tab6.9) fit1 = glm(cbind(succ,fail) ~ Center+Trt , family=binomial, data=Tab6.9) ## Note that because "Contr" is alphabetically before "Drug", it becomes the reference ## category for the factor "Trt" > fit1 Call: glm(formula = cbind(succ, fail) ~ Center + Trt, family = binomial, data = Tab6.9) Coefficients: (Intercept) Center2 Center3 Center4 Center5 Center6 -1.3220 2.0554 1.1529 -1.4185 -0.5199 -2.1469 Center7 Center8 TrtDrug -0.7977 2.2079 0.7769 Degrees of Freedom: 15 Total (i.e. Null); 7 Residual Null Deviance: 93.55 Residual Deviance: 9.746 AIC: 66.14 > anova(fit1) Analysis of Deviance Table Model: binomial, link: logit Response: cbind(succ, fail) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev NULL 15 93.555 Center 7 77.139 8 16.415 Trt 1 6.669 7 9.746 ### shows significance of both Center and Trt > c(1-pchisq(77.14,7), 1-pchisq(6.669,1)) [1] 5.262457e-14 9.810422e-03 ### big center differences ### Residual deviance of 9.7 on 7 df is adequate, p-value = 1-pchisq(9.7,7) = 0.21 ### log odds ratio of success = fit0 or fit1 $linear.pred ### To find Odds Ratio for Drug vs Contr, by Center > round(apply(tabl6.9,2, function(twb2) twb2[1]*twb2[4]/(twb2[2]*twb2[3])),3) Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8 1.188 1.818 4.800 2.286 Inf Inf 2.000 0.333 ### as in Table 6.9 ### Saturated table log odds ratio: > exp(fit0$lin[seq(1,15,by=2)]-fit0$lin[seq(2,16,by=2)]) 1 3 5 7 9 11 1.188000e+00 1.818182e+00 4.800000e+00 2.285714e+00 5.202411e+10 8.105392e+09 13 15 2.000000e+00 3.333333e-01 ### We can see here that the fitted odds ratios are very different. The fit1 (unsaturated model) says log odds of success = Int + beta*I[Trt="Drug"] + gamma_{Ctr} which says odds ratio by center = beta does not vary with Ctr=k !!! The adequacy of common odds ratio holds despite the difference among Ctr coeffs seen below: > summary(fit1)$coef Estimate Std. Error z value Pr(>|z|) (Intercept) -1.3220147 0.3164608 -4.1774991 2.947318e-05 Center2 2.0554344 0.4200889 4.8928561 9.938305e-07 Center3 1.1529027 0.4245669 2.7154798 6.617982e-03 Center4 -1.4184537 0.6635815 -2.1375726 3.255145e-02 Center5 -0.5198903 0.5337854 -0.9739687 3.300721e-01 Center6 -2.1469201 1.0613677 -2.0227863 4.309519e-02 Center7 -0.7977076 0.8149166 -0.9788825 3.276380e-01 Center8 2.2079143 0.7195059 3.0686534 2.150259e-03 TrtDrug 0.7769203 0.3066870 2.5332680 1.130045e-02 ### For a formal test of Center differences, look at deviance difference between fit1 and a model without "Center" > anova( update(fit1, formula=.~Trt+Center) ) Df Deviance Resid. Df Resid. Dev NULL 15 93.555 Trt 1 2.594 14 90.960 Center 7 81.214 7 9.746 ### 1-pchisq(81.2,7) = 8e-15 ##--------------------------------------------------------------------- ### To look at standardized logistic-regression residuals: > dimnames(model.matrix(fit1))[[2]] [1] "(Intercept)" "Center2" "Center3" "Center4" "Center5" [6] "Center6" "Center7" "Center8" "TrtDrug" > M = model.matrix(fit1) sgsq = dlogis(fit1$lin) tmp = array(0,c(9,9)) for(i in 1:16) tmp = tmp+sgsq[i]*(M[i,] %o% M[i,]) hat = 1 - sgsq*diag(M %*% solve(tmp) %*% t(M)) ntri = Tab6.9$succ+Tab6.9$fail > array((Tab6.9$succ-fit1$fit*ntri)/sqrt(ntri*sgsq*(1-hat)), c(2,8), dimnames(tabl6.9)[1:2]) Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Drug -0.9464911 -0.3156109 1.0649234 0.02835894 1.117662 0.4527830 -0.04751390 Contr 1.2429872 0.1778463 -0.9934539 -0.05004252 -2.031881 -0.9393181 0.05761151 Ctr8 Drug -1.682083 Contr 1.066151 #### look at Contr, Ctr5 and Drug, Ctr8 cells ! ###---------------------------------------------------------------------- ### Let's look at this more closely by fitting a loglinear model: > fit2 = loglin(tabl6.9, margin=list(c(1,2),c(1,3),c(2,3)), param=TRUE, + fit=TRUE) 6 iterations: deviation 0.05524648 > unlist(fit2[1:3]) lrt pearson df 9.746413 8.020305 7.000000 ### NOTE: lrt for adequacy of loglinear model with ### no 3-way interaction is the same as the residual deviance test for fit1 > fit2$fit , , succ Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Drug 13.203691 16.39167 12.300536 1.967488 4.355376 0.6972879 1.0340759 Contr 7.796309 21.60833 8.699464 1.032512 1.644624 0.3027121 0.9659241 Ctr8 Drug 5.046457 Contr 4.953543 , , fail Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Drug 22.7898 3.625506 6.705549 14.02421 12.63962 10.296738 3.963482 Contr 29.2102 10.374494 10.294451 15.97579 10.36038 9.703262 8.036518 Ctr8 Drug 0.9582409 Contr 2.0417591 > tabl6.9 ### Comparison of these fitted values with observed looks good! , , succ Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8 Drug 11 16 14 2 6 1 1 4 Contr 10 22 7 1 0 0 1 6 , , fail Ctr1 Ctr2 Ctr3 Ctr4 Ctr5 Ctr6 Ctr7 Ctr8 Drug 25 4 5 14 11 10 4 2 Contr 27 10 12 16 12 10 8 1 ### Chi-square test statistic is > sum(c(fit2$fit-tabl6.9)^2/c(fit2$fit)) [1] 8.020305 ### agrees with "pearson" 7df statistic above ##---------------------------------------------------------------- #### We saw above that the common odds ratio should be > exp(fit1$coef[9]) TrtDrug 2.174764 ## compare this with 4 times the fit2$param$`1.3` parameter > 4*fit2$param$`1.3` succ fail Drug 0.775051 -0.775051 ### upper-left is the desired value Contr -0.775051 0.775051 ### Let's compare this with the estimate of common odds ratio obtained from Mantel Haenszel statistic sum(tabl6.9[1,,1]*tabl6.9[2,,2]/apply(tabl6.9,2,sum))/ sum(tabl6.9[1,,2]*tabl6.9[2,,1]/apply(tabl6.9,2,sum)) [1] 2.134549 ### How would we obtain standard error for this estimate ? ## Method 1: delta method with SE of TrtDrug coeff > exp(fit1$coef[9])* sqrt(summary(fit1)$cov.scaled[9,9]) TrtDrug 0.6669719 ## Confidence interval for common OR: > exp(fit1$coef[9] + 1.96*c(-1,1)*sqrt(summary(fit1)$cov.scaled[9,9])) [1] 1.192216 3.967066 #### Very broad CI !!! ## Method 2: alternate form of loglin model in library MASS > Tab6.9B = cbind.data.frame(Center=rep(Tab6.9$Center,rep(2,16)), Trt=rep(Tab6.9$Trt,rep(2,16)), Succ=rep(c(T,F),16), Freq = c(t(data.matrix(Tab6.9[,3:4])))) > fit2B = loglm(Freq ~Center*Trt+Center*Succ+Trt*Succ, data=Tab6.9B) > fit2B[1:3] $lrt [1] 9.746366 $pearson [1] 8.021007 $df [1] 7 ### Same output as before but still no SEs !!? ### To get standard errors: must do the coding for contrasts differently, as described in http://www.uni-kiel.de/psychologie/rexrepos/posts/logLinear.html > fit2B = glm(Freq ~Center*Trt+Center*Succ+Trt*Succ, , family=poisson, contrasts=list(Trt=contr.sum, Center=contr.sum, Succ=contr.sum), data=Tab6.9B) > fit2B$coef ### length 25 (Intercept) Center1 Center2 Center3 Center4 Center5 1.653006737 1.130622235 0.720909783 0.573757225 -0.123053057 0.057663457 Center6 Center7 Trt1 Succ1 Center1:Trt1 Center2:Trt1 -0.891115491 -0.787997254 -0.002446206 0.433447299 -0.067606496 0.335335406 Center3:Trt1 Center4:Trt1 Center5:Trt1 Center6:Trt1 Center7:Trt1 Center1:Succ1 0.023403436 -0.127071202 -0.291409006 -0.222002231 0.161413455 0.033329992 Center2:Succ1 Center3:Succ1 Center4:Succ1 Center5:Succ1 Center6:Succ1 Center7:Succ1 -0.994387227 -0.543121377 0.742556818 0.293275145 1.106790042 0.432183789 Trt1:Succ1 0.194230070 summary(fit2B)$coef[25,2] > summary(fit2B)$coef[25,2] [1] 0.07667188 > .Last.value*4 [1] 0.3066875 > tmp = factor(1:10) > contrasts(tmp) 2 3 4 5 6 7 8 9 10 1 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 0 0 4 0 0 1 0 0 0 0 0 0 5 0 0 0 1 0 0 0 0 0 6 0 0 0 0 1 0 0 0 0 7 0 0 0 0 0 1 0 0 0 8 0 0 0 0 0 0 1 0 0 9 0 0 0 0 0 0 0 1 0 10 0 0 0 0 0 0 0 0 1 > contrasts(tmp)="contr.sum" > contrasts(tmp) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] 1 1 0 0 0 0 0 0 0 0 2 0 1 0 0 0 0 0 0 0 3 0 0 1 0 0 0 0 0 0 4 0 0 0 1 0 0 0 0 0 5 0 0 0 0 1 0 0 0 0 6 0 0 0 0 0 1 0 0 0 7 0 0 0 0 0 0 1 0 0 8 0 0 0 0 0 0 0 1 0 9 0 0 0 0 0 0 0 0 1 10 -1 -1 -1 -1 -1 -1 -1 -1 -1 ## Method 3: see bottom of p.229 !! ##----------------------------------------------------------------