HW22 Solution Log. ================== > names(infert) [1] "education" "age" "parity" "induced" [5] "case" "spontaneous" "stratum" "pooled.stratum" model1 = glm(case ~ spontaneous+induced, data=infert,family=binomial) > summary(model1)$coef Estimate Std. Error z value Pr(>|z|) (Intercept) -1.7078601 0.2677095 -6.379528 1.776344e-10 spontaneous 1.1972050 0.2116433 5.656712 1.543004e-08 induced 0.4181294 0.2056274 2.033432 4.200891e-02 > table(infert$indu,infert$spont) 0 1 2 0 67 47 29 1 45 16 7 2 29 8 0 > modelB = step(model1, scope= ~ + spontaneous + induced + parity + age + education, direction="for", k=3.84) Start: AIC= 291.13 case ~ spontaneous + induced Df Deviance AIC + parity 1 264.10 279.46 279.61 291.13 + age 1 279.04 294.40 + education 2 279.41 298.61 Step: AIC= 279.46 case ~ spontaneous + induced + parity Df Deviance AIC 264.10 279.46 + age 1 260.94 280.14 + education 2 259.43 282.47 > summary(modelB)$coef Estimate Std. Error z value Pr(>|z|) (Intercept) -1.1887411 0.3112065 -3.819782 1.335695e-04 spontaneous 1.8097810 0.2873694 6.297751 3.019945e-10 induced 1.0776691 0.2820025 3.821488 1.326491e-04 parity -0.6372162 0.1751415 -3.638294 2.744501e-04 ## OK, very persuasive > summary(modelB$fit) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.04309 0.13870 0.32120 0.33470 0.49600 0.83160 ### With smaller k, can also add "age", but p-value of ### that coef is .08. > summary(update(modelB, formula = .~.+age)$fit) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.03282 0.15270 0.29000 0.33470 0.49260 0.86640 > summary(update(modelB, formula = .~.+age)$residuals) Min. 1st Qu. Median Mean 3rd Qu. Max. -3.764 -1.383 -1.151 0.181 1.456 30.470 ### Not too much difference. ## Now can plot residuals > plot(infert$age,modelB$resid) ### So not too important to include age ## Finally check dispersion: > summary(update(modelB, family=quasibinomial))$disp [1] 1.095745 > c(extractAIC(modelB, k=3.84), modelB$dev) [1] 4.0000 279.4645 264.1045 > c(extractAIC(modelB2, k=3.84), modelB2$dev) [1] 4.0000 280.4502 265.0902