Homework 22.  Due Wednesday, April 23, 2008.
===========================================

Using the R-supplied dataset "infert", concerning the presence or absence of infertility in women as a function of the numbers of spontaneous and induced abortions, perform the following steps:

(a) Find the best logistic-regression model you can, starting from the base model "model1" suggested in the help(infert) documentation. I suggest using forward selection with penalty multiplier k > 3.2 (the k argument of step()).

(b) Show that the "residuals" component of the fitted glm object (the working residuals) is obtained as

        (y_i - fitted_i) / (fitted_i * (1 - fitted_i))

    With "age" NOT included as a predictor variable in your logistic-regression model, plot the residuals versus age. Does this plot suggest that age should be included as a predictor? Display the difference in the extreme residuals with age in the model and with age not in the model.

(c) Find the dispersion parameter estimate for your best model in two ways: (i) by re-fitting the model using the "quasibinomial" family, and (ii) by the formula given in class. Make sure that the two estimates agree. Does your dispersion value suggest any issues concerning dependence (or common random effects) among the observations?

(d) Use any measure of model fit you like to decide whether the model improves when it is re-fitted with a "probit" link.
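For part (a), here is a minimal sketch of forward selection with step(). It assumes the candidate terms are age, parity and education, and it leaves out stratum and pooled.stratum because they only identify matched sets. It also takes k = 3.5 as one value satisfying k > 3.2. Treat the scope, the value of k, and the object names as assumptions to adjust.

```r
# Base model, as given in the examples of help(infert)
model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# Assumed search scope: always keep the base terms, optionally add
# age, parity, education. stratum / pooled.stratum are matched-set
# identifiers and are deliberately excluded here.
search_scope <- list(lower = ~ spontaneous + induced,
                     upper = ~ spontaneous + induced + age + parity + education)

# Forward selection; k multiplies the number of parameters in the
# penalty (k = 2 gives AIC). k = 3.5 is an assumed value with k > 3.2.
best_model <- step(model1, scope = search_scope, direction = "forward", k = 3.5)

summary(best_model)
```

A larger k makes each added term harder to justify. With n = 248, k = log(248), about 5.5, would correspond to BIC.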
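For part (b), a sketch that checks the working-residual formula, draws the residual plot, and compares the extreme residuals. It uses model1 (which has no age term) as a stand-in for "your model without age". Substitute your own formula; the names fit_noage and fit_age are illustrative.

```r
fit_noage <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# fit$residuals holds the working residuals (y - mu) / (d mu / d eta);
# for the logit link, d mu / d eta = mu * (1 - mu).
mu <- fitted(fit_noage)
y  <- infert$case
manual_resid <- (y - mu) / (mu * (1 - mu))
print(all.equal(unname(fit_noage$residuals), unname(manual_resid)))

# Residuals versus age, with a lowess smooth to make any trend visible
plot(infert$age, fit_noage$residuals, xlab = "age", ylab = "working residual",
     main = "Working residuals vs. age (age not in model)")
lines(lowess(infert$age, fit_noage$residuals), col = "red")

# Refit with age added and compare the extreme (min, max) residuals
fit_age <- update(fit_noage, . ~ . + age)
rbind(without_age = range(fit_noage$residuals),
      with_age    = range(fit_age$residuals))
```

residuals(fit_noage, type = "working") returns the same vector as fit_noage$residuals.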
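For part (c), a sketch that computes the dispersion both ways. It assumes the "formula given in class" is the Pearson estimate: the sum of squared Pearson residuals divided by the residual degrees of freedom. That is the estimate summary.glm reports for quasi families. model1 again stands in for your best model.

```r
fit_bin <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# (i) Refit with the quasibinomial family; summary() estimates the dispersion
fit_quasi <- update(fit_bin, family = quasibinomial())
phi_quasi <- summary(fit_quasi)$dispersion

# (ii) Pearson estimate X^2 / (n - p), where the Pearson residuals are
#      (y - mu) / sqrt(mu * (1 - mu))
phi_pearson <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

# The two should agree up to IRLS convergence error
c(quasibinomial = phi_quasi, pearson = phi_pearson)
```

summary.glm computes the estimate from the stored working weights and working residuals. As a result, the two numbers can differ slightly in the last digits rather than being bit-identical.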
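For part (d), a sketch comparing the logit and probit links with AIC and residual deviance, again using model1's predictors as a stand-in for your best model.

```r
fit_logit  <- glm(case ~ spontaneous + induced, data = infert,
                  family = binomial(link = "logit"))
fit_probit <- update(fit_logit, family = binomial(link = "probit"))

# Same predictors, hence the same number of parameters, so AIC and
# residual deviance rank the two links identically; smaller is better.
cbind(AIC      = c(logit = AIC(fit_logit), probit = AIC(fit_probit)),
      deviance = c(deviance(fit_logit), deviance(fit_probit)))
```

For ungrouped 0/1 responses the binomial AIC equals the residual deviance plus 2p. When the two models have the same p, comparing AIC amounts to comparing deviances.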