Homework 22. Due Wednesday, April 23, 2008.
===========================================

Using the R-supplied dataset "infert", which records
the presence or absence of infertility in women
(the response "case") together with their numbers of
prior spontaneous and induced abortions, carry out
the following steps:

(a) Find the best logistic-regression model you can,
starting from the base model "model1" suggested in
the "help(infert)" documentation. I suggest forward
selection with step(), using a penalty multiplier
k > 3.2 (the default k = 2 gives the ordinary AIC).
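A minimal R sketch of part (a), assuming the forward
search is done with step(). The candidate terms listed
in upper and the choice k = 4 are illustrative
assumptions, not part of the assignment; interactions
or transformed terms could be added to upper the same
way.

```r
# Base model from the help(infert) examples.
model1 <- glm(case ~ spontaneous + induced, data = infert,
              family = binomial())

# Largest model the search may reach (assumed candidate set).
upper <- ~ spontaneous + induced + age + parity + education

best <- step(model1,
             scope = list(lower = ~ spontaneous + induced, upper = upper),
             direction = "forward",
             k = 4,        # penalty per parameter; k = 2 would give AIC
             trace = 0)    # suppress the step-by-step printout
summary(best)
```

Larger k makes each added term harder to justify, so
the search stops at a smaller model than AIC would.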

(b) Show that the "residuals" component of the fitted
glm object (the working residuals) is obtained as
     (y_i - fitted_i) / (fitted_i * (1 - fitted_i)),
where y_i is the 0/1 response and fitted_i is the
fitted probability. With "age" NOT included as a
predictor variable in your logistic regression model,
plot these residuals versus age. Does the plot suggest
that age should be included as a predictor?
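In R, the "residuals" component of a glm fit holds the
working residuals (y_i - mu_i) * d(eta)/d(mu); for the
logit link d(eta)/d(mu) = 1/(mu_i * (1 - mu_i)), which
gives the formula above. The sketch below checks this
numerically and draws the plot, using model1 from
help(infert) as a stand-in for an age-free model (an
assumption; substitute your own).

```r
model1 <- glm(case ~ spontaneous + induced, data = infert,
              family = binomial())

y  <- model1$y                 # 0/1 response used in the fit
mu <- fitted(model1)           # fitted probabilities
r_formula <- (y - mu) / (mu * (1 - mu))

# model1$residuals are the working residuals;
# residuals(model1) would return deviance residuals by default.
stopifnot(isTRUE(all.equal(unname(model1$residuals), unname(r_formula))))

plot(infert$age, model1$residuals,
     xlab = "age", ylab = "working residual",
     main = "Working residuals vs age (age not in model)")
lines(lowess(infert$age, model1$residuals), col = "red")  # smoothed trend
abline(h = 0, lty = 2)
```

A smoothed trend that departs clearly from zero would
hint that age carries information the model is missing.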

Display how the extreme residuals differ between the
model with age and the model without age.
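One way to display the change, sketched under the
assumption that model1 is the age-free model: refit
with age added, then compare the range of the working
residuals and the five observations with the largest
absolute residuals.

```r
fit_no_age <- glm(case ~ spontaneous + induced, data = infert,
                  family = binomial())
fit_age    <- update(fit_no_age, . ~ . + age)

# Smallest and largest working residual of a fit.
extremes <- function(fit) {
  r <- fit$residuals
  c(min = min(r), max = max(r))
}
ext <- rbind(without_age = extremes(fit_no_age),
             with_age    = extremes(fit_age))
print(ext)

# The five observations with the largest absolute residuals (age-free fit).
i <- order(abs(fit_no_age$residuals), decreasing = TRUE)[1:5]
print(data.frame(age         = infert$age[i],
                 case        = infert$case[i],
                 without_age = fit_no_age$residuals[i],
                 with_age    = fit_age$residuals[i]))
```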

(c) Find the dispersion parameter estimate for your
best model in two ways: (i) by re-fitting the model
with the "quasibinomial" family, and (ii) by the
formula given in class. Make sure that these agree.
Does your dispersion value suggest any issues
concerning dependence (or common random effects)
among the observations?
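A sketch of both computations. It assumes the formula
from class is the Pearson estimate, the sum of
(y_i - mu_i)^2 / (mu_i * (1 - mu_i)) divided by the
residual degrees of freedom n - p; that is what
summary() reports for a quasibinomial fit, so the two
numbers should match. model1 again stands in for your
best model.

```r
fit  <- glm(case ~ spontaneous + induced, data = infert,
            family = binomial())
qfit <- update(fit, family = quasibinomial())

# (i) dispersion estimated by summary() for the quasibinomial fit
phi_quasi <- summary(qfit)$dispersion

# (ii) sum of squared Pearson residuals over residual degrees of freedom
phi_pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

c(quasibinomial = phi_quasi, pearson = phi_pearson)
```

A value well above 1 would point toward
overdispersion, which is one symptom of correlated
observations.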

(d) Using any measure of model fit you like, determine
whether the model improves when it is re-fitted with
the "probit" link.
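A sketch using residual deviance and AIC as the fit
measures (an assumption; any other measure would do),
again with model1 standing in for your best model.
Because both links use the same number of parameters,
the AIC difference equals the deviance difference.

```r
fit_logit  <- glm(case ~ spontaneous + induced, data = infert,
                  family = binomial(link = "logit"))
fit_probit <- update(fit_logit, family = binomial(link = "probit"))

# Lower deviance / AIC indicates the better-fitting link.
data.frame(link     = c("logit", "probit"),
           deviance = c(deviance(fit_logit), deviance(fit_probit)),
           AIC      = c(AIC(fit_logit), AIC(fit_probit)))
```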