Homework Set 17, Due Friday December 2, 2016.
------------------------------------------------

Assigned 11/19/2016; originally due 11/30, extended to Friday 12/2
==============================

(A) Recall that at the end of HW14 we had the "Extra-credit" portion 
which asked you how you might evaluate the quality (length and 
coverage) of Bayesian credible intervals and compare them with 
a frequentist method.
   In my solution to that HW14 problem set, I noted that the choice of 
frequentist method is not obvious, and suggested the parametric 
bootstrap as a natural one to try. The proposed method is this:
   Since each X_i ~ Binom(n_i, p_i), even though p_i is drawn 
randomly from a prior, we could draw a large number B (say 1000) 
of parametric-bootstrap values X_i^{*(b)} ~ Binom(n_i, p.hat_i), 
where p.hat_i = X_i/n_i, 
then find p.hat_i^{*(b)} = X_i^{*(b)}/n_i and create a bootstrap 
distribution (e.g. a histogram) of the values 
   z_{i,b} = p.hat_i^{*(b)} - p.hat_i,   b=1,...,B. 
This yields the bootstrap-based CI for p_i given by 
   ( p.hat_i - q_{.975} ,  p.hat_i - q_{.025} ), 
where q_{.975} and q_{.025} are the .975 and .025 quantiles estimated 
from the values (z_{i,b}, b=1,...,B).
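
The recipe above can be sketched in R roughly as follows. This is only 
one possible reading of it: the helper name boot.ci.binom, the seed, and 
the example values of X and n are illustrative assumptions, not part of 
the assignment.

```r
# Basic parametric-bootstrap CI for each p_i, following the recipe above.
# boot.ci.binom is a hypothetical helper name; X and n are equal-length vectors.
boot.ci.binom <- function(X, n, B = 1000, level = 0.95) {
  alpha <- 1 - level
  p.hat <- X / n
  out <- t(sapply(seq_along(X), function(i) {
    X.star <- rbinom(B, n[i], p.hat[i])        # X_i^{*(b)} ~ Binom(n_i, p.hat_i)
    z <- X.star / n[i] - p.hat[i]              # z_{i,b}
    q <- quantile(z, c(1 - alpha / 2, alpha / 2), names = FALSE)
    p.hat[i] - q                               # (lower, upper) endpoints
  }))
  colnames(out) <- c("lower", "upper")
  out
}

set.seed(1)                                    # arbitrary seed
X <- c(3, 10, 17)                              # made-up counts, for illustration only
n <- c(20, 20, 20)
ci <- boot.ci.binom(X, n)
print(ci)
```

Intervals built this way are not forced to lie in [0,1], and when 
X_i = 0 or X_i = n_i every bootstrap draw equals X_i, so the interval 
collapses to a single point. Both effects may matter when interpreting 
coverage.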

Code a function to generate bootstrap CI's by this method along with the 
corresponding Bayesian credible intervals. Then use your function to implement 
a small simulation study (say N = 1000 to 5000 batches) in which you generate 
N batches of (p_i, X_i, i=1,...,10), all with the same n_i's and (a,b) values. 
For each batch and each i, find the Coverage (an indicator equal to 1 if the CI 
contains p_i) and the Length of the Bayes and bootstrap CIs. Report and 
interpret the results of your simulation study.
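
One way the simulation loop might be organized is sketched below. It 
assumes, since HW14 is not reproduced here, that the prior is 
p_i ~ Beta(a,b), so that the posterior is Beta(a + X_i, b + n_i - X_i) 
and an equal-tailed credible interval comes from qbeta. The function 
name sim.study and all numerical settings (N, n_i, a, b, B, seed) are 
placeholders chosen to keep the run short.

```r
# Sketch of the coverage/length study.
# ASSUMPTION: the HW14 prior is p_i ~ Beta(a, b); a, b, n, N, B are placeholders.
sim.study <- function(N = 200, n = rep(20, 10), a = 2, b = 3,
                      B = 500, level = 0.95) {
  alpha <- 1 - level
  m <- length(n)
  cover <- len <- array(0, c(N, m, 2),
                        dimnames = list(NULL, NULL, c("Bayes", "Boot")))
  for (r in 1:N) {
    p <- rbeta(m, a, b)                        # p_i drawn from the assumed prior
    X <- rbinom(m, n, p)                       # X_i ~ Binom(n_i, p_i)
    # Equal-tailed credible interval from the Beta(a + X, b + n - X) posterior
    bayes <- cbind(qbeta(alpha / 2, a + X, b + n - X),
                   qbeta(1 - alpha / 2, a + X, b + n - X))
    # Basic parametric-bootstrap interval: p.hat minus (upper, lower) quantiles of z
    p.hat <- X / n
    boot <- t(sapply(1:m, function(i) {
      z <- rbinom(B, n[i], p.hat[i]) / n[i] - p.hat[i]
      p.hat[i] - quantile(z, c(1 - alpha / 2, alpha / 2), names = FALSE)
    }))
    cover[r, , "Bayes"] <- bayes[, 1] <= p & p <= bayes[, 2]
    cover[r, , "Boot"]  <- boot[, 1]  <= p & p <= boot[, 2]
    len[r, , "Bayes"]   <- bayes[, 2] - bayes[, 1]
    len[r, , "Boot"]    <- boot[, 2]  - boot[, 1]
  }
  # Average over batches and over i = 1,...,m
  rbind(coverage = apply(cover, 3, mean), mean.length = apply(len, 3, mean))
}

set.seed(2016)                                 # arbitrary seed
res <- sim.study()
print(round(res, 3))
```

Averaging over all i as well as over batches is a design choice made 
here for brevity; keeping the per-i averages (apply over dimensions 2 
and 3) would show how the comparison depends on n_i if the n_i differ.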


(B) Run forward stepwise selection with option k=2 to fit a glm 
"Poisson regression" model for the number of Claims in the dataset 
"Insurance", which you can find in the library "MASS". Use the log number 
of policy holders as an "offset", and allow all variables in the dataset 
and their interactions up to second order as possible variables in the 
regression model.
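
A minimal sketch of one way to set this up with the R function step 
is given below. The object names fit0 and fit.fwd are arbitrary, and 
starting from the intercept-plus-offset model is an assumption about 
what "forward" should start from.

```r
library(MASS)                                  # provides the Insurance data
data(Insurance)

# Starting model: intercept only, with log(Holders) as offset
fit0 <- glm(Claims ~ offset(log(Holders)), family = poisson, data = Insurance)

# Forward selection by AIC (k = 2); the upper scope contains all main
# effects and two-way interactions of District, Group and Age
fit.fwd <- step(fit0,
                scope = list(upper = ~ (District + Group + Age)^2),
                direction = "forward", k = 2, trace = 0)

print(formula(fit.fwd))
print(summary(fit.fwd)$coefficients)           # estimates and standard errors
```

Setting trace = 1 instead would print the AIC table at each step, 
which may be useful when reporting how the final model was reached.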

Report your final model, showing the fitted coefficients and their estimated 
standard errors. Then show, using other R functions, how to reproduce the null 
deviance and the final-model residual deviance, along with the ML coefficient 
estimates and standard errors, directly by maximizing a likelihood 
function which you define.
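
The direct-maximization step might look like the sketch below. It is 
not tied to any particular selected model: the main-effects formula is 
only a stand-in assumption for whatever model the stepwise search 
returns, and the names negll, grad, opt, etc. are hypothetical. The 
deviances use the standard definition 2*(saturated loglik - model 
loglik), and the null model with offset has the closed-form intercept 
log(sum(Claims)/sum(Holders)).

```r
library(MASS)
data(Insurance)

# ASSUMPTION: this main-effects model stands in for the model chosen by step()
X   <- model.matrix(~ District + Group + Age, Insurance)
y   <- Insurance$Claims
off <- log(Insurance$Holders)

# Negative Poisson log-likelihood (log link, offset) and its gradient
negll <- function(beta, X) -sum(dpois(y, exp(off + drop(X %*% beta)), log = TRUE))
grad  <- function(beta, X) -drop(crossprod(X, y - exp(off + drop(X %*% beta))))

# Start at the null-model MLE: intercept = log overall claim rate
b0    <- log(sum(y) / sum(Insurance$Holders))
start <- c(b0, rep(0, ncol(X) - 1))
opt <- optim(start, negll, grad, X = X, method = "BFGS", hessian = TRUE,
             control = list(maxit = 500, reltol = 1e-10))
se <- sqrt(diag(solve(opt$hessian)))           # SEs from the observed information

# Deviances relative to the saturated model (fitted mean = y)
ll.sat   <- sum(dpois(y, y, log = TRUE))
dev.res  <- 2 * (ll.sat + opt$value)
dev.null <- 2 * (ll.sat + negll(b0, X[, 1, drop = FALSE]))

# Comparison with glm() for the same stand-in model
fit <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
           family = poisson, data = Insurance)
se.glm <- sqrt(diag(vcov(fit)))
print(cbind(optim = opt$par, glm = coef(fit), se.optim = se, se.glm = se.glm))
print(c(dev.null = dev.null, glm.null = fit$null.deviance,
        dev.res = dev.res, glm.res = deviance(fit)))
```

Supplying the analytic gradient is optional, but it tends to make BFGS 
and the numerically differentiated Hessian more reliable than relying 
on finite differences of the log-likelihood alone.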