Homework Set 17, Due Friday December 2, 2016.
------------------------------------------------
Assigned 11/19/2016; originally due 11/30, extended to Friday 12/2.
==============================

(A) Recall that at the end of HW14 there was an "Extra-credit" portion asking how you might evaluate the quality (length and coverage) of Bayesian credible intervals and compare them with those from a frequentist method. In my solution to that HW14 problem set, I pointed out that the right frequentist method to use is not obvious, and I suggested parametric bootstrap as a natural one to try.

The proposed method is as follows. Since each X_i ~ Binom(n_i, p_i), even though p_i is drawn randomly from a prior, we can:
  1. compute p.hat_i = X_i/n_i;
  2. draw a large number B (say 1000) of parametric-bootstrap values X_i^{*(b)} ~ Binom(n_i, p.hat_i), b = 1,...,B, and set p.hat_i^{*(b)} = X_i^{*(b)}/n_i;
  3. form the bootstrap distribution (e.g. a histogram) of the differences z_{i,b} = p.hat_i^{*(b)} - p.hat_i;
  4. take as the bootstrap CI for p_i the interval
         ( p.hat_i - q_{i,.975} ,  p.hat_i - q_{i,.025} ),
     where q_{i,.975} and q_{i,.025} are the .975 and .025 quantiles estimated from the values (z_{i,b}, b = 1,...,B).

Code a function that generates bootstrap CIs by this method along with the corresponding Bayesian credible intervals. Then use your function to carry out a small simulation study (say N = 1000 to 5000) in which you generate N batches of (p_i, X_i, i = 1,...,10), all with the same n_i's and the same prior parameters (a, b), and record for both the Bayes and the bootstrap CIs the Coverage (an indicator equal to 1 if the CI for p_i in a given batch contains the true p_i) and the Length. Report and interpret the results of your simulation study.

(B) Run forward stepwise selection with option k = 2 (i.e., AIC) to fit a glm Poisson regression model for the number of Claims in the dataset "Insurance", which you can find in the library "MASS". Use the log number of policy holders as an offset, and allow all the predictor variables in the dataset (District, Group, Age) and their interactions up to second order as candidate terms in the regression model.
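The likelihood side of (B) can be sketched as follows. This is a minimal sketch, written in Python only so that it is self-contained; in R the same idea amounts to passing the negative log-likelihood to optim(..., hessian = TRUE). It assumes the log-link Poisson model log E(Claims_i) = log(Holders_i) + x_i' beta, where x_i is a row of a design matrix X whose first column is the intercept. Newton-Raphson maximizes the log-likelihood: its gradient is the score X'(y - mu), and its negative Hessian is X' diag(mu) X, whose inverse at the MLE gives the standard errors. All function names are hypothetical, and no Insurance data are included.

```python
import math


def linear_predictor(beta, xi, offset_i):
    """eta_i = offset_i + x_i' beta."""
    return offset_i + sum(b * x for b, x in zip(beta, xi))


def poisson_loglik(beta, X, y, offset):
    """Poisson log-likelihood with log link and offset, omitting the constant -sum(log y_i!)."""
    total = 0.0
    for xi, yi, oi in zip(X, y, offset):
        eta = linear_predictor(beta, xi, oi)
        total += yi * eta - math.exp(eta)
    return total


def poisson_deviance(y, mu):
    """2 * sum[y log(y / mu) - (y - mu)], with y log(y / mu) taken as 0 when y == 0."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))


def invert(A):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(p)] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def fitted_means(beta, X, offset):
    """mu_i = exp(offset_i + x_i' beta)."""
    return [math.exp(linear_predictor(beta, xi, oi)) for xi, oi in zip(X, offset)]


def information(X, mu):
    """Fisher information X' diag(mu) X (equal to minus the Hessian of the log-likelihood)."""
    p = len(X[0])
    return [[sum(m * xi[j] * xi[k] for xi, m in zip(X, mu)) for k in range(p)]
            for j in range(p)]


def fit_poisson(X, y, offset, tol=1e-10, max_iter=50):
    """Maximize poisson_loglik by Newton-Raphson; return (beta, standard errors, deviance).

    Assumes column 0 of X is the intercept and sum(y) > 0.
    """
    p = len(X[0])
    beta = [0.0] * p
    # Start at the intercept-only MLE: exp(beta_0) = sum(y) / sum(exp(offset)).
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        mu = fitted_means(beta, X, offset)
        score = [sum((yi - mi) * xi[j] for xi, yi, mi in zip(X, y, mu)) for j in range(p)]
        inv = invert(information(X, mu))
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = fitted_means(beta, X, offset)
    cov = invert(information(X, mu))  # inverse information evaluated at the MLE
    se = [math.sqrt(cov[j][j]) for j in range(p)]
    return beta, se, poisson_deviance(y, mu)
```

To match R's output, X would need to be exactly the design matrix of the final glm (for example model.matrix applied to the fitted model), since R codes ordered factors with polynomial contrasts by default. When a model has an offset, the null deviance R reports is the deviance of the intercept-plus-offset model, which corresponds to calling fit_poisson with a single column of ones and the same offset.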
Report your final model, showing the fitted coefficients and their estimated standard errors. Then show, using other R functions, how to reproduce the null deviance and the final model's residual deviance, along with the ML coefficient estimates and their standard errors, by directly maximizing a likelihood function that you define.
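Returning to part (A), here is a minimal Python sketch of one way the simulation could be organized. It assumes the prior is Beta(a, b) (the natural reading of the (a, b) values above) and uses equal-tailed 95% intervals. Because the Python standard library has no Beta quantile function, the credible interval is approximated from M draws of the Beta(a + X_i, b + n_i - X_i) posterior via random.betavariate; in R, qbeta gives these quantiles exactly. All function names are hypothetical, and the loops are written for clarity rather than speed.

```python
import random


def quantile(values, prob):
    """Sample quantile with linear interpolation (the same rule as R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * prob
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def rbinom(n, p, rng):
    """One Binomial(n, p) draw, as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def bootstrap_ci(x, n, B, rng, level=0.95):
    """Parametric-bootstrap CI (p.hat - q_hi, p.hat - q_lo), with z_b = p.hat*_b - p.hat."""
    p_hat = x / n
    z = [rbinom(n, p_hat, rng) / n - p_hat for _ in range(B)]
    alpha = 1 - level
    return (p_hat - quantile(z, 1 - alpha / 2), p_hat - quantile(z, alpha / 2))


def bayes_ci(x, n, a, b, M, rng, level=0.95):
    """Equal-tailed credible interval from the Beta(a + x, b + n - x) posterior,
    approximated by the sample quantiles of M posterior draws."""
    draws = [rng.betavariate(a + x, b + n - x) for _ in range(M)]
    alpha = 1 - level
    return (quantile(draws, alpha / 2), quantile(draws, 1 - alpha / 2))


def simulate(ns, a, b, N=1000, B=1000, M=2000, seed=1):
    """Return (coverage, mean length) for the bootstrap and Bayes intervals,
    each averaged over the N batches and over the groups i within a batch."""
    rng = random.Random(seed)
    cover = {"boot": 0, "bayes": 0}
    length = {"boot": 0.0, "bayes": 0.0}
    for _ in range(N):
        for n in ns:
            p = rng.betavariate(a, b)  # p_i drawn from the assumed Beta(a, b) prior
            x = rbinom(n, p, rng)      # X_i ~ Binom(n_i, p_i)
            for name, (lo, hi) in (("boot", bootstrap_ci(x, n, B, rng)),
                                   ("bayes", bayes_ci(x, n, a, b, M, rng))):
                cover[name] += lo <= p <= hi
                length[name] += hi - lo
    total = N * len(ns)
    return ({k: v / total for k, v in cover.items()},
            {k: v / total for k, v in length.items()})
```

For example, simulate(ns=[5, 10, 15, 20, 25, 30, 35, 40, 45, 50], a=2.0, b=3.0), with hypothetical n_i and (a, b), returns the estimated coverage and mean length for each method; in pure Python this is slow, so a smaller N is handy for a first run. One feature worth watching in the output: whenever X_i = 0 or X_i = n_i, every bootstrap draw equals X_i, so the bootstrap interval collapses to the single point p.hat_i and cannot contain p_i.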