Homework Set 14, Due Friday November 11, 2016.
---------------------------------------------

Assigned 11/2/2016, due 11/11
==============================

Consider the following "empirical Bayes" data setting:

    X_i ~ Binomial(n_i, p_i)  given p_i,  where  p_i ~ Beta(a,b)
                                          for i=1,...,10,

where n_1,...,n_10 and X_1,...,X_10 are given as

> nvec
[1] 21 30 28 41 52 55 47 52 80 25
> Xvec
[1]  4  7 19 22 29  5  3 10  3  4

ADDED EXPLANATION: in this setting, the frequentist would consider a,b
as unknown (constant) statistical parameters governing the "random
effects" p_i associated with clusters labeled i, each consisting of n_i
Bernoulli (coin-toss) observations. So the a,b parameters, which are
shared across clusters, are the primary object of estimation for the
frequentist, who would then use them to create "best" predictions
(based on some loss function like mean-squared error) for the p_i.

FOR THE BAYESIAN: all of the parameters a,b, p_i might be treated as
random and unknown (and the a,b parameters might themselves have been
given prior densities), but in this problem we assume in the Bayesian
parts that the a,b are fixed (somewhat arbitrarily) and treated as
known. Then statistical analysis would be used to yield posterior joint
densities for p_1,...,p_{10}, given the observed data X_1,...,X_{10}.
However, one might ask how sensitive these predictions are to the
specific choice of prior-density parameters (a,b).

Part (1) (4 pts.)
Estimate the parameters a,b by maximum likelihood and find confidence
intervals by
  (i)   large-sample normal distribution theory for MLEs,
  (ii)  parametric bootstrap (B=5000) using the MLEs as "true" values
        in the simulation,
  (iii) parametric bootstrap using a range (grid, or distribution) of
        parameters (a,b) as a way of choosing "true" values randomly in
        each simulation of 10 new X observations.

Comment: UP TO HERE, THE PARAMETER ESTIMATION PROBLEM IS ONLY ABOUT (a,b).

Part (2) (4 pts.)
Estimate the random-effect parameters p_i in a Bayesian framework, by
finding their posterior density using a Uniform prior density (a=1, b=1
fixed as though known), obtaining point estimators for the p_i's as
posterior expectations from that density, and confidence intervals
using the Bayesian credible-interval idea.

Comment: THIS IS A SIMPLE BAYESIAN VERSION OF THE ESTIMATION PROBLEM,
BUT DONE WITH AN ASSUMED PRIOR Beta(a,b) FOR THE p_i, AND THIS IS A
UNIFORM PRIOR WHEN a=b=1.

Part (3) (2 pts.)
Repeat the steps of part (2) with different parameters (a,b) of your
choosing [ but not only the values a=b=1 ] to assess the sensitivity of
the Bayesian credible interval to the choice of fixed (a,b).

Comment: THIS IS AN EXERCISE IN "BAYESIAN ROBUSTNESS" OR "SENSITIVITY
CHECKING" WITH RESPECT TO THE CHOICE OF PRIOR.

Part (4) (2 pts. Extra.)
Can you think of any way of evaluating the claim that the a=b=1 choice
is a good general way to construct the intervals for p_i's, achieving
good general performance (nominal 95% coverage), and relatively short
confidence interval length, for a range of different (a,b) choices?
Describe the idea, but do not do it in detail.

IN THIS EXTRA-CREDIT PROBLEM PART, YOU ARE ASKED TO THINK ABOUT
VERIFYING THE GOOD QUALITY OF BAYESIAN CONFIDENCE-INTERVAL PERFORMANCE
IN THE SETTING OF FIXED (i.e., frequentist) PARAMETERS (a0,b0) which
may differ from the choice of (a,b) you make to do your Bayesian
analysis.
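
ILLUSTRATIVE SKETCH (not a model solution): the two standard facts
behind the parts above are that, with p_i integrated out, X_i has a
Beta-Binomial distribution depending only on (a,b), and that for fixed
(a,b) the posterior of p_i given X_i is Beta(a + X_i, b + n_i - X_i).
The R code below is one possible way to set these up. The function
names bb_loglik and post_summary are hypothetical, and the use of
equal-tailed credible intervals and of a log-scale optim() search
started at a=b=1 are assumptions made here, not requirements of the
assignment.

```r
# Data as given in the assignment
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c(4, 7, 19, 22, 29, 5, 3, 10, 3, 4)

# Marginal (Beta-Binomial) log-likelihood of (a, b).
# Integrating p_i out of Binomial(n_i, p_i) x Beta(a, b) gives
#   P(X_i = x) = choose(n_i, x) * B(a + x, b + n_i - x) / B(a, b)
# (hypothetical helper name)
bb_loglik <- function(a, b, x = Xvec, n = nvec) {
  sum(lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b))
}

# Conjugate posterior for fixed (a, b):
#   p_i | X_i ~ Beta(a + X_i, b + n_i - X_i)
# Returns posterior means and equal-tailed credible limits
# (equal tails are an assumption; other interval types are possible).
post_summary <- function(a, b, x = Xvec, n = nvec, level = 0.95) {
  shape1 <- a + x
  shape2 <- b + n - x
  alpha <- 1 - level
  data.frame(
    mean  = shape1 / (shape1 + shape2),
    lower = qbeta(alpha / 2, shape1, shape2),
    upper = qbeta(1 - alpha / 2, shape1, shape2)
  )
}

# Numerical maximization on the log scale, which keeps a, b positive.
# The starting point c(0, 0) corresponds to a = b = 1.
fit <- optim(c(0, 0), function(th) -bb_loglik(exp(th[1]), exp(th[2])))
mle <- exp(fit$par)

# Posterior summaries under the Uniform prior a = b = 1
unif <- post_summary(1, 1)
```

Calling post_summary() with other (a,b) pairs gives the kind of
comparison asked for in Part (3). The optim() result should be checked
for convergence (fit$convergence) before it is used in the bootstrap
steps of Part (1).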