Homework Set 14, Due Friday November 11, 2016.
---------------------------------------------

Assigned 11/2/2016, due 11/11
==============================

Consider the following "empirical Bayes" data setting: 

X_i ~ Binomial(n_i, p_i)  given  p_i,  where   p_i ~ Beta(a,b)

for i=1,...,10, where n_1,...,n_10 and X_1,...,X_10 are given as

> nvec
 [1] 21 30 28 41 52 55 47 52 80 25

> Xvec
 [1]  4  7 19 22 29  5  3 10  3  4
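
For orientation, here is a minimal sketch (not part of the original assignment) of
the marginal likelihood of (a,b) implied by the model above. Integrating p_i out of
the Binomial-Beta pair gives the beta-binomial probability
P(X_i = x) = choose(n_i, x) * B(a+x, b+n_i-x) / B(a,b). The function name bb_loglik
is an illustrative choice, not something the assignment prescribes.

```r
# Data copied from the listing above.
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c( 4,  7, 19, 22, 29,  5,  3, 10,  3,  4)

# Marginal (beta-binomial) log-likelihood of (a, b), with each p_i integrated out.
# bb_loglik is a hypothetical helper name used only for this sketch.
bb_loglik <- function(a, b, x = Xvec, n = nvec) {
  sum(lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b))
}

# Sanity check: under a = b = 1 each X_i is uniform on 0, ..., n_i,
# so the log-likelihood should equal -sum(log(n_i + 1)).
bb_loglik(1, 1)
-sum(log(nvec + 1))
```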


ADDED EXPLANATION: in this setting, the frequentist would consider a,b as unknown
(constant) statistical parameters governing the "random effects" p_i associated 
with clusters labeled i, each consisting of n_i Bernoulli (coin-toss) observations.
So the parameters a,b, which are shared across clusters, are the primary object 
of estimation for the frequentist, who would then use them to create "best" 
predictions (based on some loss function like mean-squared error) for the p_i.

FOR THE BAYESIAN: all of the parameters a,b, p_i might be treated as random 
and unknown (and the a,b parameters might themselves have been given prior 
densities), but in this problem we assume in the Bayesian parts that a,b are 
fixed (somewhat arbitrarily) and treated as known. Then statistical analysis 
would be used to yield the joint posterior density of p_1,...,p_{10}, given the 
observed data X_1,...,X_{10}. However, one might ask how sensitive the resulting 
inferences are to the specific choice of prior-density parameters (a,b).


Part (1) (4 pts.) Estimate the parameters a,b by maximum likelihood 
                  and find confidence intervals by

          (i) large-sample normal distribution theory for MLEs

          (ii) parametric bootstrap (B=5000) using the MLEs as 
               "true" values in the simulation

          (iii) parametric bootstrap using a range (grid, or 
                distribution) of parameters (a,b) as a way of choosing 
                "true" values randomly in each simulation of 10 new 
                X observations.
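
One possible way to set up the maximum-likelihood fit and the large-sample
intervals of item (i) is sketched below. It is only an illustration under stated
assumptions: the optimization is done in (log a, log b) so that a,b stay positive,
and the normal-theory intervals are formed on that log scale and then
back-transformed. Intervals on the original (a,b) scale are an equally legitimate
choice. The names negll_log, fit, mle and ci are hypothetical.

```r
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c( 4,  7, 19, 22, 29,  5,  3, 10,  3,  4)

# Negative beta-binomial log-likelihood in theta = (log a, log b).
negll_log <- function(theta) {
  a <- exp(theta[1]); b <- exp(theta[2])
  -sum(lchoose(nvec, Xvec) + lbeta(a + Xvec, b + nvec - Xvec) - lbeta(a, b))
}

# Start at a = b = 1 (theta = 0); request the numerical Hessian at the optimum.
fit <- optim(c(0, 0), negll_log, method = "BFGS", hessian = TRUE)
mle <- exp(fit$par)                       # MLEs of (a, b)

# The inverse Hessian of the negative log-likelihood estimates the covariance
# of (log a-hat, log b-hat); 95% Wald intervals are then back-transformed.
se_log <- sqrt(diag(solve(fit$hessian)))
ci <- exp(cbind(lower = fit$par - 1.96 * se_log,
                upper = fit$par + 1.96 * se_log))
rownames(ci) <- c("a", "b")
mle
ci
```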

Comment: UP TO HERE, THE PARAMETER ESTIMATION PROBLEM IS ONLY ABOUT (a,b).
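
The parametric bootstrap of Part (1)(ii) might be organized along the following
lines. This is a sketch, not a prescribed solution: the helper names fit_ab and
boot_ab are hypothetical, percentile intervals are just one of several bootstrap
interval constructions, and B is set to 200 here only to keep the run short (the
assignment asks for B=5000). Fits that fail numerically are recorded as NA.

```r
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c( 4,  7, 19, 22, 29,  5,  3, 10,  3,  4)

# Beta-binomial MLE of (a, b), optimized in (log a, log b); NA if optim fails.
fit_ab <- function(x, n) {
  negll <- function(theta) {
    a <- exp(theta[1]); b <- exp(theta[2])
    -sum(lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b))
  }
  tryCatch(exp(optim(c(0, 0), negll, method = "BFGS")$par),
           error = function(e) c(NA_real_, NA_real_))
}

# Simulate B new data sets from the model with "true" values (a0, b0),
# refit each one, and return a B x 2 matrix of bootstrap estimates.
boot_ab <- function(a0, b0, n = nvec, B = 5000) {
  out <- t(replicate(B, {
    p <- rbeta(length(n), a0, b0)       # new random effects
    x <- rbinom(length(n), n, p)        # new counts given those effects
    fit_ab(x, n)
  }))
  colnames(out) <- c("a", "b")
  out
}

set.seed(1)
mle  <- fit_ab(Xvec, nvec)
boot <- boot_ab(mle[1], mle[2], B = 200)   # use B = 5000 for the assignment
ci_boot <- apply(boot, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
ci_boot
```

For item (iii), the same boot_ab could be called inside a loop in which (a0,b0)
is drawn afresh from a grid or distribution on each replicate, with B = 1 per call.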

Part (2) (4 pts.) Estimate the random-effect parameters p_i in a Bayesian 
framework, by finding their posterior density using a Uniform prior density 
(a=1, b=1 fixed as though known), obtaining point estimators for the p_i's 
as posterior expectations from that density, and interval estimates as 
Bayesian credible intervals.
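
As a hedged illustration of the computation involved: with (a,b) fixed, the Beta
prior is conjugate to the Binomial, so the p_i are independent a posteriori with
p_i | X_i ~ Beta(a + X_i, b + n_i - X_i). The sketch below uses equal-tailed 95%
intervals, which is one common convention; highest-posterior-density intervals
would be another. Variable names are illustrative.

```r
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c( 4,  7, 19, 22, 29,  5,  3, 10,  3,  4)

a <- 1; b <- 1                          # Uniform prior, as in Part (2)
post_a <- a + Xvec                      # posterior Beta first parameter
post_b <- b + nvec - Xvec               # posterior Beta second parameter

post_mean <- post_a / (post_a + post_b) # posterior expectation of each p_i
cred <- cbind(lower = qbeta(0.025, post_a, post_b),
              upper = qbeta(0.975, post_a, post_b))
round(cbind(post_mean, cred), 3)
```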

Comment: THIS IS A SIMPLE BAYESIAN VERSION OF THE ESTIMATION PROBLEM, BUT 
DONE WITH AN ASSUMED PRIOR  Beta(a,b) for the p_i, AND THIS IS A UNIFORM 
PRIOR WHEN a=b=1.


Part (3) (2 pts.) Repeat the steps of part (2) with different parameters (a,b) 
of your choosing [ but not only the values a=b=1 ] to assess the sensitivity 
of the Bayesian credible interval to the choice of fixed (a,b).
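
A sensitivity check of this kind could be tabulated as in the sketch below. The
four (a,b) pairs are arbitrary illustrative choices, not values suggested by the
assignment, and cred_int is a hypothetical helper name.

```r
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)
Xvec <- c( 4,  7, 19, 22, 29,  5,  3, 10,  3,  4)

# Posterior mean and equal-tailed credible interval for each p_i
# under a fixed Beta(a, b) prior.
cred_int <- function(a, b, x = Xvec, n = nvec, level = 0.95) {
  alpha <- 1 - level
  data.frame(a = a, b = b, i = seq_along(x),
             mean  = (a + x) / (a + b + n),
             lower = qbeta(alpha / 2,     a + x, b + n - x),
             upper = qbeta(1 - alpha / 2, a + x, b + n - x))
}

# Arbitrary priors chosen only to illustrate the comparison.
priors <- list(c(1, 1), c(0.5, 0.5), c(2, 2), c(1, 3))
res <- do.call(rbind, lapply(priors, function(ab) cred_int(ab[1], ab[2])))

# Interval endpoints for cluster 1 under each prior, side by side.
res[res$i == 1, ]
```

Clusters with small n_i should show the largest movement across priors, since the
prior contributes a + b "pseudo-observations" against n_i real ones.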

Comment: THIS IS AN EXERCISE IN "BAYESIAN ROBUSTNESS" OR "SENSITIVITY CHECKING"
WITH RESPECT TO THE CHOICE OF PRIOR.


Part (4) (2 pts. Extra.) Can you think of any way of evaluating the claim that the 
a=b=1 choice is a good general way to construct the intervals for the p_i's, achieving 
good general performance (nominal 95% coverage) and relatively short confidence 
interval length, for a range of different (a,b) choices?

Describe the idea, but do not do it in detail.

IN THIS EXTRA-CREDIT PROBLEM PART, YOU ARE ASKED TO THINK ABOUT VERIFYING THE GOOD
QUALITY OF BAYESIAN CONFIDENCE-INTERVAL PERFORMANCE IN THE SETTING OF FIXED  (i.e., 
frequentist) PARAMETERS (a0,b0) which may differ from the choice of (a,b) you 
make to do your Bayesian analysis.
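
One way such a check could be organized, offered only as a rough sketch of the
idea, is a simulation: fix "true" values (a0,b0), repeatedly generate p_i and X_i
from the model, form the a=b=1 credible intervals, and record how often they
contain the generated p_i and how long they are. The function name coverage_sim
and the replication count R are assumptions made for illustration.

```r
nvec <- c(21, 30, 28, 41, 52, 55, 47, 52, 80, 25)

# Empirical coverage and mean length of equal-tailed 95% credible intervals
# built with prior Beta(a, b) when data really come from Beta(a0, b0) effects.
coverage_sim <- function(a0, b0, n = nvec, a = 1, b = 1, R = 2000) {
  hits <- len <- matrix(NA_real_, R, length(n))
  for (r in seq_len(R)) {
    p  <- rbeta(length(n), a0, b0)      # true random effects
    x  <- rbinom(length(n), n, p)       # simulated counts
    lo <- qbeta(0.025, a + x, b + n - x)
    hi <- qbeta(0.975, a + x, b + n - x)
    hits[r, ] <- (lo <= p & p <= hi)    # did the interval cover p_i?
    len[r, ]  <- hi - lo
  }
  c(coverage = mean(hits), mean_length = mean(len))
}

set.seed(1)
sim_matched <- coverage_sim(1, 1, R = 500)   # analysis prior equals the truth
sim_matched
```

Repeating the call over a grid of (a0,b0) values, and comparing against intervals
built from other analysis priors, would give the kind of evidence the question
asks about.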