Homework Set 17, Due Monday November 27, 2017.
---------------------------------------------

Assigned 11/17/2017, due 11/27               16 points 
==============================

I mentioned in class that when you can construct a studentized (nonparametric or parametric) bootstrap
2-sided confidence interval, its coverage will generally be higher than that of the non-studentized
version. This HW set steps you through a simulation study to test that claim. In the steps below I
specify a first set of parameters R, B, a0, b0, c0, alpha; you should then vary them, as time permits,
by enlarging R and B and by trying alternative sets of (a0, b0, c0) (or even a different distribution
for e_i).

Step 1. Simulate R iid datasets of the form {(X_i, Y_i), i=1,...,65}, where 

        X_i ~ Uniform[0,1]   and given X_i,   Y_i =  a0 + b0* X_i + e_i ,

where the e_i are iid, independent of (X_i, i=1,...,65), and distributed as
Logistic(0,c0) random variables (location 0, scale c0).
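A minimal sketch of Step 1 for a single dataset, written in Python with only the standard library (the choice of language is an assumption; any language will do). The helper names `logistic_draw` and `simulate_dataset` are hypothetical. Logistic(0,c0) is read here as location 0, scale c0, and each draw inverts the CDF F(e) = 1/(1 + exp(-e/c0)).

```python
import math
import random


def logistic_draw(rng, scale):
    """One Logistic(0, scale) draw by inverting F(e) = 1/(1 + exp(-e/scale))."""
    u = rng.random()          # uniform on [0, 1)
    while u == 0.0:           # avoid log(0)
        u = rng.random()
    return scale * math.log(u / (1.0 - u))


def simulate_dataset(rng, n=65, a0=-1.5, b0=0.6, c0=1.0):
    """Return lists X, Y with X_i ~ Uniform[0,1] and Y_i = a0 + b0*X_i + e_i."""
    xs = [rng.random() for _ in range(n)]
    ys = [a0 + b0 * x + logistic_draw(rng, c0) for x in xs]
    return xs, ys
```

Calling `simulate_dataset` R times with one shared `random.Random(seed)` gives R iid datasets and makes the whole study reproducible.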

THE OBJECTIVE OF OUR SIMULATIONS IS TO GENERATE BOOTSTRAP 2-SIDED LEVEL 1-alpha CONFIDENCE 
INTERVALS FOR b0 WITH PROPER COVERAGE PROBABILITY (neither larger nor smaller than 1-alpha).

Start with a0 = -1.5, b0 = 0.6, c0 = 1, R = 1000, and alpha = 0.2.

Step 2. For EACH of the datasets you generate in step 1:

          (i) Estimate  b0  by the least-squares estimator  bLS. Note that the variance
       of bLS based on a single sample (Y_i, i=1,..,65), CONDITIONED ON {X_i: i=1,..,65},
       is consistently estimated (as the sample size gets large, even if you did not know that
       e_i was logistically distributed) by
     VBLS = [sum_{i=1}^{65} (Y_i-Ybar - bLS*(X_i-Xbar))^2/63 ] /[sum_{i=1}^{65} (X_i-Xbar)^2]
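The least-squares fit and VBLS can be sketched as follows. The helper name `ls_fit` is hypothetical. It also returns the intercept aLS and the fitted residuals, which the bootstrap resampling needs. The divisor n - 2 equals 63 when n = 65.

```python
def ls_fit(xs, ys):
    """Least-squares fit of y = a + b*x.

    Returns (aLS, bLS, VBLS, residuals), with VBLS as in the formula above:
    [residual sum of squares / (n - 2)] / sum (X_i - Xbar)^2.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b_ls = sxy / sxx
    a_ls = ybar - b_ls * xbar
    # Y_i - Ybar - bLS*(X_i - Xbar) is the same as Y_i - aLS - bLS*X_i
    resid = [y - ybar - b_ls * (x - xbar) for x, y in zip(xs, ys)]
    vbls = (sum(r * r for r in resid) / (n - 2)) / sxx
    return a_ls, b_ls, vbls, resid
```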

          (ii) Generate 2-sided parametric bootstrap confidence intervals for b0 and also
       2-sided nonparametric bootstrap confidence intervals for b0, by generating B bootstrap
       samples of each type and computing bootstrap confidence intervals by both the
       studentized and non-studentized bootstrap approaches given in class (four interval
       types in all).
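Here is one way to generate the B bootstrap samples for a single dataset, with the X_i held fixed to match the conditioning on {X_i}. For each bootstrap sample it records b* and VBLS*, meaning bLS and VBLS recomputed on that sample. Several choices here are assumptions and may differ from the versions given in class:

- The nonparametric bootstrap resamples the fitted residuals with replacement. A pairs bootstrap that resamples (X_i, Y_i) is a common alternative.
- The parametric bootstrap draws Logistic(0, c_hat) errors with c_hat = sqrt(3*s^2)/pi, where s^2 = RSS/(n-2). This is a moment estimate based on Var(e) = pi^2 c0^2 / 3; a maximum-likelihood estimate of c0 would also work.

All helper names are hypothetical.

```python
import math
import random


def logistic_draw(rng, scale):
    """One Logistic(0, scale) draw by inverting its CDF."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return scale * math.log(u / (1.0 - u))


def ls_fit(xs, ys):
    """Return (aLS, bLS, VBLS, residuals) for the least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    resid = [y - ybar - b * (x - xbar) for x, y in zip(xs, ys)]
    return ybar - b * xbar, b, (sum(r * r for r in resid) / (n - 2)) / sxx, resid


def bootstrap_replicates(xs, ys, B, rng, kind):
    """Return lists (b*, VBLS*) over B bootstrap samples, X_i held fixed."""
    n = len(xs)
    a_ls, b_ls, _, resid = ls_fit(xs, ys)
    if kind == "parametric":
        s2 = sum(r * r for r in resid) / (n - 2)
        c_hat = math.sqrt(3.0 * s2) / math.pi  # moment estimate of logistic scale
    elif kind != "nonparametric":
        raise ValueError("kind must be 'parametric' or 'nonparametric'")
    boot_b, boot_v = [], []
    for _ in range(B):
        if kind == "nonparametric":
            errs = rng.choices(resid, k=n)  # resample residuals with replacement
        else:
            errs = [logistic_draw(rng, c_hat) for _ in range(n)]
        ys_star = [a_ls + b_ls * x + e for x, e in zip(xs, errs)]
        _, b_star, v_star, _ = ls_fit(xs, ys_star)
        boot_b.append(b_star)
        boot_v.append(v_star)
    return boot_b, boot_v
```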
       
       Recall that the nonstudentized approach approximates the distribution of the residual
       bLS-b0 by the empirical distribution of the corresponding bootstrap residuals b*-bLS
       over the B bootstrap samples (b* = the least-squares slope of a bootstrap sample), and
       the studentized approach approximates the distribution of the studentized residual
       (bLS-b0)/sqrt(VBLS) by the empirical distribution of (b*-bLS)/sqrt(VBLS*), where VBLS*
       is VBLS recomputed on each bootstrap sample.
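Given the B bootstrap pairs (b*, VBLS*) for one dataset, where b* and VBLS* are bLS and VBLS recomputed on a bootstrap sample, the two intervals can be computed as sketched below. Let q_p be the p-quantile of b*-bLS and t_p the p-quantile of (b*-bLS)/sqrt(VBLS*). Then:

- the nonstudentized interval is [bLS - q_{1-alpha/2}, bLS - q_{alpha/2}];
- the studentized interval is [bLS - t_{1-alpha/2} sqrt(VBLS), bLS - t_{alpha/2} sqrt(VBLS)].

The linear-interpolation quantile rule is one common convention and is assumed here. The function names are hypothetical.

```python
import math


def empirical_quantile(values, p):
    """p-quantile by linear interpolation between order statistics."""
    s = sorted(values)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def bootstrap_intervals(b_ls, vbls, boot_b, boot_v, alpha):
    """Return (nonstudentized, studentized) 2-sided level 1-alpha intervals for b0."""
    # Nonstudentized: law of bLS - b0 approximated by that of b* - bLS.
    diffs = [bb - b_ls for bb in boot_b]
    basic = (b_ls - empirical_quantile(diffs, 1 - alpha / 2),
             b_ls - empirical_quantile(diffs, alpha / 2))
    # Studentized: law of (bLS - b0)/sqrt(VBLS) approximated by that of
    # (b* - bLS)/sqrt(VBLS*).
    tstats = [(bb - b_ls) / math.sqrt(vv) for bb, vv in zip(boot_b, boot_v)]
    se = math.sqrt(vbls)
    stud = (b_ls - empirical_quantile(tstats, 1 - alpha / 2) * se,
            b_ls - empirical_quantile(tstats, alpha / 2) * se)
    return basic, stud
```

Note that both intervals subtract the upper quantile to get the lower endpoint. This reflection is what distinguishes these intervals from simply reading off percentiles of b*.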

    Start with B=400 or 500, but go up to B=1000 if time permits.

Step 3. Over the R repeated Monte Carlo iterations of Step 1, tally FOR EACH OF THE FOUR TYPES 
       OF BOOTSTRAP CONFIDENCE INTERVAL GENERATED IN STEP 2 the proportion of the R datasets 
       for which  b0 falls inside the confidence interval, as well as the average length of 
       the interval.
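A sketch of the Step 3 tally follows. It assumes each interval type's R intervals are stored as (lower, upper) pairs in the same dataset order, and the type labels are hypothetical. It keeps the per-dataset coverage indicators and lengths because the standard errors requested below need them.

```python
def tally(intervals_by_type, b0):
    """Coverage proportion and average length for each interval type.

    intervals_by_type maps a label to a list of R (lower, upper) intervals,
    one per simulated dataset, in the same dataset order for every label.
    """
    summary = {}
    for label, intervals in intervals_by_type.items():
        covered = [1 if lo <= b0 <= hi else 0 for lo, hi in intervals]
        lengths = [hi - lo for lo, hi in intervals]
        R = len(intervals)
        summary[label] = {
            "coverage": sum(covered) / R,
            "avg_length": sum(lengths) / R,
            "covered": covered,   # per-dataset 0/1 indicators
            "lengths": lengths,   # per-dataset interval lengths
        }
    return summary
```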

   Which, if any, of the four intervals has higher coverage proportion or shorter average length
   than the others?
 
NOTE THAT THE SIMULATION SAMPLING VARIABILITY MAY MAKE IT HARD TO REACH DEFINITIVE CONCLUSIONS,
SO YOU SHOULD ESTIMATE THE STANDARD ERRORS OF THE COVERAGE PROPORTIONS AND AVERAGE LENGTHS FOR THE
DIFFERENT TYPES OF CONFIDENCE INTERVAL, AND PREFERABLY ALSO GIVE CONFIDENCE INTERVALS FOR THE
DIFFERENCES IN COVERAGE AND LENGTH THAT YOU REPORT IN ANSWER TO THE FINAL QUESTION ABOVE.
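Here is a minimal sketch of the Monte Carlo standard errors and of a normal-approximation confidence interval for a difference. It assumes the inputs are per-dataset coverage indicators (0/1) or interval lengths, and the helper names are hypothetical.

Every interval type is computed on the same R datasets, so the sketch takes differences dataset by dataset (a paired comparison), which accounts for the correlation between types. Treating two coverage proportions as independent would typically overstate the standard error of their difference when, as is likely here, the two types tend to cover or miss on the same datasets.

```python
import math
from statistics import NormalDist, mean, stdev


def proportion_with_se(indicators):
    """Coverage proportion p and its Monte Carlo standard error sqrt(p(1-p)/R)."""
    p = mean(indicators)
    return p, math.sqrt(p * (1 - p) / len(indicators))


def mean_with_se(values):
    """Average (e.g. of interval lengths) and its standard error sd/sqrt(R)."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_difference_ci(a, b, level=0.95):
    """Normal-approximation CI for mean(a) - mean(b) from paired per-dataset values."""
    d = [x - y for x, y in zip(a, b)]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(d)
    half = z * stdev(d) / math.sqrt(len(d))
    return center - half, center + half
```

With R = 1000 and coverage near 1 - alpha = 0.8, the standard error of a single coverage proportion is about sqrt(0.8*0.2/1000), roughly 0.013. Coverage differences much smaller than that are therefore hard to resolve without enlarging R.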