Homework Set 17, Due Monday November 27, 2017.
---------------------------------------------
Assigned 11/17/2017, due 11/27          16 points
==============================

I mentioned in class that when you can do a studentized (Nonparametric or Parametric) Bootstrap 2-sided confidence interval, coverage will generally be higher with the studentized version. This HW Set steps you through a simulation study to test that claim.

Throughout the following steps, I will specify a first set of R, B, a0, b0, c0, alpha parameters. But you should vary these parameters by enlarging R and B and trying alternative sets of (a0,b0,c0) parameters (or even the distribution of e_i), as time permits.

Step 1. Simulate R iid datasets of the form {(X_i, Y_i), i=1,...,65}, where X_i ~ Uniform[0,1] and, given X_i, Y_i = a0 + b0*X_i + e_i, where the e_i are iid, independent of (X_i, i=1,...,65), and distributed as Logistic(0,c0) random variables.

THE OBJECTIVE OF OUR SIMULATIONS IS TO GENERATE BOOTSTRAP 2-SIDED LEVEL 1-alpha CONFIDENCE INTERVALS FOR b0 WITH PROPER COVERAGE PROBABILITY (neither larger nor smaller than 1-alpha).

Start with a0 = -1.5, b0 = 0.6, c0 = 1, R = 1000, and alpha = 0.2.

Step 2. For EACH of the datasets you generate in Step 1:

(i) Estimate b0 by the least-squares estimator bLS, and note that the variance of bLS based on a single sample (Y_i, i=1,...,65) CONDITIONED ON {X_i: i=1,...,65} is consistently estimated (as sample size gets large, even if you did not know that e_i was logistic distributed) by

VBLS = [ sum_{i=1}^{65} (Y_i - Ybar - bLS*(X_i - Xbar))^2 / 63 ] / [ sum_{i=1}^{65} (X_i - Xbar)^2 ]

(ii) Generate 2-sided parametric bootstrap confidence intervals for b0 and also 2-sided nonparametric bootstrap confidence intervals for b0 by generating B bootstrap samples of each type, and estimating bootstrap confidence intervals by both the studentized and the non-studentized bootstrap approaches given in class.
Recall that the nonstudentized approach is based on obtaining an empirical reference distribution for the residuals bLS-b0 in terms of the corresponding residuals from the B bootstrap samples, and the studentized approach is based on obtaining an empirical reference distribution for the studentized residuals (bLS-b0)/sqrt(VBLS) in terms of the corresponding studentized residuals from the B bootstrap samples. Start with B=400 or 500, but go up to B=1000 if time permits.

Step 3. Over the R repeated Monte Carlo iterations of Step 1, tally FOR EACH OF THE FOUR TYPES OF BOOTSTRAP CONFIDENCE INTERVAL GENERATED IN STEP 2 the proportion of the R datasets for which b0 falls inside the confidence interval, as well as the average length of the interval. Which, if any, of the four intervals have higher coverage proportion or shorter length than the others?

NOTE THAT THE SIMULATION SAMPLING VARIABILITY MAY MAKE IT HARD TO REACH DEFINITIVE CONCLUSIONS, SO YOU SHOULD ESTIMATE THE STANDARD ERRORS OF COVERAGE OR LENGTH FOR THE CONFIDENCE INTERVALS OF DIFFERENT TYPES, AND PREFERABLY GIVE CONFIDENCE INTERVALS ALSO FOR THE DIFFERENCES IN COVERAGE AND LENGTH THAT YOU GIVE IN ANSWER TO THE FINAL QUESTION ABOVE.
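
The mechanics of Steps 1 through 3 can be sketched as follows. This is only an illustrative outline in Python with NumPy, not the method given in class: it covers the nonparametric case alone, and it ASSUMES that the nonparametric bootstrap resamples (X_i, Y_i) pairs with replacement (resampling residuals is another common choice). The function names simulate_dataset, ls_slope_and_var, nonparametric_boot_cis and coverage_study are hypothetical, chosen for this sketch.

```python
import numpy as np

def simulate_dataset(rng, n=65, a0=-1.5, b0=0.6, c0=1.0):
    """Step 1: X ~ Uniform[0,1], Y = a0 + b0*X + e, e ~ Logistic(0, c0)."""
    x = rng.uniform(0.0, 1.0, size=n)
    e = rng.logistic(loc=0.0, scale=c0, size=n)
    return x, a0 + b0 * x + e

def ls_slope_and_var(x, y):
    """Step 2(i): least-squares slope bLS and its variance estimate VBLS."""
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = np.sum(xc ** 2)
    b = np.sum(xc * yc) / sxx
    resid = yc - b * xc
    v = (np.sum(resid ** 2) / (len(x) - 2)) / sxx   # divisor n-2 = 63 when n = 65
    return b, v

def nonparametric_boot_cis(rng, x, y, B=400, alpha=0.2):
    """Step 2(ii), nonparametric case only.
    ASSUMPTION: bootstrap samples are drawn by resampling (X_i, Y_i) pairs.
    Returns (nonstudentized interval, studentized interval)."""
    n = len(x)
    b_hat, v_hat = ls_slope_and_var(x, y)
    diffs = np.empty(B)    # bootstrap analogues of bLS - b0
    tstats = np.empty(B)   # bootstrap analogues of (bLS - b0)/sqrt(VBLS)
    for k in range(B):
        idx = rng.integers(0, n, size=n)
        b_star, v_star = ls_slope_and_var(x[idx], y[idx])
        diffs[k] = b_star - b_hat
        tstats[k] = (b_star - b_hat) / np.sqrt(v_star)
    lo_d, hi_d = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    lo_t, hi_t = np.quantile(tstats, [alpha / 2, 1 - alpha / 2])
    se = np.sqrt(v_hat)
    # Invert q_lo <= bLS - b0 <= q_hi to get an interval for b0.
    plain = (b_hat - hi_d, b_hat - lo_d)
    student = (b_hat - hi_t * se, b_hat - lo_t * se)
    return plain, student

def coverage_study(R=1000, B=400, alpha=0.2, b0=0.6, seed=0):
    """Step 3: coverage proportion and mean length, with Monte Carlo
    standard errors. Column 0 = nonstudentized, column 1 = studentized."""
    rng = np.random.default_rng(seed)
    cover = np.zeros((R, 2), dtype=bool)
    length = np.zeros((R, 2))
    for r in range(R):
        x, y = simulate_dataset(rng, b0=b0)
        intervals = nonparametric_boot_cis(rng, x, y, B=B, alpha=alpha)
        for j, (lo, hi) in enumerate(intervals):
            cover[r, j] = lo <= b0 <= hi
            length[r, j] = hi - lo
    p = cover.mean(axis=0)
    # Both intervals come from the same dataset, so compare them as a
    # paired difference (studentized minus nonstudentized coverage).
    d = cover[:, 1].astype(float) - cover[:, 0].astype(float)
    return {
        "coverage": p,
        "coverage_se": np.sqrt(p * (1 - p) / R),
        "mean_length": length.mean(axis=0),
        "length_se": length.std(axis=0, ddof=1) / np.sqrt(R),
        "coverage_diff": d.mean(),
        "coverage_diff_se": d.std(ddof=1) / np.sqrt(R),
    }
```

Calling coverage_study(R=1000, B=400, alpha=0.2) corresponds to the starting parameter values above; smaller R and B are useful for a quick check that the code runs. The parametric bootstrap intervals would follow the same pattern, with the resampling step replaced by simulation from the fitted model.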