Homework Set 16, Due Wednesday November 22, 2017, 11:59pm.
----------------------------------------------------------
Assigned 11/13/2017, due 11/22        14 points plus 2 points extra-credit
==============================

(A) Let a Gamma-distributed simulated dataset be generated as follows:

> set.seed(3131)
  gdat = rgamma(87,3,1.2)

## Consider the statistic  S = (X_{(22)} + X_{(66)})/2 ,  where
## X_{(j)} denotes the j'th order statistic.
##
## Its expected value for the gamma data can be found as

> 0.5*(integrate(function(x) qgamma(x,3,1.2)*dbeta(x,22,66),0,1)$val+
       integrate(function(x) qgamma(x,3,1.2)*dbeta(x,66,22),0,1)$val)
[1] 2.361655

### Contrast with mean = 3/1.2 = 2.5 and qgamma(0.5,3,1.2) = 2.228384

### The target value considered as a function of the empirical cdf is

> S0 = 0.5*(qgamma(22/87,3,1.2) + qgamma(66/87,3,1.2))
> S0
[1] 2.381622

### This is the true unknown parameter value for the problem,
### which can be viewed as T(F) for the same operator T
### such that S = T(F_n), where F is the true Gamma(3,1.2) d.f.
### and F_n with n=87 is the empirical cdf for the original dataset.

### Find 90% two-sided confidence intervals for this target according to
### the following methods (where both Gamma parameters are viewed as
### unknown in the parametric cases). Use n=87 and B=1000 in all
### pseudo-data samples, and estimate CIs by the two methods

    (a) Nonparametric bootstrap, non-studentized
    (b) Parametric bootstrap, non-studentized

Extra-credit 2 points: can you find a way, using the Delta Method,
to calculate a studentized parametric bootstrap in this setting?
(The derivative you use in the delta method could be numerically
approximated rather than analytical.)

##---------------------------------------------------------------------------

(B) Consider the dataset "oats", and suppose that the target of inference
is the expected yield of the "Marvellous" variety minus the average
of the "Victory" and "Golden.Rain" varieties.
Let the statistic be the corresponding contrast of averages of the fitted
values under the model lm(Y ~ ., data=oats), and let the parametric model
be the regression model with iid normal mean 0 errors. Find the 90%
two-sided CI for the target using each of the methods

    (a) Nonparametric bootstrap, non-studentized
    (b) Parametric bootstrap, non-studentized

Use a reasonably large number of bootstrap iterations (B >= 1000).

NOTE: IN THIS PROBLEM, YOU NEED TO FIT THE MODEL lm(Y ~ ., data=oats)
IN ORDER TO ESTIMATE THE PARAMETERS NEEDED FOR THE PARAMETRIC BOOTSTRAP.
HOWEVER, THE STATISTIC I HAVE ASKED YOU TO COMPUTE BOTH ON THE ORIGINAL
oats DATASET AND ON THE LATER BOOTSTRAP SAMPLES, WHICH IS

    (avg of "Marvellous" fitted Y's) -
        0.5*(sum of avg of "Victory" fitted Y's
             and of avg of "Golden.Rain" fitted Y's)

TURNS OUT IN EACH POSSIBLE SAMPLE TO BE EXACTLY THE SAME AS

    (avg of "Marvellous" Y's) -
        0.5*(sum of avg of "Victory" Y's
             and of avg of "Golden.Rain" Y's)

DO THE CALCULATION ON THE ORIGINAL SAMPLE TO CONVINCE YOURSELF OF THIS FACT!
NOTE THAT IN THE SECOND OF THESE STATISTIC EXPRESSIONS, NO MODEL FITTING
IS INVOLVED.
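
## The identity claimed in the NOTE can be checked numerically. What follows
## is only a minimal sketch, not a solution to part (B). It ASSUMES that
## "oats" is the data frame shipped with the MASS package (columns B, V, N, Y,
## with V the variety factor and "Marvellous" one of its three levels). The
## helper name contrast is hypothetical, introduced here for illustration.

```r
## ASSUMPTION: "oats" is the MASS data frame with columns B, V, N, Y.
library(MASS)
data(oats)

fit <- lm(Y ~ ., data = oats)

## Hypothetical helper: "Marvellous" average minus the mean of the
## averages of the remaining varieties.
contrast <- function(y, v) {
  m <- tapply(y, v, mean)                 # per-variety averages
  is.marv <- names(m) == "Marvellous"
  unname(m[is.marv] - mean(m[!is.marv]))
}

stat.fitted <- contrast(fitted(fit), oats$V)   # uses the fitted model
stat.raw    <- contrast(oats$Y, oats$V)        # no model fitting involved

print(all.equal(stat.fitted, stat.raw))

## Same comparison on one row-resampled (nonparametric bootstrap) dataset.
set.seed(1)
boot <- oats[sample(nrow(oats), replace = TRUE), ]
boot.fitted <- contrast(fitted(lm(Y ~ ., data = boot)), boot$V)
boot.raw    <- contrast(boot$Y, boot$V)

print(all.equal(boot.fitted, boot.raw))
```

## The two versions agree because the model contains an intercept and the
## variety factor, so the least-squares residuals sum to zero within each
## variety, and each variety's average fitted value equals its average Y.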