HW 20   Stat 705   Fall 2017
Assigned Monday 12/4/17, DUE Tuesday 12/12/17 11:59pm, worth 16 points

(A) Read in the dataset "fat" from R package "faraway". This is a dataset on body fat and related measurements in men. USE ONLY THE "height" AND "weight" COLUMNS, AND REMOVE TWO OUTLYING OBSERVATIONS, NUMBERS 39 AND 42 (one anomalously short and one anomalously heavy).

Taking x=height and y=weight, consider the linear model

    y[i] = a + b*x[i] + epsilon[i] ,    epsilon[i] ~ Normal with mean 0, variance sig^2

Assume a prior density with parameters a,b,sig independent where

    (a,b) ~ normal with mean (6, 2.5) and sd's (2, 0.5)
    tau = 1/sig^2 ~ Gamma(4, 100)

##-----------------------------------------------------------------------------------
#### NOTE: in the problem as originally posted this was Gamma(30, 0.033), which makes
###  the distribution of sig^2 very concentrated at small values, instead of being
###  very spread out, as I intended.
##-----------------------------------------------------------------------------------

Use a GIBBS SAMPLER to sample 10,000 (a,b,tau) triples from the posterior density given the fat[-c(39,42), c("height","weight")] data, after a burn-in sample of 5000 triples. Use your sampled results to do the following:

  (i)   plot smoothed posterior density estimates of a, b, and sig separately;
  (ii)  assess (visually) whether you think that each of the three posterior densities you found in (i) is well approximated by a normal density;
  (iii) find the posterior median for each of the three parameters;
  (iv)  find 90% Bayesian credible intervals for each of the three parameters;
  (v)   find the posterior probability that simultaneously a and b BOTH lie within their 90% (frequentist, least-squares) confidence intervals calculated from the original "fat" dataset.
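
The sampler in (A) cycles through the three full conditional distributions of a, b, and tau. The R code below is a minimal sketch of one way to arrange this, not a model solution. Assumptions that are not in the problem statement: Gamma(4, 100) is read as shape = 4, rate = 100; the function name gibbs_lm and its arguments are hypothetical; and simulated stand-in data are used so that the code runs without the "faraway" package.

```r
# Gibbs sampler sketch for y[i] = a + b*x[i] + eps[i], eps[i] ~ N(0, 1/tau).
# Priors: a ~ N(m_a, s_a^2), b ~ N(m_b, s_b^2), tau ~ Gamma(shape, rate).
# ASSUMPTION: Gamma(4, 100) means shape = 4, rate = 100.
gibbs_lm <- function(x, y, n_keep = 10000, n_burn = 5000,
                     m_a = 6, s_a = 2, m_b = 2.5, s_b = 0.5,
                     shape = 4, rate = 100) {
  n   <- length(y)
  sxx <- sum(x^2)
  a <- m_a; b <- m_b; tau <- shape / rate      # start at the prior means
  out <- matrix(NA_real_, n_keep, 3,
                dimnames = list(NULL, c("a", "b", "tau")))
  for (it in seq_len(n_burn + n_keep)) {
    # a | b, tau, y : normal, precision = prior precision + n*tau
    prec_a <- 1 / s_a^2 + n * tau
    mean_a <- (m_a / s_a^2 + tau * sum(y - b * x)) / prec_a
    a <- rnorm(1, mean_a, 1 / sqrt(prec_a))
    # b | a, tau, y : normal, precision = prior precision + tau*sum(x^2)
    prec_b <- 1 / s_b^2 + tau * sxx
    mean_b <- (m_b / s_b^2 + tau * sum(x * (y - a))) / prec_b
    b <- rnorm(1, mean_b, 1 / sqrt(prec_b))
    # tau | a, b, y : gamma, updated by n/2 and half the residual sum of squares
    tau <- rgamma(1, shape = shape + n / 2,
                  rate = rate + 0.5 * sum((y - a - b * x)^2))
    # store the draw only after the burn-in period
    if (it > n_burn) out[it - n_burn, ] <- c(a, b, tau)
  }
  out
}

# Stand-in data (ASSUMPTION). With the real data one would instead use
#   library(faraway); dat <- fat[-c(39, 42), c("height", "weight")]
#   x <- dat$height; y <- dat$weight
set.seed(705)
x <- rnorm(250, 70, 2.6)
y <- 6 + 2.5 * x + rnorm(250, 0, 25)

draws <- gibbs_lm(x, y)
post  <- cbind(a = draws[, "a"], b = draws[, "b"],
               sig = 1 / sqrt(draws[, "tau"]))

# (iii), (iv): posterior medians and equal-tailed 90% credible intervals
summ <- apply(post, 2, quantile, probs = c(0.05, 0.5, 0.95))

# (v): posterior probability that a and b both fall in their 90% LS intervals
ci <- confint(lm(y ~ x), level = 0.90)
p_both <- mean(post[, "a"] >= ci[1, 1] & post[, "a"] <= ci[1, 2] &
               post[, "b"] >= ci[2, 1] & post[, "b"] <= ci[2, 2])
```

For (i) and (ii), plot(density(post[, "a"])) and the analogous calls for b and sig give smoothed density estimates that can be compared by eye with a normal curve. Because x is not centered, a and b are correlated in the posterior, so trace plots of the draws are worth a look before trusting the summaries.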

(B) After completing (A), write a function to generate (as many times as desired, independently) samples of size 250 of y's satisfying the conditional linear model specified above with the same underlying data-generating mechanism as the "height" values in the "fat" dataset. Test your function by comparing -- in any way you choose to display the results -- four newly generated datasets of y's of size 250, all with the same fixed set of x[1],...,x[250].
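
One possible reading of (B) is a function that holds the x's fixed and draws fresh normal errors on each call. The sketch below follows that reading. The function name sim_y is hypothetical, the x's are simulated stand-ins for the 250 retained heights, and the parameter values (a = 6, b = 2.5 from the prior means, sig = 25 chosen only for illustration) are assumptions; values estimated in (A) could be substituted.

```r
# Hypothetical helper: one vector of y's from the conditional model given fixed x.
sim_y <- function(x, a, b, sig) {
  a + b * x + rnorm(length(x), mean = 0, sd = sig)
}

set.seed(2017)
# ASSUMPTION: stand-in for the heights; with the real data use
#   x <- faraway::fat[-c(39, 42), "height"]
x <- rnorm(250, 70, 2.6)

# Four independent datasets of y's sharing the same x[1],...,x[250];
# replicate() returns them as the columns of a 250 x 4 matrix.
ysets <- replicate(4, sim_y(x, a = 6, b = 2.5, sig = 25))

# One way to compare them: least-squares intercept and slope for each dataset.
fits <- apply(ysets, 2, function(y) coef(lm(y ~ x)))
```

Each column of fits holds the fitted intercept and slope for one generated dataset, so the four columns can be set side by side, or the four columns of ysets can be plotted against x on a common set of axes.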