HW 20 Stat 705  Fall 2017

Assigned Monday 12/4/17        DUE Tuesday 12/12/17 11:59pm, worth 16 points

(A) Read in the dataset "fat" from the R package "faraway". This is a dataset on body fat 
and related measurements in men. USE ONLY THE "height" AND "weight" COLUMNS, AND 
REMOVE TWO OUTLYING OBSERVATIONS, NUMBERS 39 AND 42 (one anomalously short and one 
anomalously heavy).
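If you export the data and work outside R, watch for an off-by-one error. R's fat[-c(39,42), ...] drops rows by 1-based position, but NumPy positions start at 0. The sketch below shows the conversion. The helper drop_rows_1based is hypothetical, and the 252-row demo array is a placeholder standing in for the height/weight columns, not the fat data itself.

```python
import numpy as np

def drop_rows_1based(arr, rows_1based):
    """Delete rows identified by R-style (1-based) row numbers."""
    idx0 = np.asarray(rows_1based) - 1   # R row 39 is NumPy position 38
    return np.delete(arr, idx0, axis=0)

# PLACEHOLDER stand-in for fat[, c("height", "weight")]: 252 rows whose first
# column is the 1-based row number, so it is easy to see which rows were removed.
rows = np.arange(1, 253)
demo = np.column_stack([rows, 10 * rows]).astype(float)
kept = drop_rows_1based(demo, [39, 42])
x, y = kept[:, 0], kept[:, 1]   # in the real analysis: x = height, y = weight
print(kept.shape)   # (250, 2)
```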

Taking x=height and y=weight, consider the linear model

y[i] = a + b*x[i] + epsilon[i],      epsilon[i] ~ independent Normal with mean 0, variance sig^2

Assume a prior under which the parameters a, b, sig are independent, with
            (a,b)          ~    normal with means  (6, 2.5)  and  sd's  (2, 0.5)
            tau = 1/sig^2  ~    Gamma(4, 100)

##-----------------------------------------------------------------------------------
#### NOTE: in the problem as originally posted this was Gamma(30, 0.033), which makes 
###  the distribution of sig^2 very concentrated at small values, instead of being very 
###  spread out, as I intended.
##-----------------------------------------------------------------------------------
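The note can be checked by simulating from each prior and looking at the implied sig = 1/sqrt(tau). The Python/NumPy sketch below ASSUMES that Gamma(shape, rate) follows the convention of R's rgamma(n, shape, rate). Under that reading the note's remark makes sense. NumPy's gamma sampler takes a scale instead, which is why the code passes 1/rate. The helper name sig_prior_quantiles is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(705)

def sig_prior_quantiles(shape, rate, n=200_000, probs=(0.05, 0.5, 0.95)):
    """Quantiles of sig = 1/sqrt(tau) when tau ~ Gamma(shape, rate).

    ASSUMPTION: the second Gamma parameter is a rate, as in R's rgamma.
    """
    tau = rng.gamma(shape, 1.0 / rate, size=n)   # NumPy uses scale = 1/rate
    return np.quantile(1.0 / np.sqrt(tau), probs)

print("Gamma(4, 100):   ", sig_prior_quantiles(4, 100))
print("Gamma(30, 0.033):", sig_prior_quantiles(30, 0.033))
```

Under this reading, Gamma(30, 0.033) puts tau near 30/0.033 (about 900), so sig is near 0.03. Gamma(4, 100) puts sig on the order of a few units.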
		
Use a GIBBS SAMPLER to sample 10,000 (a,b,tau) triples from the posterior density 
given the fat[-c(39,42), c("height","weight")] data, after a burn-in of 5000 
triples.
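Below is a minimal Python/NumPy sketch of one way to write the sampler. The same steps translate directly to R. It ASSUMES Gamma(4, 100) means shape 4, rate 100, and it uses the standard conjugate full conditionals, where SSR = sum of (y[i] - a - b*x[i])^2 and n is the number of observations:
            tau | a,b,y  ~  Gamma(4 + n/2, rate 100 + SSR/2)
            a | b,tau,y  ~  normal, precision 1/2^2 + n*tau, mean (6/2^2 + tau*sum(y[i] - b*x[i])) / precision
            b | a,tau,y  ~  normal, precision 1/0.5^2 + tau*sum(x[i]^2), mean (2.5/0.5^2 + tau*sum(x[i]*(y[i] - a))) / precision
The function name gibbs_linreg, the starting values (the prior means), and the seeds are illustrative choices. The usage lines run the sampler on synthetic data, not on the fat data.

```python
import numpy as np

def gibbs_linreg(x, y, n_keep=10_000, n_burn=5_000, seed=1,
                 prior_mean=(6.0, 2.5), prior_sd=(2.0, 0.5),
                 tau_shape=4.0, tau_rate=100.0):
    """Gibbs sampler for y = a + b*x + eps, eps ~ N(0, 1/tau), independent priors.

    ASSUMPTION: tau ~ Gamma(shape=tau_shape, rate=tau_rate).
    Returns an (n_keep, 3) array of (a, b, tau) draws kept after burn-in.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    pa, pb = 1.0 / prior_sd[0] ** 2, 1.0 / prior_sd[1] ** 2   # prior precisions
    sum_x2 = np.sum(x * x)

    a, b = prior_mean                       # start the chain at the prior means
    draws = np.empty((n_keep, 3))
    for it in range(n_burn + n_keep):
        # tau | a, b, y ~ Gamma(shape + n/2, rate + SSR/2); NumPy takes scale = 1/rate
        resid = y - a - b * x
        tau = rng.gamma(tau_shape + 0.5 * n, 1.0 / (tau_rate + 0.5 * (resid @ resid)))

        # a | b, tau, y ~ Normal with precision pa + n*tau
        prec_a = pa + n * tau
        mean_a = (pa * prior_mean[0] + tau * np.sum(y - b * x)) / prec_a
        a = rng.normal(mean_a, 1.0 / np.sqrt(prec_a))

        # b | a, tau, y ~ Normal with precision pb + tau*sum(x^2)
        prec_b = pb + tau * sum_x2
        mean_b = (pb * prior_mean[1] + tau * np.sum(x * (y - a))) / prec_b
        b = rng.normal(mean_b, 1.0 / np.sqrt(prec_b))

        if it >= n_burn:                    # keep only post-burn-in triples
            draws[it - n_burn] = (a, b, tau)
    return draws

# Usage on SYNTHETIC data (not the fat data): true a = 6, b = 2.5, sig = 10
rng = np.random.default_rng(0)
x_demo = rng.normal(70, 3, size=250)
y_demo = 6 + 2.5 * x_demo + rng.normal(0, 10, size=250)
draws = gibbs_linreg(x_demo, y_demo)
a_s, b_s, sig_s = draws[:, 0], draws[:, 1], 1.0 / np.sqrt(draws[:, 2])
print(np.median(a_s), np.median(b_s), np.median(sig_s))
```

Heights lie far from 0 (near 70), so a and b are strongly correlated in the posterior, and the chain moves slowly along that ridge. Trace plots or autocorrelations of the draws are worth checking. Centering x would speed up mixing but would change the meaning of the intercept a, so the sketch does not do it.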

Use your sampled results to do the following:

   (i) plot smoothed posterior density estimates of a, b, and sig separately;
   (ii) assess (visually) whether you think that each of the three posterior densities 
        you found in (i) is well approximated by a normal density;
   (iii) find the posterior median for each of the three parameters;
   (iv) find 90% Bayesian credible intervals for each of the three parameters;
   (v) find the posterior probability that a and b simultaneously BOTH lie within 
        their 90% (frequentist, least-squares) confidence intervals calculated from 
        the original "fat" dataset.
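For (i) through (v), here is a NumPy-only sketch of the numerical pieces. All helper names are hypothetical.
- kde_on_grid is a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth. Its output can be passed to any plotting tool; in R, plot(density(...)) plays the same role.
- summarize returns the median, the 90% equal-tailed credible interval (the 5% and 95% quantiles; an HPD interval is another valid choice), and the sample skewness and excess kurtosis. The last two are a numeric complement to the visual check in (ii).
- ls_confint returns estimate +/- crit * standard error.
The usage lines use placeholder draws and data, and they use a normal quantile as a large-sample stand-in for the t quantile qt(0.95, n - 2). In practice, substitute the exact t quantile, the Gibbs output, and the data that (v) specifies.

```python
import numpy as np
from statistics import NormalDist

def kde_on_grid(samples, grid):
    """Gaussian kernel density estimate on a grid, Silverman rule-of-thumb bandwidth."""
    s = np.asarray(samples, dtype=float)
    iqr = np.subtract(*np.quantile(s, [0.75, 0.25]))
    h = 0.9 * min(s.std(ddof=1), iqr / 1.34) * s.size ** (-0.2)
    z = (np.asarray(grid, dtype=float)[:, None] - s[None, :]) / h
    return np.exp(-0.5 * z * z).sum(axis=1) / (s.size * h * np.sqrt(2 * np.pi))

def summarize(samples):
    """Median, 90% equal-tailed credible interval, skewness, excess kurtosis."""
    s = np.asarray(samples, dtype=float)
    z = (s - s.mean()) / s.std()
    lo, hi = np.quantile(s, [0.05, 0.95])
    return {"median": np.median(s), "ci90": (lo, hi),
            "skew": np.mean(z ** 3), "exkurt": np.mean(z ** 4) - 3.0}

def ls_confint(x, y, crit):
    """Least-squares intervals for (intercept, slope): estimate +/- crit * SE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = (resid @ resid) / (y.size - 2)              # residual variance estimate
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return np.column_stack([beta - crit * se, beta + crit * se])   # rows: a, b

def joint_prob(a_draws, b_draws, ci):
    """Fraction of draws with a AND b both inside their intervals."""
    in_a = (a_draws >= ci[0, 0]) & (a_draws <= ci[0, 1])
    in_b = (b_draws >= ci[1, 0]) & (b_draws <= ci[1, 1])
    return np.mean(in_a & in_b)

# Usage with PLACEHOLDER draws and data (substitute the Gibbs output and the real data)
rng = np.random.default_rng(3)
a_draws, b_draws = rng.normal(6, 2, 10_000), rng.normal(2.5, 0.03, 10_000)
x_demo = rng.normal(70, 3, 250)
y_demo = 6 + 2.5 * x_demo + rng.normal(0, 10, 250)
crit = NormalDist().inv_cdf(0.95)    # large-sample stand-in for qt(0.95, n - 2)
ci = ls_confint(x_demo, y_demo, crit)
print(summarize(b_draws)["ci90"], joint_prob(a_draws, b_draws, ci))
```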
		
(B) After completing (A), write a function to generate (as many times as desired, 
independently) samples of size 250 of y's satisfying the conditional linear model 
specified above with the same underlying data-generating mechanism as the 
"height" values in the "fat" dataset.

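The sketch below is one way to read (B) as code: a function that holds x fixed and draws new y's from the model. The assignment does not say which values of a, b, sig to use, so the sketch takes them as arguments. A hypothetical variant first picks one (a, b, tau) row from Gibbs draws, which gives posterior predictive samples. Function names are illustrative. The caller supplies x, and the heights in the usage lines are synthetic placeholders.

```python
import numpy as np

def simulate_y(x, a, b, sig, rng):
    """One dataset: y[i] = a + b*x[i] + eps[i], eps[i] ~ N(0, sig^2), with x held fixed."""
    x = np.asarray(x, dtype=float)
    return a + b * x + rng.normal(0.0, sig, size=x.size)

def simulate_y_posterior(x, draws, rng):
    """HYPOTHETICAL variant: pick one (a, b, tau) row of Gibbs draws, then simulate.

    Repeated calls give draws from the posterior predictive distribution of y given x.
    """
    a, b, tau = draws[rng.integers(len(draws))]
    return simulate_y(x, a, b, 1.0 / np.sqrt(tau), rng)

# Usage: independent datasets sharing the same PLACEHOLDER x values
rng = np.random.default_rng(12)
x_fixed = rng.normal(70, 2.6, size=250)
y_first = simulate_y(x_fixed, 6.0, 2.5, 10.0, rng)
y_second = simulate_y(x_fixed, 6.0, 2.5, 10.0, rng)
```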
Test your function by comparing -- in any way you choose to display the results --
four newly generated datasets of y's of size 250, all with the same fixed set 
of x[1],...,x[250].
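Here is a sketch of one possible comparison for this test. It generates four datasets with the same fixed x and summarizes each by the mean and SD of y and by its least-squares fit. The x values and the parameter values (a, b, sig) = (6, 2.5, 10) are placeholders, not estimates from the fat data. Side-by-side scatterplots or overlaid density plots would work equally well.

```python
import numpy as np

rng = np.random.default_rng(2017)
n = 250
x = rng.normal(70, 2.6, size=n)              # PLACEHOLDER heights, fixed for all four sets
a_true, b_true, sig_true = 6.0, 2.5, 10.0    # PLACEHOLDER parameter values

# Four independent datasets of y's sharing the same x
ys = [a_true + b_true * x + rng.normal(0.0, sig_true, size=n) for _ in range(4)]

X = np.column_stack([np.ones(n), x])
print(" set   mean(y)   sd(y)    a_hat   b_hat  sig_hat")
for k, y in enumerate(ys, start=1):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares (a_hat, b_hat)
    resid = y - X @ beta
    sig_hat = np.sqrt(resid @ resid / (n - 2))
    print(f"{k:4d} {y.mean():9.2f} {y.std(ddof=1):7.2f} {beta[0]:8.2f} {beta[1]:7.3f} {sig_hat:8.2f}")
```

The four datasets share x and the generating parameters, so their summaries should differ only by sampling noise. For example, the slope estimates should scatter around the b used to generate them.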