Homework Problem 26, Due Monday May 10 or Wednesday May 12.
----------------------------------------------------------

Access the data "Rubber" from the MASS library within R or
Splus6.0. From either platform, you must first issue the command  

> library(MASS)

to place the MASS (Venables & Ripley) datasets into your 
search-path. This dataset has 3 variables: loss (the amount of rubber
wear, the response variable) and two predictors, "hard" and "tens".

The objective of this exercise is to compare the predictive 
success of different methods of "predicting" above-median 
rubber-loss (i.e., loss > 165).
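
As a starting point, the data can be loaded and the binary response
formed as in the short sketch below. The names "cutoff" and "above" are
illustrative choices, not part of the assignment.

```r
library(MASS)                     # provides the Rubber data frame

str(Rubber)                       # columns loss, hard, tens
cutoff <- median(Rubber$loss)     # 165 for the full dataset
above  <- Rubber$loss > cutoff    # logical response: TRUE means "> 165"
table(above)
```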

(i) Develop three methods of predicting above-median rubber loss from
the variables "hard" and "tens":
--- using a linear-regression model (normal errors)
--- using a logistic regression model
--- using a nonparametric regression model (kernel regression estimator
with a single bandwidth b=40) for "loss" as a function of the 
linear-regression fitted linear combination of "hard" and "tens". 
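
One possible reading of the three methods is sketched below. It rests on
several assumptions that the problem statement leaves open: the kernel
estimator is taken to be a Nadaraya-Watson smoother with a Gaussian
kernel whose standard deviation equals the bandwidth b, the regression
and kernel rules predict T when the fitted loss exceeds 165, and the
logistic rule predicts T when the fitted probability exceeds 0.5. The
function name "nw_smooth" is hypothetical.

```r
library(MASS)

# Nadaraya-Watson smoother: at each point of x0, a weighted average of y
# with Gaussian weights (the kernel choice is an assumption).
nw_smooth <- function(x, y, x0, b = 40) {
  sapply(x0, function(u) {
    w <- dnorm((x - u) / b)
    sum(w * y) / sum(w)
  })
}

fit_lm <- lm(loss ~ hard + tens, data = Rubber)
index  <- fitted(fit_lm)          # fitted linear combination of hard and tens
fit_glm <- glm(I(as.numeric(loss > 165)) ~ hard + tens,
               family = binomial, data = Rubber)

pred_lm  <- index > 165                                  # linear-regression rule
pred_glm <- fitted(fit_glm) > 0.5                        # logistic rule
pred_nw  <- nw_smooth(index, Rubber$loss, index) > 165   # kernel rule

# In-sample proportion of correct predictions for each rule
sapply(list(lm = pred_lm, glm = pred_glm, kernel = pred_nw),
       function(p) mean(p == (Rubber$loss > 165)))
```

For part (iii), where a probability rather than a T/F prediction is
needed, the same smoother could instead be applied to the 0/1 indicator
of [loss > 165].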

You should code a function which does the model-fitting and defines 
predictions based on a dataset of the same structure as "Rubber", but
with size n between 25 and 30. Note that each prediction method is 
an algorithm mapping the dataset to a logical vector of the same 
size n (where T corresponds to "> 165" and F to "<= 165"). For
convenience, you should probably write your function to calculate the
numbers or proportions of correct predictions on a test dataset input
to the same function.
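
A minimal sketch of such a function follows, under the same assumptions
as above (Gaussian-kernel smoother, cutoffs of 165 for fitted loss and
0.5 for fitted probability). The name "predict_rubber" and its
arguments are hypothetical, and the random 25/5 split is only for
illustration.

```r
library(MASS)

# Fit all three methods on `train`; return the logical predictions for
# `test` and the proportion of correct [loss > cutoff] predictions.
predict_rubber <- function(train, test = train, cutoff = 165, b = 40) {
  fit_lm  <- lm(loss ~ hard + tens, data = train)
  fit_glm <- suppressWarnings(   # glm may warn if a small sample is separable
    glm(I(as.numeric(loss > cutoff)) ~ hard + tens,
        family = binomial, data = train))

  idx_train <- fitted(fit_lm)
  idx_test  <- predict(fit_lm, newdata = test)
  # Gaussian-kernel smoother of training loss, evaluated at the test index
  smooth <- sapply(idx_test, function(u) {
    w <- dnorm((idx_train - u) / b)
    sum(w * train$loss) / sum(w)
  })

  pred <- cbind(
    lm     = idx_test > cutoff,
    glm    = predict(fit_glm, newdata = test, type = "response") > 0.5,
    kernel = smooth > cutoff)
  truth <- test$loss > cutoff
  list(pred = pred, correct = colMeans(pred == truth))
}

set.seed(1)
held_out <- sample(nrow(Rubber), 5)
res <- predict_rubber(Rubber[-held_out, ], Rubber[held_out, ])
res$correct
```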

(ii) Do a small cross-validation study (say, of 1000 replications), 
by repeatedly leaving out 5 observations chosen at random from the
original dataset of 30, designed to estimate the accuracy of
prediction of [loss > 165] by each of the three prediction methods 
you developed in (i).
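
The leave-5-out loop might be organized as in the sketch below. To keep
it short, only the linear-regression rule is shown as a stand-in for the
three methods; "lm_rule" and "cv_leave_k_out" are hypothetical names,
and any function with the same (train, test) interface could be passed
in its place.

```r
library(MASS)

# Stand-in for one prediction method from (i): fit on `train`, return a
# logical vector of [loss > cutoff] predictions for `test`.
lm_rule <- function(train, test, cutoff = 165) {
  fit <- lm(loss ~ hard + tens, data = train)
  unname(predict(fit, newdata = test) > cutoff)
}

# Each replication holds out k random rows, fits on the remaining rows,
# and records the proportion of held-out rows predicted correctly.
cv_leave_k_out <- function(data, rule, k = 5, reps = 1000, cutoff = 165) {
  acc <- replicate(reps, {
    out <- sample(nrow(data), k)
    mean(rule(data[-out, ], data[out, ]) == (data$loss[out] > cutoff))
  })
  # mc.error measures Monte Carlo variation over the random splits only
  c(accuracy = mean(acc), mc.error = sd(acc) / sqrt(reps))
}

set.seed(2024)
cv_leave_k_out(Rubber, lm_rule)
```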

(iii) Do a small bootstrap study (preferably of many more than 1000
replications) designed to find a 95% confidence interval for the 
probability  P(loss > 165 | hard, tens)  by each of your three methods
in (i), for several (say, the first 5) of the (hard, tens)
combinations actually occurring in the data. There are a few different
ways to do such a study. Do the study TWO DIFFERENT WAYS chosen from
among the following: 
     (a) bootstrap the triples (loss, hard, tens) directly (i.e.,
directly sample with replacement from the set of 30 triples); OR
     (b) do a parametric bootstrap of the data, by sampling 
with replacement from only the pairs (hard, tens) and generating the
additive regression errors from the normal linear regression model
with parameters fitted to the dataset of all 30 points; OR
     (c) form the residuals from the linear regression model (fitted
to the original dataset), and bootstrap them (i.e., repeatedly select
with replacement), each time adding them back to the original
linear-regression predictors to get a `pseudo-data' sample 
(pseudo-loss, hard, tens) of size 30 on which you can check the
behavior of your prediction methods in (i).
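
The three resampling schemes can be sketched as follows. For brevity the
sketch tracks only the normal-linear-model estimate of
P(loss > 165 | hard, tens), computed as pnorm((fitted - 165)/sigma), and
uses a simple percentile interval; both choices are assumptions, as are
the names "draw", "p_hat" and "boot_ci". The other two methods from (i)
would be plugged in the same way.

```r
library(MASS)

fit0   <- lm(loss ~ hard + tens, data = Rubber)
n      <- nrow(Rubber)
sigma0 <- summary(fit0)$sigma
x_new  <- Rubber[1:5, c("hard", "tens")]   # first 5 (hard, tens) combinations

# One resampled dataset under each scheme.
draw <- list(
  a = function() Rubber[sample(n, replace = TRUE), ],        # resample triples
  b = function() {                                           # parametric
    d <- Rubber[sample(n, replace = TRUE), c("hard", "tens")]
    d$loss <- predict(fit0, newdata = d) + rnorm(n, sd = sigma0)
    d
  },
  c = function() {                                           # resample residuals
    d <- Rubber
    d$loss <- fitted(fit0) + sample(resid(fit0), replace = TRUE)
    d
  })

# Normal-linear-model estimate of P(loss > cutoff | hard, tens) at x_new.
p_hat <- function(d, cutoff = 165) {
  fit <- lm(loss ~ hard + tens, data = d)
  pnorm((predict(fit, newdata = x_new) - cutoff) / summary(fit)$sigma)
}

# Percentile 95% interval from B bootstrap replications of p_hat.
boot_ci <- function(scheme, B = 2000) {
  p <- replicate(B, p_hat(draw[[scheme]]()))
  t(apply(p, 1, quantile, probs = c(0.025, 0.975)))
}

set.seed(1)
boot_ci("a")
boot_ci("c")
```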