Homework 13, Assigned 10/26/17, due Monday 11/6 in class
========================================================
16 points

(A) Consider the dataset "debt" from package "faraway" on characteristics
of 464 people answering questions on a (UK) postal survey regarding
attitudes to debt. The response variable of interest is "prodebt".

Delete the column "ccarduse" and all rows in which the variable "prodebt"
is missing. Then recode the two columns "bankacc" and "bsocacc" into a
single variable "savings" using the rule:

    savings =  1    if either bankacc = 1 or bsocacc = 1
                    (allowing possible missing values)
               0    if both bankacc and bsocacc = 0,
                    or if one is NA and the other is 0
               NA   if both bankacc = NA and bsocacc = NA

Then remove all rows of the remaining data-frame in which any NA values
occur. Your resulting data-frame "debt.edt" should have 355 rows and
11 columns.

PROBLEM: fit the best linear model you can to this data-frame "debt.edt"
with response variable "prodebt" in terms of the other variables in the
data-frame. Candidate terms are the data-frame columns ("main-effect
terms") and their pairwise products ("interaction terms"), with the
constraint that an interaction term is included only if both of its
main-effect factors are also included. The criteria for "best model"
are "parsimony" (do not include predictors that are very weak unless they
appear in highly significant interactions) and "patternless residuals".
You may generate your model mainly via stepwise model selection, but then
check whether further changes are indicated to remove weak predictors.

(B) Fit a logistic regression model to the occurrence rates of esophageal
cancer cases among cases and controls in terms of the other (categorical)
predictor variables in the dataset "esoph" contained in the standard R
"datasets" distribution. Are any of the pairwise interactions between the
"agegp", "alcgp" and "tobgp" variables significant predictors of
esophageal cancer in these data?
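The recoding rule for "savings" in part (A) can be checked on a small toy example before applying it to the real data. The sketch below is only an illustration under stated assumptions: the helper name recode_savings and the toy vectors are hypothetical (not part of the assignment), and the two inputs are assumed to take only the values 0, 1 or NA, as the rule implies.

```r
# Hypothetical helper (name is an assumption, not from the assignment):
# combine two 0/1 indicator vectors, possibly containing NA, into a single
# 0/1/NA indicator following the rule stated in part (A).
recode_savings <- function(bankacc, bsocacc) {
  # TRUE where at least one indicator equals 1; %in% treats NA as "not 1"
  either_one <- (bankacc %in% 1) | (bsocacc %in% 1)
  # TRUE where both indicators are missing
  both_na <- is.na(bankacc) & is.na(bsocacc)
  # NA if both are missing, otherwise 1 if either is 1, otherwise 0
  ifelse(both_na, NA_real_, as.numeric(either_one))
}

# Toy data covering every case in the rule (made up for illustration)
bank <- c(1, 0, NA, 0, NA, 1)
bsoc <- c(0, 0, 1, NA, NA, NA)
savings <- recode_savings(bank, bsoc)
print(savings)
```

On the real data the same helper could presumably be applied to the "bankacc" and "bsocacc" columns of the data-frame before those two columns are dropped; whether the resulting data-frame matches the stated 355 rows and 11 columns should be verified directly.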