Stat 430, Fall 2001					10/15/01

Topics for Stat 430 Test, along with Sample Problems

INSTRUCTIONS: The in-class test on Friday, October 19, will be 
closed-book, but you should bring (or arrange to share) a calculator,
and you may use one 8.5" by 11" sheet of notes, one- or two-sided 
(formulas, summaries of SAS PROC's, etc.), which you prepare in advance.

Topic I. Familiarity with SAS PROC's. 

For each of the following, explain how you would use SAS to produce 
the statistical or graphical output requested. Give the exact PROC's and 
OPTIONS you would use, along with any associated DATA steps 
you would need to prepare the data. Give exact SAS statements if you 
can, but clear explanations with references to specific PROC's, 
statements, and OPTIONS will also be good enough. Assume that
you are in the midst of a SAS session and have already declared

libname home "." ;

and that you have a SAS dataset  times.ssd04  in your home 
directory, consisting of 200 records with numerical columns  LIFTIM,  
AMT,  GRP,  where  LIFTIM  is a survival time in days,  AMT  is a 
blood level of some chemical (rounded to one of 5 discrete levels), and 
GRP  is a group indicator (0 = placebo, 1 = treatment) 
in the survival study. Here are some sample questions:

a. How would you produce a table showing the number of 
observations with all the possible (two-way) combinations of  values 
for the variables AMT and GRP ?

b. How would you calculate the partial correlation between LIFTIM 
and AMT adjusted (linearly) for the variable GRP, i.e. for group 
membership ?

c. How would you calculate and display the ranges (distance from 
smallest to largest observed value) within each of the 10 cross-
classified groups defined by levels of AMT and GRP ?

d. How would you create side-by-side histograms of LIFTIM 
values by GRP ?

e. How would you check whether the values of LIFTIM are 
approximately normally distributed within each of the two 
treatment-groups ?
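
One possible set of answers to (a)-(e) is sketched below. It is not the 
only acceptable approach. It assumes the dataset is referenced as 
home.times (as in the PROC REG call in Topic III), that LIFTIM is the 
variable whose range is wanted in (c), and that a character-based PROC 
CHART display is acceptable for (d). The work dataset name  bygrp  is 
made up for illustration.

```sas
/* (a) two-way table of counts for AMT by GRP */
proc freq data=home.times;
   tables AMT*GRP;
run;

/* (b) partial correlation of LIFTIM and AMT, adjusting linearly for GRP */
proc corr data=home.times;
   var LIFTIM AMT;
   partial GRP;
run;

/* (c) range of LIFTIM (assumed variable) in each AMT-by-GRP cell */
proc means data=home.times range min max;
   class AMT GRP;
   var LIFTIM;
run;

/* (d) side-by-side histograms of LIFTIM, one set of bars per GRP value */
proc chart data=home.times;
   vbar LIFTIM / group=GRP;
run;

/* (e) normality tests and normal probability plots within each group; */
/*     BY-group processing needs the data sorted by GRP first          */
proc sort data=home.times out=bygrp;
   by GRP;
run;

proc univariate data=bygrp normal plot;
   by GRP;
   var LIFTIM;
run;
```

In (e), the NORMAL option requests tests of normality and PLOT adds a 
normal probability plot for each BY group.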


Topic II. Understanding of Statistical Definitions and Relationships.

a. For data defined by the following data-step, calculate the value 
and degrees of freedom for the chi-squared test statistic for 
association between row (Risk) and column (Disease) categories:

data  expose ;
     input  Risk $  Disease $ Count ;
     datalines;
              N  Y   200
              N  N   300
              Y  Y   300
              Y  N   200
     ;
     run;
              
b. Calculate the odds ratio and risk ratio for the dataset given in (a).
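
As a check on hand calculations for (a) and (b), the following sketch 
reads the same four cell counts and asks PROC FREQ for the chi-squared 
and relative-risk statistics. The hand arithmetic in the comments assumes 
the risk ratio is wanted for Risk=Y relative to Risk=N. SAS orders rows 
and columns alphabetically here, so its printed cohort risk ratios may be 
oriented differently.

```sas
data expose;
   input Risk $ Disease $ Count;
   datalines;
N Y 200
N N 300
Y Y 300
Y N 200
;
run;

/* By hand: every row and column total is 500 out of 1000, so each  */
/* expected cell count is 500*500/1000 = 250.                       */
/*   chi-square = 4 * (50**2 / 250) = 40,  df = (2-1)*(2-1) = 1     */
/*   odds ratio = (300*300) / (200*200) = 2.25                      */
/*   risk ratio = (300/500) / (200/500) = 1.5                       */
proc freq data=expose;
   weight Count;                      /* Count holds cell frequencies */
   tables Risk*Disease / chisq relrisk expected;
run;
```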

Suppose that you are given the following outputs from PROC MEANS and 
PROC REG applied to the data  times.ssd04  : 

Variable     N     Mean    Std Dev      Minimum    Maximum

  LIFTIM    200  534.60    495.8         31        1020
     AMT    200    1.659     0.547        0.487       2.743
     GRP    200    0.437     0.496        0           1.0

from PROC REG statement:    MODEL  LIFTIM  AMT  =  GRP ;

                 The REG Procedure ...
              Dependent Variable: LIFTIM

                      Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        359.87           8.536     42.16      <.0001
GRP           1        399.84          91.496      4.37      <.0001

                 Dependent Variable: AMT
              
                      Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1         1.5140        0.0423      35.78      <.0001
GRP           1        -0.3308        0.1203      -2.75      0.0065


from statement   MODEL  LIFTIM = AMT  GRP  ;

                    Parameter Estimates

                 Parameter    Standard
Variable   DF    Estimate       Error    t Value    Pr > |t|

Intercept  1      -333.99       4.752    -70.49       <.0001
AMT        1       453.20     184.97       2.45       0.0152
GRP        1       267.12      68.67       3.89       0.0001


c. What is the value of the t-test statistic for testing whether the 
average LIFTIM is different in GRP=1 than in GRP=0 ?

d. What is the correlation between AMT and GRP ?

e. What is the partial correlation between  LIFTIM  and  AMT  
adjusted for  GRP ?

f.  What is the variance of the vector of residuals from the linear 
regression of LIFTIM  on  GRP  ? What is the sum of squared 
errors from this regression ?
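
For (e), one standard identity that may help: the partial correlation 
between the response and a single regressor, adjusted for the other 
regressors in the model, can be recovered from that regressor's t 
statistic and the error degrees of freedom. The sketch below applies it 
to the AMT row of the output for  MODEL LIFTIM = AMT GRP , assuming 
n = 200 and three fitted parameters (so 197 error degrees of freedom).

```latex
r_{\mathrm{LIFTIM},\mathrm{AMT}\,\cdot\,\mathrm{GRP}}
  \;=\; \frac{t_{\mathrm{AMT}}}{\sqrt{t_{\mathrm{AMT}}^{2} + (n - 3)}}
  \;=\; \frac{2.45}{\sqrt{2.45^{2} + 197}}
  \;=\; \frac{2.45}{\sqrt{203.0025}}
  \;\approx\; 0.172
```

The sign of the partial correlation matches the sign of the estimated 
AMT coefficient.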


Topic III. Statistical Interpretations of SAS Output

Using the data in part (a) of II above: 

a. What would you conclude about association of row and column 
categories, if the data were generated by independently sampling
500 individuals with and 500 without a known risk-factor and 
recording whether they later developed the disease under study ? 
Would you expect the Fisher Exact test, the Mantel-Haenszel 
chi-square, and the continuity-adjusted chi-square test to give 
substantially the same results ? (Why or why not ?)

b. For the dataset  times.ssd04  of Topic I (also used in II(c)-(f)), 
suppose that the output from
	PROC REG data=home.times ;
              MODEL   LIFTIM = AMT  ;  run;
includes:


                 The REG Procedure ...
              Dependent Variable: LIFTIM

                      Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1       -367.63           7.772    -47.3      <.0001
AMT           1        543.84         120.32      4.52      <.0001


(i) Interpret the T-statistic and p-value for the AMT coefficient. 
(ii) Give the assumptions which the data ought to satisfy for these 
conclusions to be valid.
(iii) Looking at the output from the second MODEL statement 
(MODEL LIFTIM = AMT GRP) shown after part II(b), is there any indication 
that some part of the assumptions you gave in (ii) is NOT valid ? Explain.
(iv) What SAS analyses could you perform on the dataset  
times.ssd04 to check some other aspect(s) of the assumptions in (ii) ?
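
For (iv), one possible approach is a residual analysis: save the fitted 
values and residuals from the regression, then examine the residuals for 
normality and for patterns against the fitted values. In the sketch 
below, the output dataset name  resids  and the variable names  pred  
and  resid  are made up for illustration.

```sas
/* Fit the regression and save predicted values and residuals */
proc reg data=home.times;
   model LIFTIM = AMT;
   output out=resids p=pred r=resid;
run;
quit;

/* Normality tests and normal probability plot of the residuals */
proc univariate data=resids normal plot;
   var resid;
run;

/* Residuals against fitted values: look for curvature or a     */
/* spread that changes with the fitted value (unequal variance) */
proc plot data=resids;
   plot resid*pred;
run;
quit;
```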