SAS Log on GLM's and Deviance
============================= 3/21/03

Recall login sequence:

maclaurin% xhost + dc.umd.edu
maclaurin% telnet dc.umd.edu
     [now comes login with userid and passwd]
maclaurin% setenv DISPLAY maclaurin.math.umd.edu:0.0
dc % sas & 


At some point, in home directory on cluster:
dc % mkdir SASproj

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
/* The little program used last time, reproduced below,
creates the permanent SAS dataset SASproj/pbc.sas7bdat.
But now (in Splus, further on in this Log, we do it more 
sensibly, sothat the only records dropped out are the ones 
which have both DTH=0 and EVTTIME<41                   */

 data SASstf.PBC (drop = labels1-labels10);
  infile "pbcdata.asc" ;
     if _N_ = 1 then input labels1-labels10 $ ; 
       else INPUT obs idnum dth evttime treatgp logbili
              agevar cirrh cchol albumin ;
     if obs ne . ;
    run;

/* Also create dataset DthPBC.sas7bdat, with slightly 
modified code from last time. (Look at LENGTH statement,
  declaring Surv41 and Trtgp numeric.)  */

 data SASstf.DthPBC (drop = Dth EVTTIME);
   LENGTH Surv41 3 Trtgp 3;
   set SASstf.PBC (keep= Dth EVTTIME LOGBILI AGEVAR TREATGP
         rename = (AGEVAR=AGEYRS TREATGP=Trtgp));
   if Dth=1;
   if EVTTIME GE 41 then Surv41=1;
      else Surv41=0;
   run;
/* Now continue with simple GENMOD call for ordinary 
   logistic regression, to match output with Splus */ 

Proc Genmod data=SASstf.DthPBC;
   model Surv41 = logbili ageyrs trtgp / dist = binomial
                                         link = logit;
  run;

/* Note: despite SAS message that "PROC GENMOD is modeling 
the probability that Surv41='0'", we obtain exactly the 
same coefficients as in Splus function glm: */

        Analysis Of Parameter Estimates

                      Standard     Wald 95% Conf     Chi-Sq
Param     DF   Estimate   Error          Limits              P> ChiSq

Intercept  1   -5.1989    1.3980  -7.9390  -2.4589   13.83    0.0002
logbili    1    2.7456    0.6842   1.4046   4.0865   16.10    <.0001
AGEYRS     1    0.0071    0.0039  -0.0006   0.0148    3.31    0.0690
Trtgp      1    0.6894    0.4745  -0.2406   1.6195    2.11    0.1463
Scale      0    1.0000    0.0000   1.0000   1.0000

/* Now compare outputs etc with PROC LOGISTIC */

Proc LOGISTIC data=SASstf.DthPBC;
title "Logistic Regression";
   model Surv41 = logbili ageyrs trtgp / selection=forward DETAILS;
  run;

/* Without the SELECTION=FORWARD option, outputs are similar to
  (but more extensive than) those under GENMOD. With the DETAILS
  option, meaningful only in connection with a SELECTION= option, you
  get the following kind of output */

OUTPUT -------------------------------------------------OUTPUT
               Forward Selection Procedure

Step  0. Intercept entered:
...             
           Analysis of Maximum Likelihood Estimates

                            Standard    Wald
Parameter   DF   Estimate    Error     Chi-Sq    P > ChiSq
Intercept   1     0.3732    0.2005      3.4645    0.0627

...
           Analysis of Effects Not in the Model
                                                 
   Effect       DF    Chi-Square    Pr > ChiSq
   logbili       1       14.8613        0.0001
   AGEYRS        1        0.3092        0.5781
   Trtgp         1        0.0243        0.8762


Step  1. Effect logbili entered:
...
  Model Fit Statistics ...

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        15.9267        1         <.0001
Score                   14.8613        1         0.0001
Wald                    13.0322        1         0.0003


        Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -3.3736      1.0436       10.4494        0.0012
logbili       1      2.1426      0.5935       13.0322        0.0003

Analysis of Effects Not in the Model
...

####
You can see that successively fitting the models with each of the
possible subsets of variables: Intercept only, Intercept+logbili, 
Int+logbili+AGEYRS, Int+logbili+AGEYRS+trtgp, would give you a set of
deviance values from which you could build an `Analysis of Deviance
Table', but here is how you could do it directly in Splus.
                    
First we set up the Splus file again, the more sensible way, with
records dropped only when EVTTIME<41 and DTH=0.

> pbcdth <- read.table(file="/home1/evs/data/ClinTri/pbcdata.asc",
      header=T)
> pbcdth <- pbcdth[pbcdth[,"DTH"]==1 | pbcdth[,"EVTTIME"] >= 41,4:7]  
  pbcdth[,1] <- as.numeric(pbcdth[,1] < 41)
  names(pbcdth)[1] <- "Surv41"    ### 172 records remain
> tmpglm <- glm(cbind(Surv41,1-Surv41) ~ ., data=pbcdth, family=binomial)
> tmpglm$dev
[1] 155.9506    ### This is the deviance for the fitted model

> anova(tmpglm)
Analysis of Deviance Table   ### reflecting variables entered in the
			     ### order indicated
Binomial model

Response: cbind(Surv41, 1 - Surv41)

Terms added sequentially (first to last)
        Df Deviance Resid. Df Resid. Dev 
   NULL                   171   223.6958
TREATGP  1  0.27039       170   223.4254
LOGBILI  1 58.65329       169   164.7721
 AGEVAR  1  8.82149       168   155.9506
 ### Note that the NULL model "Resid.Dev" in this case is
### just -2*logLik for the binomial model:
> -2*(sum(pbcdth$Surv41)*log(mean(pbcdth$Surv41)) + (172-sum(
  pbcdth$Surv41))*log(1-mean(pbcdth$Surv41))) ## = 223.6958
### Here the `saturated model' has logLik=0.
### So the logLik of each model is -0.5 times is Resid.Dev.