SAS Log on GLM's and Deviance ============================= 3/21/03 Recall login sequence: maclaurin% xhost + dc.umd.edu maclaurin% telnet dc.umd.edu [now comes login with userid and passwd] maclaurin% setenv DISPLAY maclaurin.math.umd.edu:0.0 dc % sas & At some point, in home directory on cluster: dc % mkdir SASproj >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* The little program used last time, reproduced below, creates the permanent SAS dataset SASproj/pbc.sas7bdat. But now (in Splus, further on in this Log, we do it more sensibly, sothat the only records dropped out are the ones which have both DTH=0 and EVTTIME<41 */ data SASstf.PBC (drop = labels1-labels10); infile "pbcdata.asc" ; if _N_ = 1 then input labels1-labels10 $ ; else INPUT obs idnum dth evttime treatgp logbili agevar cirrh cchol albumin ; if obs ne . ; run; /* Also create dataset DthPBC.sas7bdat, with slightly modified code from last time. (Look at LENGTH statement, declaring Surv41 and Trtgp numeric.) */ data SASstf.DthPBC (drop = Dth EVTTIME); LENGTH Surv41 3 Trtgp 3; set SASstf.PBC (keep= Dth EVTTIME LOGBILI AGEVAR TREATGP rename = (AGEVAR=AGEYRS TREATGP=Trtgp)); if Dth=1; if EVTTIME GE 41 then Surv41=1; else Surv41=0; run; /* Now continue with simple GENMOD call for ordinary logistic regression, to match output with Splus */ Proc Genmod data=SASstf.DthPBC; model Surv41 = logbili ageyrs trtgp / dist = binomial link = logit; run; /* Note: despite SAS message that "PROC GENMOD is modeling the probability that Surv41='0'", we obtain exactly the same coefficients as in Splus function glm: */ Analysis Of Parameter Estimates Standard Wald 95% Conf Chi-Sq Param DF Estimate Error Limits P> ChiSq Intercept 1 -5.1989 1.3980 -7.9390 -2.4589 13.83 0.0002 logbili 1 2.7456 0.6842 1.4046 4.0865 16.10 <.0001 AGEYRS 1 0.0071 0.0039 -0.0006 0.0148 3.31 0.0690 Trtgp 1 0.6894 0.4745 -0.2406 1.6195 2.11 0.1463 Scale 0 1.0000 0.0000 1.0000 1.0000 /* Now compare outputs etc with PROC LOGISTIC */ Proc LOGISTIC data=SASstf.DthPBC; title "Logistic Regression"; model Surv41 = logbili ageyrs trtgp / selection=forward DETAILS; run; /* Without the SELECTION=FORWARD option, outputs are similar to (but more extensive than) those under GENMOD. With the DETAILS option, meaningful only in connection with a SELECTION= option, you get the following kind of output */ OUTPUT -------------------------------------------------OUTPUT Forward Selection Procedure Step 0. Intercept entered: ... Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Sq P > ChiSq Intercept 1 0.3732 0.2005 3.4645 0.0627 ... Analysis of Effects Not in the Model Effect DF Chi-Square Pr > ChiSq logbili 1 14.8613 0.0001 AGEYRS 1 0.3092 0.5781 Trtgp 1 0.0243 0.8762 Step 1. Effect logbili entered: ... Model Fit Statistics ... Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 15.9267 1 <.0001 Score 14.8613 1 0.0001 Wald 13.0322 1 0.0003 Analysis of Maximum Likelihood Estimates Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -3.3736 1.0436 10.4494 0.0012 logbili 1 2.1426 0.5935 13.0322 0.0003 Analysis of Effects Not in the Model ... #### You can see that successively fitting the models with each of the possible subsets of variables: Intercept only, Intercept+logbili, Int+logbili+AGEYRS, Int+logbili+AGEYRS+trtgp, would give you a set of deviance values from which you could build an `Analysis of Deviance Table', but here is how you could do it directly in Splus. First we set up the Splus file again, the more sensible way, with records dropped only when EVTTIME<41 and DTH=0. > pbcdth <- read.table(file="/home1/evs/data/ClinTri/pbcdata.asc", header=T) > pbcdth <- pbcdth[pbcdth[,"DTH"]==1 | pbcdth[,"EVTTIME"] >= 41,4:7] pbcdth[,1] <- as.numeric(pbcdth[,1] < 41) names(pbcdth)[1] <- "Surv41" ### 172 records remain > tmpglm <- glm(cbind(Surv41,1-Surv41) ~ ., data=pbcdth, family=binomial) > tmpglm$dev [1] 155.9506 ### This is the deviance for the fitted model > anova(tmpglm) Analysis of Deviance Table ### reflecting variables entered in the ### order indicated Binomial model Response: cbind(Surv41, 1 - Surv41) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev NULL 171 223.6958 TREATGP 1 0.27039 170 223.4254 LOGBILI 1 58.65329 169 164.7721 AGEVAR 1 8.82149 168 155.9506 ### Note that the NULL model "Resid.Dev" in this case is ### just -2*logLik for the binomial model: > -2*(sum(pbcdth$Surv41)*log(mean(pbcdth$Surv41)) + (172-sum( pbcdth$Surv41))*log(1-mean(pbcdth$Surv41))) ## = 223.6958 ### Here the `saturated model' has logLik=0. ### So the logLik of each model is -0.5 times is Resid.Dev.