March 20, 1998, Spring 1998 (STAT 798C) March 31, 2001, Spring 2001 (STAT 798L) October 20, Fall 2010, STAT 705 October 7, Fall 2011, STAT 705 October 15, Fall 2012, STAT 705 EM/STAT798C/glm.LAshumway, S-plus stat/STAT705/glm.LAshumway, R (AIC, ts.plot different) 0. GLM Analysis of Los Angeles Mortality Data 1970-1979 A. Introduction to lm() and glm(). B. Comparison between Poisson, Negative Binomial and Quasi-Likelihood ---------------------------------------------- Nice way to plot two ts M1 <- glm(y ~ y1 + y2, family = quasi(link = "log", variance = "mu")) ts.plot(ts(M1$y),ts(M1$fitted.values),xlab="Weeks",ylab="Counts",type="l",col=c(1,2)) legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-')) ----------------------------------------------------------------- GLM Analysis of Los Angeles Mortality Data 1970-1979 ========================================================= LA Pollution-Mortality Study: 1970-1979, 508 observations, 6-day spacing. Weekly FILTERED data. The data were lowpass filtered, filtering out frequencies above 0.1 cycles per day. Mortality: (1) Mrt: Total Mortality (2) Rsp: Respiratory Mortality (3) Crd: Cardiovascular Mortality Weather: (4) Tmp: Temperature (5) Hum: Relative Humidity Pollution: (6) Crb: Carbon Monoxide (7) Slf: Sulfur Dioxideglm.LAshumway (8) Nit: Nitrogen Dioxide (9) Hdr: Hydrocarbons (10) Ozn: Ozone (11) Par: Particulates Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 45, 224-241. (RA421.E5) 1. Reading in the data ----------------------- Data form as received from Shumway Sat, 11 Feb 1995 (508 by 11): 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 173.60 8.27 95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02 .................. A <- matrix(scan("LAshumway"), byrow=T,ncol=11) Read 5588 items > A[1:5,] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 [2,] 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 [3,] 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 [4,] 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 [5,] 173.60 8.27 95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02 ##Better: Give col names using dimnames. > A <- matrix(scan("LAshumway"), byrow=T,ncol=11, dimnames=list(NULL,c('Mrt','Rsp','Crd','Tmp','Hum','Crb', 'Slf','Nit','Hdr','Ozn','Par'))) ###OK!!! > A[1:5,] Mrt Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par [1,] 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 [2,] 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 [3,] 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 [4,] 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 [5,] 173.60 8.27 95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02 Or better: > head(A) Mrt Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par [1,] 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 [2,] 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 [3,] 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 [4,] 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 [5,] 173.60 8.27 95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02 [6,] 183.73 7.55 95.98 67.88 66.78 7.99 2.57 10.11 48.61 8.80 44.01 > is.matrix(A) [1] TRUE > is.data.frame(A) [1] FALSE > A[1:5,'Mrt'] [1] 183.63 191.05 180.09 184.67 173.60 ###OK! > A$Mrt[1:5] Error in A$Mrt: $ operator is invalid for atomic vectors ##Not helpful. > LA <- data.frame(A) ###Change into data frame > is.data.frame(LA) [1] TRUE > dim(LA) [1] 508 11 > LA[1:5,] ##Observe the different row names. Mrt Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par 1 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 2 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 3 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 4 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 5 173.60 8.27 95.85 74.25 34.80 10.57 3.39 11.90 48.53 9.15 66.02 > NOTE: Good to attach(LA). Then can use directly ts.plot(Mrt) etc. > LA[1:5,'Mrt'] [1] 183.63 191.05 180.09 184.67 173.60 ###OK! > LA$Mrt[1:5] [1] 183.63 191.05 180.09 184.67 173.60 ###OK! > mean(LA$Mrt) [1] 169.0482 > var(LA$Mrt) ##Var is larger: 201.1539/169.0482 = 1.18992 [1] 201.1539 Poisson reg is reasonable as the ratio is not far from 1. > summary(LA$Mrt) Min. 1st Qu. Median Mean 3rd Qu. Max. 142.1 159.6 166.7 169 176.4 231.7 2. Looking at the data ----------------------- ##Relationships between the variables > pairs(LA) ##FIG 1 ##The corresponding spectral estimates par(mfrow=c(3,3), oma=c(0,0,5,0)) ts.plot(LA$Mrt) ###Same as tsplot(A[,'Mrt']) mtext("Mrt=Total Mortality", line=0,cex=1.2) PR.Mrt <- spectrum(LA$Mrt) ##Raw Periodogram SP.Mrt <- spectrum(LA$Mrt, spans=c(3,5,7)) ##Smoothed Periodogram ts.plot(LA$Crb) mtext("Crb=Carbon Monoxide", line=0,cex=1.2) PR.Crb <- spectrum(LA$Crb) SP.Crb <- spectrum(LA$Crb, spans=c(3,5,7)) ts.plot(LA$Par) mtext("Par=Particulates", line=0,cex=1.2) PR.Par <- spectrum(LA$Par) SP.Par <- spectrum(LA$Par, spans=c(3,5,7)) ##FIG 2 mtext("LA Mortality and Covariate Time Series", outer=T, cex=1.5) ts.plot(LA$Mrt) ###Same as tsplot(A[,'Mrt']) mtext("Mrt=Total Mortality", line=0,cex=1.2) PR.Mrt <- spectrum(LA$Mrt) ##Raw Periodogram SP.Mrt <- spectrum(LA$Mrt, spans=c(3,5,7)) ##Smoothed Periodogram ts.plot(LA$Tmp) mtext("Par=Particulates", line=0,cex=1.2) PR.Par <- spectrum(LA$Tmp) SP.Par <- spectrum(LA$Tmp, spans=c(3,5,7)) ts.plot(LA$Crb) mtext("Crb=Carbon Monoxide", line=0,cex=1.2) PR.Crb <- spectrum(LA$Crb) SP.Crb <- spectrum(LA$Crb, spans=c(3,5,7)) mtext("LA Mortality and Covariate Time Series", outer=T, cex=1.5) ###Find the index of the maximum periodogram ordinate > names(SP.Mrt) [1] "freq" "spec" "coh" "phase" "spans" "filter" [7] "df" "bandwidth" "n.used" "series" "method" "taper" [13] "pad" "detrend" "demean" > length(SP.Mrt$freq) [1] 256 > length(SP.Mrt$spec) [1] 256 > min(SP.Mrt$freq) [1] 0.001953125 ###Almost 0. > max(SP.Mrt$freq) [1] 0.5 > (1:length(SP.Mrt$spec))[SP.Mrt$spec==max(SP.Mrt$spec)] [1] 10 > SP.Mrt$freq[10] [1] 0.01953125 > (1:length(SP.Crb$spec))[SP.Crb$spec==max(SP.Crb$spec)] [1] 10 > SP.Crb$freq[10] [1] 0.01953125 > (1:length(SP.Par$spec))[SP.Par$spec==max(SP.Par$spec)] [1] 10 > SP.Par$freq[10] [1] 0.01953125 #Period: > 1/0.01953125 [1] 51.2 ###Period measured in the time interval used. That is 51.2 such intervals. > (1/0.01953125)*7 [1] 358.4 days ###<--- About one year. 3. Poisson Regression: Use canonical link throughout Non-integer data!!! Get 50 warnings. ------------------------------------------------------ > class(LA) [1] "data.frame" ##Need this Response : Mrt Covariates: Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par PoisReg.All <- glm(formula = Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=poisson, data=LA) OR: PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=poisson, data=LA) There were 50 or more warnings (use warnings() to see the first 50) Warning messages: 1: In dpois(y, mu, log = TRUE) : non-integer x = 183.630000 2: In dpois(y, mu, log = TRUE) : non-integer x = 191.050000 3: In dpois(y, mu, log = TRUE) : non-integer x = 180.090000 4: In dpois(y, mu, log = TRUE) : non-integer x = 184.670000 5: In dpois(y, mu, log = TRUE) : non-integer x = 173.600000 ............................................................ ............................................................ 47: In dpois(y, mu, log = TRUE) : non-integer x = 203.750000 48: In dpois(y, mu, log = TRUE) : non-integer x = 202.200000 49: In dpois(y, mu, log = TRUE) : non-integer x = 188.990000 50: In dpois(y, mu, log = TRUE) : non-integer x = 190.970000 > names(PoisReg.All) [1] "coefficients" "residuals" "fitted.values" [4] "effects" "R" "rank" [7] "qr" "family" "linear.predictors" [10] "deviance" "aic" "null.deviance" [13] "iter" "weights" "prior.weights" [16] "df.residual" "df.null" "y" [19] "converged" "boundary" "model" [22] "call" "formula" "terms" [25] "data" "offset" "control" [28] "method" "contrasts" "xlevels" > PoisReg.All Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, family = poisson, data = LA) Coefficients: (Intercept) Rsp Crd Tmp Hum Crb 4.485e+00 6.005e-03 6.380e-03 -1.204e-04 -7.808e-05 4.958e-04 Slf Nit Hdr Ozn Par -7.446e-03 -4.668e-04 7.655e-04 2.230e-03 1.080e-04 Degrees of Freedom: 507 Total (i.e. Null); 497 Residual Null Deviance: 588.6 Residual Deviance: 57.85 AIC: Inf > summary(PoisReg.All) Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, family = poisson, data = LA) Deviance Residuals: Min 1Q Median 3Q Max -1.1297829 -0.2238007 0.0005743 0.2097785 1.3165738 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 4.485e+00 1.006e-01 44.581 < 2e-16 *** Rsp 6.005e-03 1.496e-03 4.014 5.97e-05 *** Crd 6.380e-03 5.984e-04 10.661 < 2e-16 *** Tmp -1.204e-04 9.268e-04 -0.130 0.897 Hum -7.808e-05 3.805e-04 -0.205 0.837 Crb 4.958e-04 2.538e-03 0.195 0.845 Slf -7.446e-03 5.646e-03 -1.319 0.187 Nit -4.668e-04 2.274e-03 -0.205 0.837 Hdr 7.655e-04 6.159e-04 1.243 0.214 Ozn 2.230e-03 1.840e-03 1.212 0.226 Par 1.080e-04 5.172e-04 0.209 0.835 --- Signif. codes: 0 '***. 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' (Dispersion parameter for poisson family taken to be 1) Null deviance: 588.609 on 507 degrees of freedom Residual deviance: 57.845 on 497 degrees of freedom AIC: Inf Number of Fisher Scoring iterations: 3 ##NOTE: "residuals" refer here to the "working" residuals defined as (y-hat(mu))/(d mu/d eta) where the derivative is evaluated at hat(mu). In our case of Poisson regression (d mu/d eta)=mu, since mu=exp(eta). So the working residuals are: (y-hat(mu))/hat(mu). (y-hat(mu))/hat(mu)= (y-fitted value)/(fitted value)= (PoisReg.All$y-PoisReg.All$fitted.values)/PoisReg.All$fitted.values ##Check: The 7th,8th,...,11th working residuals are: > ((PoisReg.All$y-PoisReg.All$fitted.values)/PoisReg.All$fitted.values)[7:11] 7 8 9 10 11 0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912 Now from "residuals" ###Get the same. > PoisReg.All$residual[7:11] 7 8 9 10 11 0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912 Now from resid(): ###Gives residuals when applied to glm objects. ###Get the same. > resid(PoisReg.All, type="working")[7:11] 7 8 9 10 11 0.01862911 0.04731465 0.06041701 -0.01871553 0.02238912 ##In the function resid(), "type" should be one of: "deviance", "pearson", "working", "response" In what follows we use "residuals"; that is the working residuals. At the very end we also use the response residuals for comparing models. ------------------------------------------------------- NOTE: The "Residual Deviance" is the scaled deviance. ------------------------------------------------------- NOTE: To get n! use gamma(n+1). To get log(n!) use lgamma(n+1) ------------------------------------- ##From Splus get the same plus corr matrix: > summary(PoisReg.All) Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, family = poisson, data = LA) Deviance Residuals: Min 1Q Median 3Q Max -1.129783 -0.2238007 0.0005742856 0.2097785 1.316574 Coefficients: Value Std. Error t value (Intercept) 4.484626e+00 0.1005944532 44.5812479 Rsp 6.004882e-03 0.0014960379 4.0138569 Crd 6.379706e-03 0.0005984307 10.6607267 Tmp -1.203571e-04 0.0009268404 -0.1298575 Hum -7.808055e-05 0.0003804829 -0.2052143 Crb 4.957806e-04 0.0025383581 0.1953155 Slf -7.446004e-03 0.0056458880 -1.3188366 Nit -4.667693e-04 0.0022744029 -0.2052272 Hdr 7.655378e-04 0.0006158848 1.2429887 Ozn 2.230044e-03 0.0018401519 1.2118802 Par 1.079798e-04 0.0005171768 0.2087871 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 57.84515 on 497 degrees of freedom Number of Fisher Scoring Iterations: 3 Correlation of Coefficients: (Intercept) Rsp Crd Tmp Hum Crb Rsp 0.2242698 Crd -0.6689086 -0.5688460 Tmp -0.8504103 -0.0618902 0.3291518 Hum -0.6870933 -0.1005962 0.2441310 0.5135408 Crb 0.2000477 0.0873989 -0.2207805 -0.1166508 0.0517857 Slf 0.1693699 0.2118884 -0.3702751 -0.0208349 -0.0818833 -0.1816456 Nit 0.1669253 -0.0829073 0.1707735 -0.2018831 -0.1595381 -0.1084359 Hdr -0.1772556 -0.0367965 0.0142721 0.0579830 0.0997768 -0.3700612 Ozn 0.4344092 0.0442298 -0.0850821 -0.6873258 -0.2334923 0.3819084 Par -0.2427340 -0.0413370 -0.0005778 0.2080051 0.1844139 -0.4639271 Slf Nit Hdr Ozn Rsp Crd Tmp Hum Crb Slf Nit -0.2983407 Hdr -0.0686041 -0.4477533 Ozn -0.2718690 -0.1780738 0.0398134 Par 0.1935141 -0.3772318 -0.0183395 -0.1039967 ##Plot Mortality TS (Mrt) vs Fitted TS (Estimated mu=E(Mrt)) ##FIG 3 par(mfrow=c(2,1), oma=c(0,0,5,0)) par(cex=1.1) acf(PoisReg.All$y-PoisReg.All$fitted.values) cpgram(PoisReg.All$y-PoisReg.All$fitted.values) ts.plot(cbind(PoisReg.All$y,PoisReg.All$fitted.values)) legend(200,245,c("________ Data","................ Fit"),bty="n") mtext("Observed vs Predicted", cex=1.5) ts.plot(PoisReg.All$residuals) ##(Working) residual plot somewhat sinusoidal. mtext("Residuals", cex=1.5) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.5) mtext("Rsp, Crd, Tmp, Hum, Crb, Slf, Nit, Hdr, Ozn, Par", outer=T, line=0, cex=1.5) ##Reduce the number of covariates > PoisReg.RCSHO <- glm(formula = Mrt ~ Rsp + Crd + Slf + Hdr + Ozn , family = poisson, data = LA) > summary(PoisReg.RCSHO) Call: glm(formula = Mrt ~ Rsp + Crd + Slf + Hdr + Ozn, family = poisson, data = LA) Deviance Residuals: Min 1Q Median 3Q Max -1.136971 -0.2163543 0.005178808 0.2149715 1.332251 Coefficients: Value Std. Error t value (Intercept) 4.4647362803 0.0415809850 107.374471 Rsp 0.0059250366 0.0014752054 4.016415 Crd 0.0064882213 0.0005299576 12.242907 Slf -0.0077351230 0.0051911900 -1.490048 Hdr 0.0008691312 0.0003447418 2.521108 Ozn 0.0018460830 0.0010778144 1.712802 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 58.09185 on 502 degrees of freedom Number of Fisher Scoring Iterations: 3 Correlation of Coefficients: (Intercept) Rsp Crd Slf Hdr Rsp 0.3319260 Crd -0.9050099 -0.5779281 Slf 0.3962013 0.2078572 -0.3875707 Hdr -0.1627729 -0.0874557 -0.0607864 -0.5896395 Ozn -0.6346736 -0.0657916 0.4914853 -0.5915154 0.2202745 ###Deviance Test: PoisReg.All: Deviance = 57.84515, No. parameters = 11 (include intercept) PoisReg.RCSHO: Deviance = 58.09185, No. parameters = 6 > PoisReg.RCSHO$deviance-PoisReg.All$deviance [1] 0.2467093=58.09185-57.84515 Too small. > LogReg.GPA$deviance-LogReg$deviance [1] 2.128012 ###Not large enough. ##p-value; df=5=difference in the number of parameters of the two models. > 1-pchisq(PoisReg.RCSHO$deviance-PoisReg.All$deviance, 11-6) [1] 0.998527 ###Accept "No Diffrenece" between the two models. > names(PoisReg.RCSHO) [1] "coefficients" "residuals" "fitted.values" [4] "effects" "R" "rank" [7] "qr" "family" "linear.predictors" [10] "deviance" "aic" "null.deviance" [13] "iter" "weights" "prior.weights" [16] "df.residual" "df.null" "y" [19] "converged" "boundary" "model" [22] "call" "formula" "terms" [25] "data" "offset" "control" [28] "method" "contrasts" "xlevels" ##Plot of data vs new fit. par(mfrow=c(2,1), oma=c(0,0,5,0)) par(cex=1.1) ts.plot(cbind(PoisReg.RCSHO$y,PoisReg.RCSHO$fitted.values)) legend(200,245,c("________ Data","................ Fit"),bty="n") ts.plot(PoisReg.RCSHO$residuals) ##Residual plot somewhat sinusoidal. mtext("Residuals", cex=1.2) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Rsp, Crd, Slf, Hdr, Ozn", outer=T, line=0, cex=1.2) ##FIG 4 ##Residual behavior acf(PoisReg.RCSHO$residuals) ##FIG5 Not good! cpgram(PoisReg.RCSHO$residuals) ==================================== "FINAL" MODEL: PoisReg.TCy1y2 ##Use Tmp, any pollutant, and autoregress ############### as suggested by Shumway et al. (1988). Get uncorrelated residuals! > mean(LA$Mrt) [1] 169.0482 ##Adding Mrt(t-1)=y(t-1), Mrt(t-2)=y(t-2) ##Autoregress on y(t-1),y(t-2)!!! y1 <- c(169,LA$Mrt)[-509] y2 <- c(169,169,LA$Mrt)[-c(509,510)] LA.y1y2 <- cbind(LA,y1,y2) > is.data.frame(LA.y1y2) [1] TRUE ##Naming the new variables names(LA.y1y2) <- c('Mrt','Rsp','Crd','Tmp','Hum','Crb', 'Slf','Nit','Hdr','Ozn','Par','y1','y2') > LA.y1y2[1:4,] Mrt Rsp Crd Tmp Hum Crb Slf Nit Hdr Ozn Par y1 1 183.63 11.90 97.85 72.38 29.20 11.51 3.37 9.64 45.79 6.69 72.72 169.00 2 191.05 10.75 104.64 67.19 67.51 8.92 2.59 10.05 43.90 6.83 49.60 183.63 3 180.09 9.33 94.36 62.94 61.42 9.48 3.29 7.80 32.18 4.98 55.68 191.05 4 184.67 9.54 98.05 72.49 58.99 10.28 3.04 13.39 40.43 9.25 55.16 180.09 y2 1 169.00 2 169.00 3 183.63 4 191.05 ##Use: log(Crb)!!! PoisReg.TCy1y2 <- glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 , family = poisson, data = LA.y1y2) > summary(PoisReg.TCy1y2) Call: glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, family = poisson, data = LA.y1y2) Deviance Residuals: Min 1Q Median 3Q Max -1.923433 -0.3502484 -0.04177222 0.3573013 1.893937 Coefficients: Value Std. Error t value (Intercept) 5.1751770503 2.092880e-01 24.727533 Tmp -0.0212812318 5.074996e-03 -4.193350 I(Tmp^2) 0.0001409826 3.362773e-05 4.192451 log(Crb) 0.0524105160 8.576396e-03 6.111018 y1 0.0017315880 3.707799e-04 4.670124 y2 0.0020606382 3.594891e-04 5.732131 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 166.2256 on 502 degrees of freedom (508-6=502) Number of Fisher Scoring Iterations: 3 Correlation of Coefficients: (Intercept) Tmp I(Tmp^2) log(Crb) y1 Tmp -0.9594272 I(Tmp^2) 0.9391324 -0.9962832 log(Crb) 0.0464542 0.0036636 -0.0164882 y1 -0.2175850 0.0958640 -0.0715993 -0.2170437 y2 -0.1895248 0.0884345 -0.0745675 -0.1863434 -0.5955002 #SE's From The Sample Information Matrix G_N From PL Using the Model PoisReg.TCy1y2: Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 b0 <- PoisReg.TCy1y2$coefficients[1] b1 <- PoisReg.TCy1y2$coefficients[2] b2 <- PoisReg.TCy1y2$coefficients[3] b3 <- PoisReg.TCy1y2$coefficients[4] b4 <- PoisReg.TCy1y2$coefficients[5] b5 <- PoisReg.TCy1y2$coefficients[6] Mrt <- LA$Mrt Tmp <- LA$Tmp Crb <- LA$Crb N <- length(LA$Mrt) > N [1] 508 Use our cumulative covariance matrix GN=G_{N}!!! GN <- matrix(rep(0,36),ncol=6) for(i in 3:N){ bZ <- b0+b1*Tmp[i]+b2*Tmp[i]^2 +b3*log(Crb[i])+b4*Mrt[i-1]+b5*Mrt[i-2] Z <- c(1,Tmp[i],Tmp[i]^2,log(Crb[i]),Mrt[i-1],Mrt[i-2]) ZZ <- Z%*%t(Z) GN <- GN + ZZ*exp(bZ) } InvGN <- solve(GN) > InvGN ###Covariance Matrix of PL Estimates [,1] [,2] [,3] [,4] [,5] [1,] 4.383214e-02 -1.020016e-03 6.616406e-06 8.068968e-05 -1.690211e-05 [2,] -1.020016e-03 2.578653e-05 -1.702455e-07 2.453168e-07 1.808388e-07 [3,] 6.616406e-06 -1.702455e-07 1.132379e-09 -5.361427e-09 -8.969003e-10 [4,] 8.068968e-05 2.453168e-07 -5.361427e-09 7.379477e-05 -6.898445e-07 [5,] -1.690211e-05 1.808388e-07 -8.969003e-10 -6.898445e-07 1.379105e-07 [6,] -1.419898e-05 1.595131e-07 -8.876829e-10 -5.788406e-07 -7.975442e-08 [,6] [1,] -1.419898e-05 [2,] 1.595131e-07 [3,] -8.876829e-10 [4,] -5.788406e-07 [5,] -7.975442e-08 [6,] 1.296324e-07 SE of hat(b0,b1,b2,b3,b4,b5) from PL > sqrt(c(InvGN[1,1],InvGN[2,2],InvGN[3,3],InvGN[4,4],InvGN[5,5],InvGN[6,6])) [1] 2.093613e-01 5.078044e-03 3.365085e-05 8.590388e-03 3.713630e-04 [6] 3.600450e-04 From Splus: SAME!!! 2.092880e-01 5.074996e-03 3.362773e-05 8.576396e-03 3.707799e-04 3.594891e-04 ##Get AIC below from step(). ##Some predictions: Recall in Poisson regression the link is log(mu)=eta <----Link function Therefore the predictor in Poisson regression is the estimated mu=exp(eta) <----Inverse link where eta is the linear predictor. We have: log{Hat(mu)}= Hat(eta) = b0+b1*Tmp+b2*I(Tmp^2)+b3*log(Crb)+b4*y1+b5*y2 Thus, the predictor is: Hat(E(Mrt))=Hat(mu)=exp(b0+b1*Tmp+b2*I(Tmp^2)+b3*log(Crb)+b4*y1+b5*y2) (***) > LA.y1y2$Mrt[201:210] ##Observed Mrt values [1] 177.09 183.34 184.83 173.11 164.21 164.73 170.84 178.25 170.86 179.93 > for(i in 201:210){ predictor <- exp(5.1751770503 - 0.0212812318*LA.y1y2$Tmp[i]+0.0001409826*LA.y1y2$Tmp[i]^2 +0.0524105160*log(LA.y1y2$Crb[i])+0.0017315880*LA.y1y2$y1[i] +0.0020606382*LA.y1y2$y2[i]) print(predictor)} [1] 176.1245 [1] 175.0899 [1] 185.3946 [1] 184.4768 [1] 180.7214 ##Corresponding predicted values [1] 174.5726 [1] 171.2021 [1] 169.9865 [1] 176.1829 [1] 172.444 ###Check: OK!!! Get the same prediction. > PoisReg.TCy1y2$y[201:210] ###Observed [1] 177.09 183.34 184.83 173.11 164.21 164.73 170.84 178.25 170.86 179.93 > > PoisReg.TCy1y2$fitted.values[201:210] ###Predicted 201 202 203 204 205 206 207 208 176.1245 175.0899 185.3946 184.4768 180.7214 174.5726 171.2021 169.9865 209 210 176.1829 172.444 ##Plot of data vs new fit. ##FIG 5 tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values) legend(350,228,c("________ Data","................ Fit"),bty="n") tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good. mtext("Residuals", cex=1.2) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", outer=T, line=0, cex=1.2) #Partial Plot: ##Fig 5' > tsplot(PoisReg.TCy1y2$y[101:200],PoisReg.TCy1y2$fitted.values[101:200]) mtext("t in 101:200", cex=1.5) > tsplot(PoisReg.TCy1y2$y[201:300],PoisReg.TCy1y2$fitted.values[201:300]) mtext("t in 201:300", cex=1.5) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", outer=T, line=0, cex=1.2) ###Comparison of (working) residuals ##FIG 6 par(mfrow=c(2,1), oma=c(0,0,5,0)) acf(PoisReg.All$residuals) ###Working residuals mtext("Residual ACF: Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par", line=1, cex=1.2) acf(PoisReg.TCy1y2$residuals) mtext("Residual ACF: Mrt ~ Tmp + I(Tmp^2) + log(Crb) + Mrt(t-1) + Mrt(t-2)", line=1, cex=1.2) mtext("ACF of Working Residuals From Two Poisson Regressions", outer=T, line=2, cex=1.5) ###Get Pearson Residuals PearsonResd <- resid(PoisReg.TCy1y2, type="pearson") > sum(PearsonResd^2)/(508-6) [1] 0.3314884 <--- phi ##Residual behavior of "Final Model": Excellent! acf(PoisReg.TCy1y2$residuals) ###Working residuals cpgram(PoisReg.TCy1y2$residuals) ###Working residuals ============================= ##3 concluding plots par(mfrow=c(3,1), oma=c(0,0,5,0)) ##For R ts.plot(ts(PoisReg.TCy1y2$y),ts(PoisReg.TCy1y2$fitted.values)) ##For Splus tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values) legend(350,228,c("________ Data","................ Fit"),bty="n") mtext("Weekly Mortality vs Poisson Fit", cex=1.2) tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good. mtext("Residuals", cex=1.2) acf(PoisReg.TCy1y2$residuals) mtext("Residual ACF", line=0, cex=1) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", outer=T, line=0, cex=1.2) ##FIG 7 (30.5-300*0.079)/sqrt(300*0.079*0.921) ====================== ###Poisson AR: No other covariates log(mu)= b0+b1*Mrt(t-1)+b2*Mrt(t-2) = b0+b1*y(t-1)+b2*y(t-2) > P.AR2 <- glm(formula = Mrt ~ y1 + y2, poisson, data = LA.y1y2) > summary(P.AR2) Call: glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2) Deviance Residuals: Min 1Q Median 3Q Max -2.150422 -0.4200039 -0.04843802 0.4570537 2.039434 Coefficients: Value Std. Error t value (Intercept) 4.311907719 0.0424379602 101.604971 y1 0.002273389 0.0003476951 6.538457 y2 0.002554930 0.0003477661 7.346691 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 222.2333 on 505 degrees of freedom Number of Fisher Scoring Iterations: 3 Correlation of Coefficients: (Intercept) y1 y1 -0.3564393 y2 -0.3571710 -0.7437231 ##Comparison between glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 , family = poisson, data = LA.y1y2) and glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2) NOTE: Sharp increase of deviance PoisReg.TCy1y2 gives 166.2256 on 502 <--Include Tmp,Tmp^2,Crb,y(t-1),y(t-2) P.AR2 gives 222.2333 on 505 <--Include only y(t-1),y(t-2) Note: df=N-p=508-(5 coeff. + 1 intercept)=502 for PoisReg.TCy1y2 df=N-p=508-(2 coeff. + 1 intercept)=505 for P.AR2 MSE.P.AR2 <- sum(P.AR2$y-P.AR2$fitted.values)^2/508 == 3.868998e-18 !!! MSE.PoisReg.TCy1y2 <- sum(PoisReg.TCy1y2$y-PoisReg.TCy1y2$fitted.values)^2/508 ==3.785096e-19 ##<---Smaller!!! > P.AR2$residuals%*%P.AR2$residuals ##SS Working Residuals [,1] [1,] 1.294978 > PoisReg.TCy1y2$residuals%*%PoisReg.TCy1y2$residuals ##SS Working Residuals [,1] [1,] 0.9717937 ##Smaller!!! par(mfrow=c(4,1), oma=c(0,0,5,0)) tsplot(P.AR2$y,P.AR2$fitted.values) tsplot(PoisReg.TCy1y2$y,PoisReg.TCy1y2$fitted.values) tsplot(P.AR2$y[301:500],P.AR2$fitted.values[301:500]) tsplot(PoisReg.TCy1y2$y[301:500],PoisReg.TCy1y2$fitted.values[301:500]) mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Mrt(t-1), Mrt(t-2) vs Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", outer=T, line=0, cex=1.2) ###FIG 8 tsplot(P.AR2$residuals) ##(Working) residuals look good. tsplot(PoisReg.TCy1y2$residuals) ##(Working) residuals look good. acf(P.AR2$residuals) acf(PoisReg.TCy1y2$residuals) ##Almost the same!!! Resemble WN!!! mtext("Poisson Regression of LA Mortality Count on", outer=T,line=2, cex=1.2) mtext("Mrt(t-1), Mrt(t-2) vs Tmp, Tmp^2, log(Crb), Crd, Mrt(t-1), Mrt(t-2)", outer=T, line=0, cex=1.2) ##FIG 9 ##Compare PoisReg.TCy1y2 and P.AR2 PoisReg.TCy1y2 <- glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2 , family = poisson, data = LA.y1y2) Residual Deviance: 166.2256 on 502 degrees of freedom P.AR2 <- glm(formula = Mrt ~ y1 + y2, poisson, data = LA.y1y2) Residual Deviance: 222.2333 on 505 degrees of freedom > P.AR2$deviance [1] 222.233 > P.AR2$df [1] 505 > PoisReg.TCy1y2$deviance [1] 166.2256 > PoisReg.TCy1y2$df [1] 502 Test significance of Tmp, Tmp^2),log(Crb): p-value; df=3=difference in the number of parameters of the two models. > 1-pchisq(P.AR2$deviance-PoisReg.TCy1y2$deviance, 505-502) [1] 4.184986e-12 ###"Reject": Tmp, Tmp^2),log(Crb) are significant!!! ---------------------------------- ###Now do stepwise model selection using step(): DESCRIPTION: Performs stepwise model selection. This function is generic (see Methods); method functions can be written to handle specific classes of data. Classes which already have methods for this function include: gam, glm. Example: glm.object <- glm(Kyphosis ~ Age + Start + Number, family = binomial, data = kyphosis) step(glm.object) Use everything except Crd and Rsp: LM <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Slf + Nit + Hdr + Ozn + Par , family = poisson, data = LA.y1y2) ResidLM <- resid(LM,type="pearson") ResidLM%*%ResidLM/508 = 0.3117516 step(glm(formula = Mrt ~ y1 + y2 + Tmp + Crb + Hum + Ozn , family = poisson, data = LA.y1y2)) Get AIC from repeated use of step() applied to the respective model. The results below are from S-Plus!!! R gives AIC=Inf!!! AIC= 182.0731 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Slf + Nit + Hdr + Ozn + Par AIC= 180.159 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Nit + Hdr + Ozn + Par AIC= 178.4193 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Nit + Hdr + Ozn AIC= 176.8529 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Nit + Hdr Step: AIC= 175.0992 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum + Hdr ---------------------------------------- AIC= 173.8994 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum ##Step() stopped here Residual Deviance: 159.8994 on 501 degrees of freedom glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) + Hum, family = poisson, data = LA.y1y2) ---------- #######"Final" Model PoisReg.TCy1y2: Decided not to include humidity #######Test for humidity by: D0-D1 ~ ChiSq(p-q) step(glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), family = poisson, data = LA.y1y2)) AIC= 178.2256 ##This is from S-Plus. R gives AIC=Inf. Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) Residual (scaled) Deviance: 166.2256 on 502 degrees of freedom > 1-pchisq(166.2256-159.8994, 502-501) [1] 0.0118967 -----> Hum Significant. -------------------- step(glm(formula = Mrt ~ y1 + y2, family = poisson, data = LA.y1y2)) AIC= 228.2333 Mrt ~ y1 + y2 Residual Deviance: 222.2333 on 505 degrees of freedom > 1-pchisq(222.2333-159.8994, 505-501) [1] 9.370282e-13 ----> (Tmp + I(Tmp^2) + log(Crb) + Hum) Significant. --------------------------- Just for comparison with a linear model (Gaussian canonical link; better to use lm than glm; see Venables and Ripley section on the default Gaussian link): Note: Put y1, y2, first; before they were last. LM <- lm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), data = LA.y1y2) > summary(LM) Call: lm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), data = LA.y1y2) Residuals: Min 1Q Median 3Q Max -24.63 -4.684 -0.4163 4.723 28.68 Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 173.9072 21.4519 8.1069 0.0000 y1 0.3116 0.0371 8.4016 0.0000 y2 0.3635 0.0360 10.1039 0.0000 Tmp -3.6616 0.5183 -7.0643 0.0000 I(Tmp^2) 0.0243 0.0034 7.0864 0.0000 log(Crb) 8.6327 0.8519 10.1334 0.0000 Residual standard error: 7.622 on 502 degrees of freedom Multiple R-Squared: 0.714 F-statistic: 250.7 on 5 and 502 degrees of freedom, the p-value is 0 Correlation of Coefficients: (Intercept) y1 y2 Tmp I(Tmp^2) y1 -0.2259 y2 -0.1966 -0.5740 Tmp -0.9599 0.1021 0.0927 I(Tmp^2) 0.9402 -0.0781 -0.0786 -0.9964 log(Crb) 0.0429 -0.2161 -0.1945 0.0093 -0.0217 > step(LM) Start: AIC= 29864.27 ###Much larger than AIC in Poisson regression. Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) Single term deletions Model: Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) scale: 58.10169 Df Sum of Sq RSS Cp 29167.05 29864.27 y1 1 4101.226 33268.27 33849.29 y2 1 5931.579 35098.63 35679.64 Tmp 1 2899.490 32066.54 32647.56 I(Tmp^2) 1 2917.728 32084.78 32665.79 log(Crb) 1 5966.218 35133.27 35714.28 Call: lm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), data = LA.y1y2) Coefficients: (Intercept) y1 y2 Tmp I(Tmp^2) log(Crb) 173.9072 0.3115606 0.3634993 -3.661645 0.02430015 8.632702 Degrees of freedom: 508 total; 502 residual Residual standard error: 7.622446 > ResidLM <- resid(LM,type="pearson") > ResidLM%*%ResidLM/508 [,1] [1,] 57.41545 Now compare with the Final model using Poisson reg. Final <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), family = poisson, data = LA.y1y2) > summary(Final) Call: glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), family = poisson, data = LA.y1y2) Deviance Residuals: Min 1Q Median 3Q Max -1.923433 -0.3502484 -0.04177222 0.3573013 1.893937 Coefficients: Value Std. Error t value (Intercept) 5.1751770503 2.092880e-01 24.727533 y1 0.0017315880 3.707799e-04 4.670124 y2 0.0020606382 3.594891e-04 5.732131 Tmp -0.0212812318 5.074996e-03 -4.193350 I(Tmp^2) 0.0001409826 3.362773e-05 4.192451 log(Crb) 0.0524105160 8.576396e-03 6.111018 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 166.2256 on 502 degrees of freedom Number of Fisher Scoring Iterations: 3 Correlation of Coefficients: (Intercept) y1 y2 Tmp I(Tmp^2) y1 -0.2175850 y2 -0.1895248 -0.5955002 Tmp -0.9594272 0.0958640 0.0884345 I(Tmp^2) 0.9391324 -0.0715993 -0.0745675 -0.9962832 log(Crb) 0.0464542 -0.2170437 -0.1863434 0.0036636 -0.0164882 ResidFinal <- resid(Final,type="pearson") > ResidFinal%*%ResidFinal/508 [,1] [1,] 0.3275731 #Much smaller than in LM!!! ---------------------------------------------- y1 <- c(169,LA$Mrt)[-509] y2 <- c(169,169,LA$Mrt)[-c(509,510)] LA.y1y2 <- cbind(LA,y1,y2) Adding Tmp[t-1], Tmp[t-2]: T1 <- c(mean(LA.y1y2$Tmp),LA.y1y2$Tmp)[-509] T2 <- c(mean(LA.y1y2$Tmp),mean(LA.y1y2$Tmp),LA.y1y2$Tmp)[-c(509,510)] > LA1 <- cbind(LA.y1y2,T1,T2) > is.data.frame(LA1) [1] T names(LA1) <- c('Mrt','Rsp','Crd','Tmp','Hum','Crb', 'Slf','Nit','Hdr','Ozn','Par','y1','y2','T1','T2') > cor(LA1) Mrt Rsp Crd Tmp Hum Crb Mrt 0.9999999 0.726959705 0.9236739 -0.36018065 -0.24601072 0.55707395 Rsp 0.7269597 1.000000000 0.6137877 -0.33055755 -0.07576124 0.29582089 Crd 0.9236739 0.613787711 1.0000000 -0.43863967 -0.23572391 0.55744761 Tmp -0.3601806 -0.330557555 -0.4386397 1.00000000 -0.29328853 -0.09785584 Hum -0.2460107 -0.075761244 -0.2357239 -0.29328853 1.00000000 -0.43785578 Crb 0.5570740 0.295820892 0.5574476 -0.09785584 -0.43785578 1.00000012 Slf 0.2308592 0.004090783 0.2569989 0.40437400 -0.27213290 0.51300466 Nit 0.2200107 0.061664522 0.1674097 0.42985296 -0.37408376 0.68795371 Hdr 0.4407115 0.215747550 0.4011386 0.11413717 -0.42981786 0.85104489 Ozn -0.3518491 -0.324815482 -0.4260242 0.85137832 -0.04935255 -0.25106162 Par 0.4741024 0.264985323 0.4438713 -0.01723096 -0.43278363 0.86611748 y1 0.7273812 0.670334339 0.7191584 -0.48296806 -0.02550465 0.41032946 y2 0.7403291 0.638668060 0.7191422 -0.43919477 -0.06520956 0.39624998 T1 -0.5124716 -0.390850842 -0.5447309 0.60201085 0.14257042 -0.36769858 T2 -0.4361836 -0.380785674 -0.4911761 0.61820769 0.03660665 -0.23522018 Slf Nit Hdr Ozn Par y1 Mrt 0.230859205 0.22001068 0.440711468 -0.351849109 0.47410244 0.72738117 Rsp 0.004090783 0.06166452 0.215747550 -0.324815482 0.26498532 0.67033434 Crd 0.256998926 0.16740966 0.401138604 -0.426024169 0.44387129 0.71915835 Tmp 0.404374003 0.42985296 0.114137165 0.851378322 -0.01723096 -0.48296806 Hum -0.272132903 -0.37408376 -0.429817855 -0.049352545 -0.43278363 -0.02550465 Crb 0.513004661 0.68795371 0.851044893 -0.251061618 0.86611748 0.41032946 Slf 1.000000000 0.73002124 0.612865746 0.405081540 0.46793401 0.12870277 Nit 0.730021238 1.00000000 0.822394788 0.346277297 0.73722708 0.07150596 Hdr 0.612865746 0.82239479 1.000000119 -0.007746873 0.80750394 0.26645952 Ozn 0.405081540 0.34627730 -0.007746873 1.000000000 -0.12284482 -0.42385262 Par 0.467934012 0.73722708 0.807503939 -0.122844823 0.99999994 0.33054382 y1 0.128702775 0.07150596 0.266459525 -0.423852623 0.33054382 1.00000012 y2 0.130368695 0.06948008 0.270989090 -0.401137859 0.28707892 0.72741520 T1 0.100897610 0.03276984 -0.229703426 0.581990957 -0.25979626 -0.36011869 T2 0.116964988 0.08248030 -0.129468620 0.560343921 -0.15373182 -0.51245707 y2 T1 T2 Mrt 0.74032909 -0.51247156 -0.43618360 Rsp 0.63866806 -0.39085084 -0.38078567 Crd 0.71914220 -0.54473090 -0.49117613 Tmp -0.43919477 0.60201085 0.61820769 Hum -0.06520956 0.14257042 0.03660665 Crb 0.39624998 -0.36769858 -0.23522018 Slf 0.13036869 0.10089761 0.11696499 Nit 0.06948008 0.03276984 0.08248030 Hdr 0.27098909 -0.22970343 -0.12946862 Ozn -0.40113786 0.58199096 0.56034392 Par 0.28707892 -0.25979626 -0.15373182 y1 0.72741520 -0.36011869 -0.51245707 y2 1.00000012 -0.48308662 -0.36013213 T1 -0.48308662 1.00000000 0.60203445 T2 -0.36013213 0.60203445 1.00000000 > cor(cbind(log(LA1$Mrt),LA1$Tmp,LA1$T1,LA1$T2,LA1$y1,LA1$y2,log(LA1$Crb), LA1$Hum)) cor(log(LA1$Mrt),LA1$Tmp)= -0.35902 cor(log(LA1$Mrt),LA1$T1)= -0.5164694 cor(log(LA1$Mrt),LA1$T2)= -0.4395802 cor(log(LA1$Mrt),LA1$y1)= 0.7170978 cor(log(LA1$Mrt),LA1$y2)= 0.7361614 cor(log(LA1$Mrt),LA1$Crb)= 0.5615486 cor(log(LA1$Mrt),log(LA1$Crb))= 0.582759 cor(log(LA1$Mrt),LA1$Hum)= -0.2548507 y1y2T1T2TmpLogCHum <- glm(formula = Mrt ~ y1 + y2 + T1 + T2 + Tmp + log(Crb) + Hum, family = poisson, data = LA1) Start: AIC= 183.9295 Mrt ~ y1 + y2 + T1 + T2 + Tmp + log(Crb) + Hum Step: AIC= 182.1486 Mrt ~ y1 + y2 + T1 + Tmp + log(Crb) + Hum Step: AIC= 180.8557 Mrt ~ y1 + y2 + T1 + log(Crb) + Hum ------------------- y1y2T1LogCHum <- glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb) + Hum, family = poisson, data = LA1) Start: AIC= 180.8557 Mrt ~ y1 + y2 + T1 + log(Crb) + Hum Residual Deviance: 168.8557 on 502 degrees of freedom > y1y2T1LogCHum$residuals%*%y1y2T1LogCHum$residuals [,1] [1,] 0.9856711 -------------------- Comparison of 4 models of Poisson Regression with canonical link ================================================================== USE THIS MODEL Model 4: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) + Tmp(t-1) + log(Crb(t)) ------- y1y2T1LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), family = poisson, data = LA1) Start: AIC= 184.5522 Mrt ~ y1 + y2 + T1 + log(Crb) Residual Deviance: 174.5522 on 503 degrees of freedom > summary(y1y2T1LogC) Call: glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), family = poisson, data = LA1) Deviance Residuals: Min 1Q Median 3Q Max -1.918812 -0.3562656 -0.03168986 0.360087 2.100885 Coefficients: Value Std. Error t value (Intercept) 4.505127648 0.0694395719 64.878390 y1 0.001888282 0.0003544952 5.326679 y2 0.001837161 0.0003717740 4.941607 T1 -0.001334397 0.0004426010 -3.014898 log(Crb) 0.046829757 0.0086943161 5.386250 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 174.5522 on 503 degrees of freedom > y1y2T1LogC$residuals%*%y1y2T1LogC$residuals [,1] [1,] 1.020108 Is humidity significant ? 174.5522 on 503 Model 4 168.8557 on 502 Model 4 + Hum > 1-pchisq(174.5522 - 168.8557, 503-502) [1] 0.01699878 ##p-value Humidity is probably significant, but its coeficient is very small, so I am including it. tsplot(y1y2T1LogC$y,tsplot(y1y2T1LogC$fitted.values)) tsplot(y1y2T1LogC$residuals) > mean(y1y2T1LogC$residuals) [1] 7.929388e-06 acf(y1y2T1LogC$residuals) Resid4 <- y1y2T1LogC$residuals ##Working residuals. > Resid4%*%Resid4 [,1] [1,] 1.020108 Another way: WR <- sum(residuals(y1y2T1LogC, type="w")^{2}) > WR [1] 1.020108 Response residuals = y-hat(y) RespResid.y1y2T1LogC <- resid(y1y2T1LogC, type="response") ##Response resid. > RespResid.y1y2T1LogC%*%RespResid.y1y2T1LogC/508 ###MSE [,1] [1,] 59.38317 Another way: MSE <- sum(residuals(y1y2T1LogC, type="r")^{2})/508 > MSE [1] 59.38317 Another way: mean(residuals(y1y2T1LogC, type="r")^{2}) Use G_N to get SE's for Model 4 -------------------------------- Mrt ~ y1 + y2 + T1 + log(Crb) b0 <- y1y2T1LogC$coefficients[1] ##Intercept b1 <- y1y2T1LogC$coefficients[2] b2 <- y1y2T1LogC$coefficients[3] b3 <- y1y2T1LogC$coefficients[4] b4 <- y1y2T1LogC$coefficients[5] Mrt <- LA1$Mrt Tmp <- LA1$Tmp Crb <- LA1$Crb N <- length(LA1$Mrt) > N [1] 508 Use our cumulative covariance matrix GN=G_{N}!!! GN <- matrix(rep(0,25),ncol=5) for(i in 3:N){ bZ <- b0 + b1*Mrt[i-1]+b2*Mrt[i-2] + b3*Tmp[i-1] + b4*log(Crb[i]) Z <- c(1,Mrt[i-1],Mrt[i-2],Tmp[i-1],log(Crb[i])) ZZ <- Z%*%t(Z) GN <- GN + ZZ*exp(bZ) } > InvGN <- solve(GN) #Approx. Cov Matrix of MPLE hat{beta}. > InvGN [,1] [,2] [,3] [,4] [,5] [1,] 4.821996e-03 -5.137528e-06 -1.186505e-05 -2.419959e-05 -6.718087e-05 [2,] -5.137528e-06 1.261890e-07 -8.756092e-08 -3.640349e-09 -5.800100e-07 [3,] -1.186505e-05 -8.756092e-08 1.385955e-07 5.080947e-08 -2.846573e-07 [4,] -2.419959e-05 -3.640349e-09 5.080947e-08 1.959349e-07 8.601128e-07 [5,] -6.718087e-05 -5.800100e-07 -2.846573e-07 8.601128e-07 7.581801e-05 SE of hat(b0,b1,b2,b3,b4,b) from PL > sqrt(c(InvGN[1,1],InvGN[2,2],InvGN[3,3],InvGN[4,4],InvGN[5,5])) [1] 0.0694405902 0.0003552309 0.0003722841 0.0004426454 0.0087073538 From S-Plus: Essentially the same!!! 0.0694395719,0.0003544952,0.0003717740,0.0004426010,0.0086943161 tsplot(y1y2T1LogC$residuals) ###Same as tsplot(A[,'Mrt']) mtext("y1y2T1LogC$residuals", line=0,cex=1.2) PR.Mrt <- spectrum(y1y2T1LogC$residuals) ##Raw Periodogram SP.Mrt <- spectrum(y1y2T1LogC$residuals, spans=c(3,5,7))#Smoothed Period. spectrum(rnorm(508), spans=c(3,5,7))#For comparison par(mfrow=c(3,2), oma=c(0,0,5,0)) Mort <- LA1$Mrt Temp <- LA1$Tmp CO <- LA1$Crb tsplot(Mort) mtext("Mortality", line=0,cex=1.2) acf(Mort) tsplot(Temp) mtext("Temperature", line=0,cex=1.2) acf(Temp) tsplot(log(CO)) mtext("log(CO)", line=0,cex=1.2) acf(log(CO)) mtext("Mortality, Temperature, log(CO)", outer=T,line=1, cex=1.5) Check fitted values of Model 4: -------------------------------- Mrt ~ y1 + y2 + T1 + log(Crb) b0 <- y1y2T1LogC$coefficients[1] ##Intercept b1 <- y1y2T1LogC$coefficients[2] b2 <- y1y2T1LogC$coefficients[3] b3 <- y1y2T1LogC$coefficients[4] b4 <- y1y2T1LogC$coefficients[5] Mrt <- LA1$Mrt Tmp <- LA1$Tmp Crb <- LA1$Crb > N <- length(Mrt) > N [1] 508 mu <- rep(0,N) ##Really hat(mu) for(i in 3:N){ mu[i] <- exp(b0 + b1*Mrt[i-1]+b2*Mrt[i-2] + b3*Tmp[i-1] + b4*log(Crb[i])) } > y1y2T1LogC$fitted.values[101:105] 101 102 103 104 105 198.2466 203.913 196.3034 194.3792 193.7002 > mu[101:105] [1] 198.2466 203.9130 196.3034 194.3792 193.7002 ##SAME. > y1y2T1LogC$fitted.values[504:508] 504 505 506 507 508 153.4763 158.2064 162.6193 163.1669 168.3671 > mu[504:508] [1] 153.4763 158.2064 162.6193 163.1669 168.3671 ##SAME. Check acf,cpgram: Inside the WN bounds ----------------------------------------- Best model y1y2T1LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + log(Crb), family = poisson, data = LA1) acf(y1y2T1LogC$residuals) ###Working residuals cpgram(y1y2T1LogC$residuals) ###Working residuals ts.plot(y1y2T1LogC$residuals) qqnorm(y1y2T1LogC$residuals) hist(y1y2T1LogC$residuals) Compare to PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=poisson, data=LA) acf(PoisReg.All$residuals) ###Working residuals cpgram(PoisReg.All$residuals) ###Working residuals ts.plot(PoisReg.All$residuals) qqnorm(PoisReg.All$residuals) hist(PoisReg.All$residuals) Prediction interval for mu_t: ------------------------------- Since Mrt is assumed Poisson, and sinec the mean of Mrt is high, use the normal approx: y = +/- mu + 1.96*sqrt(mu) U <- rep(0,N) for(i in 3:N){ U [i] <- mu[i] + 1.96*sqrt(mu[i]) } L <- rep(0,N) for(i in 3:N){ L [i] <- mu[i] - 1.96*sqrt(mu[i]) } par(oma=c(0,0,5,0)) tsplot(Mrt[3:508],L[3:508],U[3:508]) mtext("LA Mortality and Prediction Bounds", outer=T,line=1, cex=1.5) Logistic Regression: Using the covariates of Model 4. -------------------- > x <- (LA1$Mrt > 180) LR <- glm(formula = x ~ y1 + y2 + T1 + log(Crb), family = binomial, data = LA1) > mean(x) [1] 0.1889764 > length(x) [1] 508 > summary(LR) Call: glm(formula = x ~ y1 + y2 + T1 + log(Crb), family = binomial, data = LA1) Deviance Residuals: Min 1Q Median 3Q Max -2.398971 -0.3946087 -0.1765495 -0.06889809 2.96192 Coefficients: Value Std. Error t value (Intercept) -28.78559510 4.25454348 -6.765848 y1 0.07231370 0.01801253 4.014634 y2 0.07748155 0.01873651 4.135325 T1 -0.03691172 0.02149358 -1.717337 log(Crb) 1.82124808 0.43960315 4.142937 (Dispersion Parameter for Binomial family taken to be 1 ) Null Deviance: 492.4911 on 507 degrees of freedom Residual Deviance: 249.1258 on 503 degrees of freedom Number of Fisher Scoring Iterations: 6 > LR$y[50:70] [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 > LR$fitted.values[50:70] 50 51 52 53 54 55 56 57 0.9926582 0.8860716 0.9495808 0.8664859 0.9023523 0.8528324 0.7852525 0.84669 58 59 60 61 62 63 64 0.690341 0.8888058 0.4264606 0.2323411 0.07574699 0.0504389 0.1027491 65 66 67 68 69 70 0.07480436 0.04605685 0.02240811 0.03680879 0.09486271 0.1855351 > LR$fitted.values[50:70]*1 50 51 52 53 54 55 56 57 0.9926582 0.8860716 0.9495808 0.8664859 0.9023523 0.8528324 0.7852525 0.84669 58 59 60 61 62 63 64 0.690341 0.8888058 0.4264606 0.2323411 0.07574699 0.0504389 0.1027491 65 66 67 68 69 70 0.07480436 0.04605685 0.02240811 0.03680879 0.09486271 0.1855351 (LR$y-LR$fitted.values)%*%(LR$y-LR$fitted.values)/508=0.07416702 tsplot(LR$y[95:200],LR$fitted.values[95:200]) ------------------------------------------ Model 5: ---------- y1y2T1T2LogC <- glm(formula = Mrt ~ y1 + y2 + T1 + T2 +log(Crb), family = poisson, data = LA1) Residual Deviance: 174.5268 on 502 degrees of freedom Start: AIC= 186.5268 Mrt ~ y1 + y2 + T1 + T2 + log(Crb) Is T2 significant ? 174.5268 on 502 Model 5 (y1 + y2 + T1 + T2 +log(Crb)) 174.5522 on 503 Model 4 (y1 + y2 + T1 + log(Crb)) > 1-pchisq(174.5522 - 174.5268 , 503-502) [1] 0.8733744 ##p-value So, T2 NOT significant!!! > Resid5 <- y1y2T1T2LogC$residuals > Resid5%*%Resid5 [,1] [1,] 1.01988 > RespResid.y1y2T1T2LogC <- resid(y1y2T1T2LogC, type="response") > RespResid.y1y2T1T2LogC%*%RespResid.y1y2T1T2LogC/508 [,1] [1,] 59.38478 ------------------------------------------- Model 6: Mrt ~ y1 + y2 + T1 + Tmp + log(Crb) y1y2T1TmpLogC <- glm(formula = Mrt ~ y1 + y2 + T1 +Tmp+log(Crb), family = poisson, data = LA1) Residual Deviance: 171.4078 on 502 degrees of freedom Start: AIC= 183.4078 Mrt ~ y1 + y2 + T1 + Tmp + log(Crb) Is Tmp significant ? Model 5: Mrt ~ y1 + y2 + T1 + Tmp + log(Crb), 171.4078 on 502 df Model 4: Mrt ~ y1 + y2 + T1 + log(Crb), 174.5522 on 503 df > 1-pchisq(174.5522 - 171.4078 , 503-502) [1] 0.07618802 #p-value Tmp not quite significant > y1y2T1TmpLogC$residuals%*%y1y2T1TmpLogC$residuals [,1] [1,] 0.9992826 > resid(y1y2T1TmpLogC,type="response") %*%resid(y1y2T1TmpLogC,type="response")/508 [,1] [1,] 58.44298 ---------------------------------- Model 3: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) + Tmp(t-1) ------- y1y2T1 <- glm(formula = Mrt ~ y1 + y2 + T1, family = poisson, data = LA1) Residual Deviance: 203.5214 on 504 degrees of freedom > y1y2T1$residuals%*%y1y2T1$residuals [,1] [1,] 1.190659 Start: AIC= 211.5214 Mrt ~ y1 + y2 + T1 > mean(y1y2T1$residuals) [1] 2.819284e-06 Is log(Crb) significant ? Model 3: 203.5214 on 504 Model 4: 174.5522 on 503 > 1-pchisq(203.5214 - 174.5522,504-503) [1] 7.353829e-08 log(Carbon) is very significant!!! RespResid.y1y2T1 <- resid(y1y2T1, type="response") > RespResid.y1y2T1%*%RespResid.y1y2T1/508 [,1] [1,] 69.04306 Resid3 <- y1y2T1$residuals > Resid3%*%Resid3 [,1] [1,] 1.190659 -------------------------------- Model 2: Mrt(t) ~ Mrt(t-1) + Mrt(t-2) ------- y1y2 <- glm(formula = Mrt ~ y1 + y2 , family = poisson, data = LA1) Start: AIC= 228.2333 Mrt ~ y1 + y2 > y1y2$residuals%*%y1y2$residuals [,1] [1,] 1.294978 > mean(y1y2$residuals) [1] -5.101422e-06 Response Residuals= y-hat(y) RespResid.y1y2 <- resid(y1y2, type="response") > RespResid.y1y2%*%RespResid.y1y2/508 [,1] [1,] 75.52384 Residual Deviance: 222.2333 on 505 degrees of freedom 174.5522 on 503 Model 4: Mrt ~ y1 + y2 + T1 + log(Crb) 222.2333 on 505 Model 2: Mrt ~ y1 + y2 Tmp(t-1)=T1,log(Crb)=log(Carbon) significant ? > 1-pchisq(222.2333 - 174.5522,505-503) [1] 4.427725e-11 ###Yes. T1, log(Crb) are significant!!! acf(y1y2$residuals) Resid2 <- y1y2$residuals > Resid2%*%Resid2 [,1] [1,] 1.294978 --------------------------------------------- Model 1: Mrt(t) ~ Mrt(t-1) y1 <- glm(formula = Mrt ~ y1 , family = poisson, data = LA1) Residual Deviance: 276.071 on 506 degrees of freedom > y1$residuals%*%y1$residuals [,1] [1,] 1.620043 RespResid.y1 <- resid(y1, type="response") > RespResid.y1%*%RespResid.y1/508 [,1] [1,] 93.01784 AIC= 280.071 Mrt ~ y1 Start: AIC= 280.071 Mrt ~ y1 ---------------------------------------------- Model 0: Mrt(t) ~ Tmp(t) + Hum(t) + Crb(t) + Slf(t) + Nit(t) + Hdr(t) + Ozn(t) + Par(t) WeathPol <- glm(formula = Mrt ~ Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=poisson, data=LA) Residual Deviance: 315.6854 on 499 degrees of freedom Start: AIC= 333.6854 Mrt ~ Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par acf(WeathPol$residuals) Resid0 <- WeathPol$residuals > Resid0%*%Resid0 ##SS of working residuals [,1] [1,] 1.857614 > WeathPol$residuals%*%WeathPol$residuals [,1] [1,] 1.857614 RespResid.WeathPol <- resid(WeathPol, type="response") > RespResid.WeathPol%*%RespResid.WeathPol/508 [,1] [1,] 108.9936 Summary: Poisson Regression ----------------------------- Note: When the scale parameter is phi=1, then S-Plus gives UP TP AN ADDITIVE CONSTANT (Venables and Ripley p. 187) AIC = Deviance + 2*(Number of estimated parameters) For example: In Model 1 we estimate 8+1=9 parameters, and deviance=315.6854. Thus AIC=315.6854+2*9=333.6854 NumPar SS.WorkResid SS.RespRes/508 Deviance df AIC ------------------------------------------------------------- Model 0 9 1.857614 108.9936 315.6854 499 333.6854 Model 1 2 1.620043 93.0178 276.0710 506 280.071 Model 2 3 1.294978 75.5238 222.2333 505 228.2333 Model 3 4 1.190659 69.0431 203.5214 504 211.5214 Model 4 5 1.020108 59.3832 174.5522 503 184.5522 Model 5 6 1.019880 59.3848 174.5268 502 186.5268 Model 6 6 0.999283 58.4430 171.4078 502 183.4078 Try(below) 6 0.988 57.87 169.6683 502 181.67 Model Pearson={\cal {\chi}}^{2} BIC ---------------------------------------------- 0 320.17 371.76 1 276.11 288.53 2 222.32 240.92 3 203.78 228.44 4 174.91 205.71 5 174.89 211.91 6 171.72 208.79 Try(below) 169.95 207.05 From BIC, Model 4 is the winner. Model 0: Mrt ~ Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par Model 1: Mrt ~ y1 Model 2: Mrt ~ y1 + y2 Model 3: Mrt ~ y1 + y2 + T1 Model 4: Mrt ~ y1 + y2 + T1 + log(Crb) Model 5: Mrt ~ y1 + y2 + T1 + T2 + log(Crb) Model 6: Mrt ~ y1 + y2 + T1 + Tmp+ log(Crb) DIAGNOSTICS FUNCTION For a glm object x: out.f <- function(x) { p <- length(x$coef) wr <- sum(residuals(x, type="w")^{2}) mse <- mean(residuals(x, type="r")^{2}) x2 <- sum(residuals(x, type="p")^{2}) d <- x$deviance df <- x$df aic <- d+2*length(x$coeff) bic <- d+length(x$coef)*log(length(x$resid)) c(p, wr, mse, x2, d, df, aic,bic) } Plots ------- par(mfrow=c(2,2), oma=c(0,0,5,0)) tsplot(y1y2T1LogC$y,y1y2T1LogC$fitted.values) legend(350,228,c("____ Data","........ Fit"),bty="n") mtext("Poisson Regression of Filtered LA Mortality Mrt(t) on", outer=T,line=2, cex=1.2) mtext("Mrt(t-1), Mrt(t-2), Tmp(t-1), log(Crb(t))", outer=T, line=0, cex=1.2) tsplot(Resid0,main="Model 0 Residuals") ##(Working) residuals for Model 1. tsplot(Resid4,main="Model 4 Residuals") ##(Working) residuals for Model 4. acf(Resid0) acf(Resid4) mtext("Comparison of Residuals", outer=T,line=2, cex=1.5) ----------------------------------- Add Tmp^2 as suggested by Shumway et al. Try <- glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2 )+ log(Crb), family = poisson, data = LA1) > summary(Try) Call: glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb), family = poisson, data = LA1) Deviance Residuals: Min 1Q Median 3Q Max -1.821623 -0.3464865 -0.01389327 0.3605499 2.116257 Coefficients: Value Std. Error t value (Intercept) 4.477866e+00 0.0705140873 63.503144 y1 2.172803e-03 0.0003769724 5.763824 y2 1.816273e-03 0.0003718341 4.884632 T1 -1.987967e-03 0.0005323697 -3.734185 I(Tmp^2) 7.757581e-06 0.0000035087 2.210956 log(Crb) 4.062347e-02 0.0091413189 4.443940 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 588.6089 on 507 degrees of freedom Residual Deviance: 169.6683 on 502 degrees of freedom > Try$residuals%*%Try$residuals [,1] [1,] 0.9884745 Residual Deviance: 169.6683 on 502 degrees of freedom > RespResid.Try <- resid(Try, type="response") > RespResid.Try%*%RespResid.Try/508 [,1] [1,] 57.86894 > Try$residuals%*%Try$residuals [,1] [1,] 0.9884745 Start: AIC= 181.6683 Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb) AIC=169.6683 + 2*6=181.6683 BIC=169.6683 + 6*log(508)=207.0512 (But in Model 4: BIC=205.71) resid(Try, type="p")%*%resid(Try, type="p")=169.9458 resid(Try, type="pearson")%*%resid(Try, type="pearson")=169.9458 ##SAME -------------------------------------------- Try1 <- glm(formula = Mrt ~ y1 + y2 + T1 + I(Tmp^2 )+ log(Crb), family = poisson, data = LA1) Residual Deviance: 169.6683 on 502 degrees of freedom Start: AIC= 181.6683 Mrt ~ y1 + y2 + T1 + I(Tmp^2) + log(Crb) --------------------------------------------------- Try2 <- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2 )+ log(Crb), family = poisson, data = LA1) ##"FINAL" from above Residual Deviance: 166.2256 on 502 degrees of freedom Start: AIC= 178.2256 Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb) > r <- resid(Try2, type="response") > r%*%r/508 [,1] [1,] 56.42665 > Try2$residuals%*%Try2$residuals [,1] [1,] 0.9717937 ------------------------------------------------ A. Introduction to lm() and glm() ================================== glm(formula, family = gaussian, data, weights, subset, na.action, start = NULL, etastart, mustart, offset, control = glm.control(...), model = TRUE, method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...) formula: an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. Ex: Reg1 <- glm(lot1 ~ log(u), data=clotting, family=Gamma) summary(Reg1) The "formula" gives the model: lot1 ~ log(u) Note: +,-,*,^,: etc. have special meaning in the "formula" in lm, glm. To interpret an expression literally, we need to use the identity function I(). Thus, to avoid confusion, the function 'I()' can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula 'y ~ a + I(b+c)', the term 'b+c' is to be interpreted as the sum of 'b' and 'c'. Thus y is regressed on two covariates. This is different than 'y ~ a + b + c' where there are three covariates. Another example, suppose we want to include the product of a and b in a formula specification. If use a*b then this mean a or b or a*b!!! To include ONLY a*b use I(a*b) to protect the expression a*b: That is use lm(y~I(a*b))!!! Formally, the '*' operator denotes factor crossing. The '^' operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for 'a', 'b' and 'c' together with their second-order interactions. To remove the intercept term use 'y ~ x - 1'. The '-' operator removes the specified terms, so that '(a+b+c)^2 - a:b' is identical to 'a + b + c + b:c + a:c'. It can also used to remove the intercept term: 'y ~ x - 1' is a line through the origin. A model with no intercept can be also specified as 'y ~ x + 0' or 'y ~ 0 + x'. --------------------- Some formula examples: Y ~ X1, Y is modeled by X1. Y ~ X1 + X2, Y is modeled by X1 and X2 as in multiple regression. Y ~ X1 * X2, Y is modeled by X1, X2 and X1*X2. Y ~ (X1 + X2)^2, Two-way interactions. Note usual powers. Y ~ X1+ I(X2^2), Y is modeled by X1 and X2^2. Y ~ X1 | X2, Y is modeled by X1 conditioned on X2. ----------------------- Simple example first ====================== y <- c(183.63,191.05,180.09,184.67,173.60) x1 <- c(11.90,10.75,9.33,9.54,8.27) x2 <- c( 97.85,104.64,94.36,98.05, 95.85) > lm(y ~ x1+x2) Call: lm(formula = y ~ x1 + x2) Coefficients: (Intercept) x1 x2 60.511 1.651 1.077 A <- data.frame(cbind(y,x1,x2)) > is.data.frame(A) [1] FALSE > A y x1 x2 1 183.63 11.90 97.85 2 191.05 10.75 104.64 3 180.09 9.33 94.36 4 184.67 9.54 98.05 5 173.60 8.27 95.85 colnames(A) <- c('yy','xx1','xx2') > A yy xx1 xx2 1 183.63 11.90 97.85 2 191.05 10.75 104.64 3 180.09 9.33 94.36 4 184.67 9.54 98.05 5 173.60 8.27 95.85 > is.data.frame(A) [1] TRUE > yy ##Not good by itself Error: object 'yy' not found > A$yy ##OK now [1] 183.63 191.05 180.09 184.67 173.60 ##Within lm(),glm() we can refer to the col names provided the data frame is specified: > lm(formula = yy ~ xx1 + xx2, data=A) Call: lm(formula = yy ~ xx1 + xx2, data = A) Coefficients: (Intercept) xx1 xx2 60.511 1.651 1.077 ##Now glm() with family "gaussian": get the same thing. > glm(formula = yy ~ xx1 + xx2, family = gaussian, data = A) Call: glm(formula = yy ~ xx1 + xx2, family = gaussian, data = A) Coefficients: (Intercept) xx1 xx2 60.511 1.651 1.077 Degrees of Freedom: 4 Total (i.e. Null); 2 Residual Null Deviance: 164 Residual Deviance: 32.44 AIC: 31.54 ##If use family=poisson" for non-integer response, R gives warnings and AIC=Inf: > glm(formula = yy ~ xx1 + xx2, family = poisson, data = A) Call: glm(formula = yy ~ xx1 + xx2, family = poisson, data = A) Coefficients: (Intercept) xx1 xx2 4.545518 0.009147 0.005811 Degrees of Freedom: 4 Total (i.e. Null); 2 Residual Null Deviance: 0.9 Residual Deviance: 0.1824 AIC: Inf Warning messages: 1: In dpois(y, mu, log = TRUE) : non-integer x = 183.630000 ## realy yy 2: In dpois(y, mu, log = TRUE) : non-integer x = 191.050000 3: In dpois(y, mu, log = TRUE) : non-integer x = 180.090000 4: In dpois(y, mu, log = TRUE) : non-integer x = 184.670000 5: In dpois(y, mu, log = TRUE) : non-integer x = 173.600000 ##To avoide the warnings use: "family = quasipoisson". We get same estimates. AIC=NA > glm(formula = yy ~ xx1 + xx2, family = quasipoisson, data = A) Call: glm(formula = yy ~ xx1 + xx2, family = quasipoisson, data = A) Coefficients: (Intercept) xx1 xx2 4.545518 0.009147 0.005811 Degrees of Freedom: 4 Total (i.e. Null); 2 Residual Null Deviance: 0.9 Residual Deviance: 0.1824 AIC: NA ---------------------------------------- Now our environmental models ----------------------------- Thus, the following 2 models are identical since "formula" treats log(Crb) and I(log(Crb)) alike. A: -- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), family = poisson, data = LA.y1y2) B: -- glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)), family = poisson, data = LA.y1y2) A: -- > glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), + family = poisson, data = LA.y1y2) Call: glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + log(Crb), family = poisson, data = LA.y1y2) Coefficients: (Intercept) y1 y2 Tmp I(Tmp^2) log(Crb) 5.1751771 0.0017316 0.0020606 -0.0212812 0.0001410 0.0524105 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 588.6 Residual Deviance: 166.2 AIC: Inf There were 50 or more warnings (use warnings() to see the first 50) B: -- > glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)), + family = poisson, data = LA.y1y2) Call: glm(formula = Mrt ~ y1 + y2 + Tmp + I(Tmp^2) + I(log(Crb)), family = poisson, data = LA.y1y2) Coefficients: (Intercept) y1 y2 Tmp I(Tmp^2) I(log(Crb)) 5.1751771 0.0017316 0.0020606 -0.0212812 0.0001410 0.0524105 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 588.6 Residual Deviance: 166.2 AIC: Inf There were 50 or more warnings (use warnings() to see the first 50) But C: --- glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb), family = poisson, data = LA.y1y2) Is different: > glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb), + family = poisson, data = LA.y1y2) Call: glm(formula = Mrt ~ y1 + y2 + Tmp + Tmp^2 + log(Crb), family = poisson, data = LA.y1y2) Coefficients: (Intercept) y1 y2 Tmp log(Crb) 4.352e+00 1.842e-03 2.172e-03 -8.707e-05 5.293e-02 Degrees of Freedom: 507 Total (i.e. Null); 503 Residual Null Deviance: 588.6 Residual Deviance: 183.6 AIC: Inf There were 50 or more warnings (use warnings() to see the first 50) R simply dropped Tmp^2!!! To see this, go to D where there is no Tmp^2. D: --- glm(formula = Mrt ~ y1 + y2 + Tmp + log(Crb), family = poisson, data = LA.y1y2) Call: glm(formula = Mrt ~ y1 + y2 + Tmp + log(Crb), family = poisson, data = LA.y1y2) Coefficients: (Intercept) y1 y2 Tmp log(Crb) 4.352e+00 1.842e-03 2.172e-03 -8.707e-05 5.293e-02 Degrees of Freedom: 507 Total (i.e. Null); 503 Residual Null Deviance: 588.6 Residual Deviance: 183.6 AIC: Inf There were 50 or more warnings (use warnings() to see the first 50) ================================================================== B. Comparison between Poisson and Negative Binomial See: Venables and Ripley pp. 206, 207 ========================================================= 1. Poisson ------------- PoisReg.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=poisson, data=LA) par(mfrow=c(2,2)) acf(PoisReg.All$residuals) ##FIG5 Not good! cpgram(PoisReg.All$residuals) > PoisReg.All Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, family = poisson, data = LA) Coefficients: (Intercept) Rsp Crd Tmp Hum Crb 4.485e+00 6.005e-03 6.380e-03 -1.204e-04 -7.808e-05 4.958e-04 Slf Nit Hdr Ozn Par -7.446e-03 -4.668e-04 7.655e-04 2.230e-03 1.080e-04 Degrees of Freedom: 507 Total (i.e. Null); 497 Residual Null Deviance: 588.6 Residual Deviance: 57.85 AIC: Inf ----------------------------------------------------- 2. NB ---------- ##Need library MASS: library(MASS) Specifies the information required to fit a Negative Binomial generalized linear model, with known 'theta' parameter, using 'glm()'. Usage: negative.binomial(theta = stop("'theta' must be specified"), link = "log") a. Now use glm() Do not specify link, and take theta=2 ------------------------------------------------------------- NB.All <- glm(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, family=negative.binomial(2),data=LA) > NB.All$family Family: Negative Binomial(2) Link function: log > NB.All Call: glm(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, family = negative.binomial(2), data = LA) Coefficients: (Intercept) Rsp Crd Tmp Hum Crb 4.477e+00 6.189e-03 6.413e-03 -6.990e-05 -7.571e-05 3.864e-04 Slf Nit Hdr Ozn Par -7.551e-03 -4.854e-04 7.599e-04 2.184e-03 1.351e-04 Degrees of Freedom: 507 Total (i.e. Null); 497 Residual Null Deviance: 6.743 ##Much smaller Residual Deviance: 0.6798 AIC: 5861 ###Get AIC!!! > NB.All$family ###Note!! Family: Negative Binomial(2) Link function: log par(mfrow=c(2,2)) acf(NB.All$residuals) ##FIG5 Not good! cpgram(NB.All$residuals) b. Now use glm.nb. R estimates theta The coeef estimates are close to previous case using glm() ---------------------------------------------------------------- NB.All <- glm.nb(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, data=LA) glm.nb(formula, data, weights, subset, na.action, start = NULL, etastart, mustart, control = glm.control(...), method = "glm.fit", model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, ..., init.theta, link = log) > NB.All Call: glm.nb(formula = Mrt ~ Rsp + Crd + Tmp + Hum + Crb + Slf + Nit + Hdr + Ozn + Par, data = LA, init.theta = 26049012.6728116, link = log) Coefficients: (Intercept) Rsp Crd Tmp Hum Crb 4.485e+00 6.005e-03 6.380e-03 -1.204e-04 -7.808e-05 4.958e-04 Slf Nit Hdr Ozn Par -7.446e-03 -4.668e-04 7.655e-04 2.230e-03 1.080e-04 Degrees of Freedom: 507 Total (i.e. Null); 497 Residual Null Deviance: 588.6 Residual Deviance: 57.84 AIC: 3620 > NB.All$theta ###NOTE!!!! [1] 26049013 If specify link=log get the same thing. NB.All <- glm.nb(Mrt ~ Rsp+Crd+Tmp+Hum+Crb+Slf+Nit+Hdr+Ozn+Par, link = log, data=LA) > attributes(NB.All) $names [1] "coefficients" "residuals" "fitted.values" [4] "effects" "R" "rank" [7] "qr" "family" "linear.predictors" [10] "deviance" "aic" "null.deviance" [13] "iter" "weights" "prior.weights" [16] "df.residual" "df.null" "y" [19] "converged" "boundary" "model" [22] "call" "formula" "terms" [25] "data" "offset" "control" [28] "method" "contrasts" "xlevels" $class [1] "glm" "lm" > NB.All$family Family: Negative Binomial(2) Link function: log ========================== Comparison: Best model, Poisson,NB,Quasi ##Need library(MASS) ###Poisson PoisReg.TCy1y2 <-glm(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, family = poisson, data = LA.y1y2) Coefficients: (Intercept) Tmp I(Tmp^2) log(Crb) y1 y2 5.175177 -0.021281 0.000141 0.052411 0.001732 0.002061 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 588.6 Residual Deviance: 166.2 AIC: Inf Nice way to plot two ts ts.plot(ts(PoisReg.TCy1y2$y), ts(PoisReg.TCy1$fitted.values),xlab="Weeks",ylab="Counts",type="l",col=c(1,2)) legend('topright', col=c(1,2), c('original','QLM'),pch=c('-','-')) ###NB NB_Reg.TCy1y2 <- glm.nb(formula = Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, link = log, data = LA.y1y2) Coefficients: (Intercept) Tmp I(Tmp^2) log(Crb) y1 y2 5.175177 -0.021281 0.000141 0.052411 0.001732 0.002061 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 588.6 Residual Deviance: 166.2 AIC: 3719 ###NB2 NB2_Reg.TCy1y2 <- glm( Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, family = negative.binomial(2), data = LA.y1y2) Coefficients: (Intercept) Tmp I(Tmp^2) log(Crb) y1 y2 5.1787370 -0.0212338 0.0001405 0.0519774 0.0016709 0.0021005 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 6.743 Residual Deviance: 1.919 AIC: 5852 ###Quasi Qausi_Reg.TCy1y2 <- glm(Mrt ~ Tmp + I(Tmp^2) + log(Crb) + y1 + y2, data= LA.y1y2, family = quasi(link = "log", variance = "mu")) Coefficients: (Intercept) Tmp I(Tmp^2) log(Crb) y1 y2 5.175177 -0.021281 0.000141 0.052411 0.001732 0.002061 Degrees of Freedom: 507 Total (i.e. Null); 502 Residual Null Deviance: 588.6 Residual Deviance: 166.2 AIC: NA