Lecture 1.R: Illustration of cross-tabulation and questions asked of multiway contingency table ------------------------------------------------- > infect = read.table("http://users.stat.ufl.edu/~aa/cda/data/Infection.dat", header=T) ### this option reads first line as column names ### A 16x4 data-frame in R, with 1st line as column names ### (that is the reason for the argument "header=T") > t(infect) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] center 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 treat 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 y 11 10 16 22 14 7 2 1 6 0 1 0 1 1 4 6 n 36 37 20 32 19 19 16 17 17 12 11 10 5 9 6 7 # "y" = treatmt resp = success, "treat" = indicator of experimental group # Data: treatmt ctr identifiers, & counts of successes & failures # "treat" is a "dummy column" for purpose of regression, # e.g. of y counts/(y+n counts) [or log, or logit] versus "treat" ### Alternative data presentation as a multi-way table > infect.arr = array( data.matrix(infect[,3:4]), c(2,8,2), dimnames = list(c("Drug","Control"), 1:8, c("Success","Failure")) ) > infect.arr , , Success 1 2 3 4 5 6 7 8 Drug 11 16 14 2 6 1 1 4 Control 10 22 7 1 0 0 1 6 , , Failure 1 2 3 4 5 6 7 8 Drug 36 20 19 16 17 11 5 6 Control 37 32 19 17 12 10 9 7 ## Coarsen the cross-tabulation by summing across unwanted level, say "center" > Infec.Obs = apply(infect.arr,c(1,3), sum) Infec.Obs Success Failure Drug 55 130 Control 47 143 ### Obvious question here is whether Drug is associated with "Success" > chisq.test(Infec.Obs) Pearson's Chi-squared test with Yates' continuity correction data: Infec.Obs X-squared = 0.94137, df = 1, p-value = 0.3319 ### Idea is to compare the "Observed" table above with what we might expect ### if "Success" and "Failure" occurred with the same proportion estimated by > c(sum(infect.arr[,,1]), sum(infect.arr), sum(infect.arr[,,1])/sum(infect.arr) ) [1] 102.000 375.000 0.272 ## The "expected" table would be given with successes a proportion .272 of the 185 ## patients given the drug and similarly .272 of the 190 Control patients: > Infec.Exp = outer( c(185,190), c(.272, .728), "*" ) > Infec.Exp [,1] [,2] [1,] 50.32 134.68 [2,] 51.68 138.32 ## "Chi-squared" test-statistic to be developed in class as Likelihood Ratio Test ## of "No Association", compares the two sets of 4 Obs and Exp counts > rbind( Obs = c(Infec.Obs), Exp = c(Infec.Exp)) [,1] [,2] [,3] [,4] Obs 55.00 47.00 130.00 143.00 Exp 50.32 51.68 134.68 138.32 > sum(c((Infec.Obs-Infec.Exp)^2/Infec.Exp)) [1] 1.180042 ### or after "Yates continuity correction" > sum(c((abs(Infec.Obs-Infec.Exp) - 0.5)^2/Infec.Exp)) [1] 0.9413654 ### SAME as in chisq.test ### As a "goodness of fit" statistic for a multinomial vector, this would be ### compared to chi-squared 3 df. ### But here our "Expected" table was based on "estimates" tallying proportions ### receiving Drug and proportions of "Success" reducing df by 2 > 1-pchisq(0.94137,1) [1] 0.3319258 ### Another thing we could do is ask whether the 8 separate 2x2 tables would ### show significant association between y/n and treat, e.g in the 5th center: > infect.arr[,5,] Success Failure Drug 6 17 Control 0 12 ### We put all of the (continuity-corrected) X-sq statistics & p-values together ## into an array indexed by center: > stat.arr = t(round(apply(infect.arr,2, function(twbytw.table) unlist(chisq.test(twbytw.table)[c(1,3)])),3)) ### messages that chi-sq approx may not be correct [because of small sample size] > stat.arr statistic.X-squared p.value 1 0.000 1.000 2 0.017 0.896 3 0.923 0.337 4 0.000 1.000 5 2.165 0.141 6 0.000 1.000 7 0.000 1.000 8 0.000 1.000 ### So nothing seems to be going on except possibly in center 5 ### and that effect is not strong !