Lecture 1.R: Illustration of cross-tabulation and 
questions asked of multiway contingency table
-------------------------------------------------

> infect = read.table("http://users.stat.ufl.edu/~aa/cda/data/Infection.dat", 
       header=T)           ### this option reads first line as column names

### A 16x4 data-frame in R, with 1st line as column names 
###     (that is the reason for the argument "header=T")

> t(infect)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
center    1    1    2    2    3    3    4    4    5     5     6     6     7     7     8     8
treat     1    0    1    0    1    0    1    0    1     0     1     0     1     0     1     0
y        11   10   16   22   14    7    2    1    6     0     1     0     1     1     4     6
n        36   37   20   32   19   19   16   17   17    12    11    10     5     9     6     7

# "y" = treatmt resp = success, "treat" = indicator of experimental group
# Data: treatmt ctr identifiers, & counts of successes & failures
# "treat" is a "dummy column" for purpose of regression,
#  e.g. of   y counts/(y+n counts)    [or log, or logit]  versus  "treat"

### Alternative data presentation as a multi-way table
> infect.arr = array( data.matrix(infect[,3:4]), c(2,8,2), dimnames = 
        list(c("Drug","Control"), 1:8, c("Success","Failure")) )

> infect.arr
, , Success
         1  2  3 4 5 6 7 8
Drug    11 16 14 2 6 1 1 4
Control 10 22  7 1 0 0 1 6

, , Failure
         1  2  3  4  5  6 7 8
Drug    36 20 19 16 17 11 5 6
Control 37 32 19 17 12 10 9 7

## Coarsen the cross-tabulation by summing across unwanted level, say "center" 

> Infec.Obs = apply(infect.arr,c(1,3), sum)
  Infec.Obs
        Success Failure
Drug         55     130
Control      47     143

### Obvious question here is whether Drug is associated with "Success"

> chisq.test(Infec.Obs)
        Pearson's Chi-squared test with Yates' continuity correction

data:  Infec.Obs
X-squared = 0.94137, df = 1, p-value = 0.3319

### Idea is to compare the "Observed" table above with what we might expect
###    if  "Success" and "Failure" occurred with the same proportion estimated by 

> c(sum(infect.arr[,,1]), sum(infect.arr),   sum(infect.arr[,,1])/sum(infect.arr) )
[1] 102.000 375.000   0.272


## The "expected" table would be given with successes a proportion .272 of the 185 
##    patients given the drug  and similarly .272 of the 190 Control patients:

> Infec.Exp = outer( c(185,190), c(.272, .728), "*" )
> Infec.Exp
      [,1]   [,2]
[1,] 50.32 134.68
[2,] 51.68 138.32

## "Chi-squared" test-statistic to be developed in class as Likelihood Ratio Test 
## of "No Association", compares the two sets of 4 Obs and Exp counts

> rbind( Obs = c(Infec.Obs), Exp = c(Infec.Exp))
     [,1]  [,2]   [,3]   [,4]
Obs 55.00 47.00 130.00 143.00
Exp 50.32 51.68 134.68 138.32

 > sum(c((Infec.Obs-Infec.Exp)^2/Infec.Exp))
[1] 1.180042

### or after "Yates continuity correction"

> sum(c((abs(Infec.Obs-Infec.Exp) - 0.5)^2/Infec.Exp))
[1] 0.9413654                              ### SAME as in chisq.test

### As a "goodness of fit" statistic for a multinomial vector, this would be 
###     compared to chi-squared 3 df.
### But here our "Expected" table was based on "estimates" tallying proportions 
###     receiving Drug and proportions of "Success" reducing df by 2
> 1-pchisq(0.94137,1)
[1] 0.3319258

### Another thing we could do is ask whether the 8 separate 2x2 tables would 
###  show significant association between y/n and treat, e.g in the 5th center:
> infect.arr[,5,]
        Success Failure
Drug          6      17
Control       0      12

### We put all of the (continuity-corrected) X-sq statistics & p-values together 
##    into an array indexed by center:   
> stat.arr = t(round(apply(infect.arr,2, function(twbytw.table) 
               unlist(chisq.test(twbytw.table)[c(1,3)])),3))
### messages that chi-sq approx may not be correct [because of small sample size]

> stat.arr
  statistic.X-squared p.value
1               0.000   1.000
2               0.017   0.896
3               0.923   0.337
4               0.000   1.000
5               2.165   0.141
6               0.000   1.000
7               0.000   1.000
8               0.000   1.000

### So nothing seems to be going on except possibly in center 5
###     and that effect is not strong !