MiniScript related to Swiss BankNotes, using Testing Idea from Lecture
======================================================================
STAT 750 Spring 2022                                            2/14/22

> bankn = data.matrix(read.table("http://www.math.umd.edu/~evs/s750/data/banknote.dat")[,-1])

mu2 = apply(bankn[101:200,],2,mean)
S = 0.99*var(bankn[101:200,]-  rep(1,100)%o%mu2)  ## for counterfeit banknotes

 > Salt = t(bankn[101:200,]-  rep(1,100)%o%mu2) %*%
+           (bankn[101:200,]-  rep(1,100)%o%mu2) / 100
> sum(abs(Salt-S))
[1] 1.348639e-15

## Treat mean for genuine banknotes as reference

mu1 = apply(bankn[1:100,],2,mean)

> rbind(mu1,mu2)
     Length    Left   Right Bottom    Top Diagonal
mu1 214.969 129.943 129.720  8.305 10.168  141.517
mu2 214.823 130.300 130.193 10.530 11.133  139.450


T2 = 100*c((mu1-mu2)%*% solve(S, mu1-mu2))  
> T2
[1] 7362.316    ## suggests highly significant because >> qchisq(.95,6) = 12.59159

## Alternate computation from LRT: 
> 100*log(1+T2/100)
[1] 431.2451


## But what is the correct reference distribution for this "Hotelling T^2"?

## Install package MVTests

> library(MVTests)
> OneSampleHT2(bankn[101:200,],mu1,0.95)$HT2
         [,1]
[1,] 7288.693

> 7288.693/.99    ## indicates that denominator n in S above 
[1] 7362.316     ##  must be replaced by n-1 to reproduce standard HT2 value

### Two-sample Hotelling T2 next ...

> TwoSamplesHT2(bankn, rep(1:2,rep(100,2)))$HT2
[1,] 2412.451

> Spool = 0.5*(var(bankn[101:200,]-  rep(1,100)%o%mu2) + 
               var(bankn[1:100,]-  rep(1,100)%o%mu1))

> TwSampT2 = (100/2)*c((mu1-mu2)%*% solve(Spool, mu1-mu2)) 
> TwSampT2
[1] 2412.451          ### OK, agrees with MVTests software
 

#--------------- Slightly more interesting examples

> OneSampleHT2(bankn[101:200,c(1:3,5)],mu1[c(1:3,5)],0.95)$HT2
         [,1]
[1,] 667.4621

> TwoSamplesHT2(bankn[,c(1:3,5)], rep(1:2,rep(100,2)))$HT2
        [,1]
[1,] 261.577   # still wildly significant even without varbls "Bottom, Diagonal"

#------------------
  names(Coated) = c("","Depth","Number","Depth","Number")  
     ## 15 obs "Depth", "Number", for each of two "Coating" treatments
  apply(Coated[,2:5],2,mean)
   Depth   Number  Depth.1 Number.1 
54.20000 26.20000 46.20000 23.13333 

> tmpCoated = rbind.data.frame(Coated[,2:3],Coated[4:5])
     ## 1st 15 rows "Coating1" next 15 rows "Coating2"

> TwoSamplesHT2(tmpCoated, rep(1:2,rep(15,2)))$HT2
         [,1]
[1,] 4.994308      ## how extreme a value is this for HotellingT2 ?

> TwoSamplesHT2(tmpCoated, rep(1:2,rep(15,2)))[4:5]
$p.value
          [,1]
[1,] 0.1090703

$CI
           Lower    Upper Important Variables?
Depth  -2.769019 18.76902                FALSE
Number -4.757879 10.89121                FALSE

## Let's verify roughly the HT2 p-value (HT2 with 2 dimensions, 28df)

> Ztmp = array(rnorm(2e5), c(1e5,2))
  Z2tmp = array(rnorm(56e5), c(1e5,28,2))
  tmp = apply(Z2tmp,1, function(mat) t(mat)%*%mat)/28    ## took 3 secs
> dim(tmp)
[1]      4 100000
  tmp2 = cbind(Ztmp, t(tmp))    ## 1e5 x 6, 1st two cols Z, next 4 cols W
  Tsq = c(apply(tmp2,1, function(row) row[1:2] %*% 
            solve(array(row[3:6], c(2,2)),row[1:2])))    ## took 2 secs
> mean(Tsq > 4.99)
[1] 0.10958                ### compare with p-value 0.10907

## on another run:
> mean(Tsq > 4.994308)
[1] 0.10935            ## closer to the "theoretical" p-value 
                     ## given above by  MVTests

-----------------------------------------------------------------
# Further steps from theory discussed on 2/21/22:
-----------------------------------------------------------------

We calculated a 2-sample Hotelling T^2 tests statistic, which we saw was
T^2(2,28) in the MKB notation. Moreover (from Thm 3.5.2)

     T^2(2,28)  =  (28*2/(28-2+1)) F_{2,28-2+1} = (56/27)*F_{2,27}

So the p-value for 4.994308 is  P(F_{2,27} > (27/56)*4.994308)

> 1- pf((27/56)*4.994308, 2,27)
[1] 0.1090703                      ## exactly the p-value given by MVTests

--------------------------------------------------------------------------