Class Log on EM Algorithm for Fay-Herriot Model
===============================================
10/23/17

Example:   X_i = mu + Y_i + e_i ,   i = 1,...,n
           Y_i = "area random effect" ~ N(0, sig^2)
           e_i = "sampling error" for i'th "small area" ~ N(0, si^2),  si^2 KNOWN

Target for estimation/prediction in the i'th small area is:  mu + Y_i
Observed data: only X_i, i = 1,...,n
Unknown parameters: mu, sig^2

NOTE that   Y_i = (X_i - mu) * sigsq/(sigsq + sisq) + W_i
with X_i, W_i jointly normal AND Cov(W_i, X_i) = 0.

ALSO: the complete-data logLik contribution for area i (up to additive constants) is
      -0.5*log(sigsq) - Yi^2/(2*sigsq) - (Xi - Yi - mu)^2/(2*sisq)

Then put gami = sigsq/(sigsq + sisq) and find the EM iteration.
It follows that Yi given Xi is ~ N( gami*(Xi - mu), gami*sisq ).

> mu.tru = 10
> sig.tru = 1
> set.seed(5512)
> sisq.tru = rgamma(25, 3, 4)
> mean(sisq.tru)
[1] 0.7450129
> sisq.tru
 [1] 0.7308503 0.4835953 1.0592969 1.4329704 0.4336712 0.3154521 1.2669986
 [8] 0.9163511 1.0246449 0.8765097 0.8594322 0.8249106 0.4986511 0.3319555
[15] 0.5337501 1.0441304 0.8608805 1.1279333 0.6646755 0.3907703 0.6965859
[22] 0.5951980 0.1071784 0.3383700 1.2105607

> Xvec = rnorm(25, mu.tru, sqrt(sig.tru^2 + sisq.tru))   #### THE "OBSERVED DATA"
      ## NB rnorm takes the SD, so the marginal SD of X_i is sqrt(sigsq + sisq)
> mean(Xvec)
[1] 10.16856   ### Reasonable initial value for mu; exact MLE if all sisq.tru were equal

> EMiter = function(th0, xv) {
      ## One EM step for (mu, sigsq).  E-step: E[Yi|Xi] = gamvec0*(xv - mu0),
      ## Var(Yi|Xi) = gamvec0*sisq.tru.  M-step: closed-form maximizers of the
      ## expected complete-data logLik.
      mu0 = th0[1]
      gamvec0 = th0[2]/(th0[2] + sisq.tru)
      th1 = c(mu1 = sum((xv - gamvec0*(xv - mu0))/sisq.tru)/sum(1/sisq.tru),
              sigsq1 = mean(gamvec0*sisq.tru + (gamvec0*(xv - mu0))^2))
      list(th1 = th1,
           logLik = sum(log(dnorm(xv, th1[1], sqrt(th1[2] + sisq.tru)))))
  }

### Now let's do the ML fitting two ways:

## METHOD 1 (DIRECT numerical max)

> MLopt = optim( c(5, 0.5), function(th, xv)
             -sum(log(dnorm(xv, th[1], sqrt(th[2] + sisq.tru)))),
           xv = Xvec, lower = c(-Inf, 0), method = "L-BFGS-B", hessian = T)
> MLopt
$par
[1] 10.1523339  0.9638584   #### compare with the true values 10.0 and 1.0

$value
[1] 42.48664

$counts
function gradient
      20       20

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

$hessian
           [,1]       [,2]
[1,] 15.2282320 -0.1219779
[2,] -0.1219779  4.4358076

### So SEs for the two parameters are:
> sqrt(diag(solve(MLopt$hessian)))
[1] 0.2562849 0.4748555   #### NB. MLEs for variances are known to be slightly
                          #### downwardly biased for moderate samples in this setting

## METHOD 2 (EM algorithm iterations)

### First check that the MLE found above is a fixed point of the EM map:
> unlist(EMiter(MLopt$par, Xvec))
    th1.mu1  th1.sigsq1      logLik
 10.1523343   0.9638566 -42.4866387

> th1 = c(5, 0.5)
> for (i in 1:50) {
      tmp = EMiter(th1, Xvec)
      th1 = tmp$th1
      if (i %% 5 == 0) cat("iter ", i, ": th1= ", tmp$th1,
                           ", logLik= ", tmp$logLik, "\n")
  }
iter  5 : th1=  8.053145 6.346062 , logLik= -58.54178
iter  10 : th1=  8.872946 3.365782 , logLik= -51.1927
iter  15 : th1=  9.643887 1.625103 , logLik= -44.50412
iter  20 : th1=  10.04394 1.0743 , logLik= -42.59622
iter  25 : th1=  10.13589 0.9786729 , logLik= -42.48919
iter  30 : th1=  10.15001 0.9657298 , logLik= -42.48669
iter  35 : th1=  10.15201 0.9640861 , logLik= -42.48664
iter  40 : th1=  10.15229 0.9638816 , logLik= -42.48664
iter  45 : th1=  10.15233 0.9638565 , logLik= -42.48664
iter  50 : th1=  10.15233 0.9638535 , logLik= -42.48664

### So we get to the same place via EM, but it takes about 45 iterations !!
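
### FOLLOW-UP: the stated target in each small area is mu + Y_i.  Once we have
### estimates (mu.hat, sigsq.hat), the conditional distribution derived above,
### Yi | Xi ~ N( gami*(Xi - mu), gami*sisq ), gives the plug-in empirical-Bayes
### predictor directly.  A minimal sketch (EBpred is a helper I am adding here,
### not part of the class code; the naive conditional variances ignore the
### uncertainty in estimating mu and sigsq, so they are somewhat too small):

EBpred = function(mu.hat, sigsq.hat, xv, sisq) {
    ## plug-in predictor of mu + Y_i:  mu.hat + gam_i*(x_i - mu.hat),
    ## with shrinkage factor gam_i = sigsq.hat/(sigsq.hat + sisq_i)
    gam = sigsq.hat/(sigsq.hat + sisq)
    cbind(pred = mu.hat + gam*(xv - mu.hat),
          cond.var = gam*sisq)   # naive: treats (mu, sigsq) as known
}

### e.g. with the MLEs found above:  EBpred(10.1523, 0.9639, Xvec, sisq.tru)
### Areas with large sisq_i get small gam_i, i.e. are shrunk hardest toward mu.hat.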