Class Log on Misspecified Model MLE
===================================
10/16/17

Example: X_i ~ dlogis(x, 1, sqrt(3)/pi); estimate the parameters by Xbar and
S^2*(n-1)/n, i.e. by the Normal-distribution MLEs.

> Xdat = rlogis(500, 1, sqrt(3)/pi)     ### true mu = 1, sigsq = 1
> parm = c(mean(Xdat), var(Xdat)*(499/500))
> parm
[1] 1.0792431 0.9751762

## The estimators are consistent, but the `empirical' Fisher information
## matrix is obtained numerically from the Hessian of the optimization:

> misfit = optim(parm, function(thet,X) sum((X-thet[1])^2)/(2*thet[2]) +
        (length(X)/2)*log(thet[2]), X=Xdat,
        control=list(reltol=1e-10), hessian=T)
> misfit
$par
[1] 1.0792427 0.9751692

$value
[1] 243.7157

$counts
function gradient 
      59       NA 

$convergence
[1] 0

$message
NULL

$hessian
             [,1]         [,2]
[1,] 5.127315e+02 2.312959e-04
[2,] 2.312959e-04 2.628991e+02

### CONCENTRATE ON THIS `EMPIRICAL' INFORMATION MATRIX

> Amat = misfit$hess/500
> Amat
             [,1]         [,2]
[1,] 1.025463e+00 4.625917e-07
[2,] 4.625917e-07 5.257982e-01

> gNvec = cbind((Xdat-parm[1])/parm[2],
                (-1/parm[2] + (Xdat-parm[1])^2/parm[2]^2)/2)
> Bmat = 0.002*(t(gNvec) %*% gNvec)     ## 0.002 = 1/500
> Bmat
            [,1]        [,2]
[1,]  1.00000000 -0.08165924
[2,] -0.08165924  0.82493053

### NB: the exact expectations are given by

> integrate(function(x) (x-1)^2 * dlogis(x,1,sqrt(3)/pi), -Inf, Inf)
1 with absolute error < 6.9e-07
> integrate(function(x) (x-1)*((x-1)^2-1) * dlogis(x,1,sqrt(3)/pi), -Inf, Inf)
-3.082756e-12 with absolute error < 2e-07
> integrate(function(x) ((x-1)^2-1)^2 * dlogis(x,1,sqrt(3)/pi)/4, -Inf, Inf)
0.8 with absolute error < 1.5e-05

### NOW THINK OF HOW WE WOULD REPORT LARGE-SAMPLE VARIANCES OF ML ESTIMATORS.
### IF we thought the data came from N(mu, sigsq), we would report solve(Amat):

> solve(Amat)
              [,1]          [,2]
[1,]  9.751692e-01 -8.579437e-07
[2,] -8.579437e-07  1.901870e+00

### But the actual asymptotic variance (times 500) is Var(X_1) = 1 for Xbar,
### and for S^2 it is approximately

> integrate( function(x) ((x-1)^4-1)*dlogis(x,1,sqrt(3)/pi), -Inf, Inf)
3.2 with absolute error < 5.4e-05

## CONTRAST:

> integrate( function(x) ((x-1)^4-1)*dnorm(x,1,1),
        -Inf, Inf)
2 with absolute error < 5.6e-05

#### In this setting, the "robustified" variance (times 500) is

> solve(Amat) %*% Bmat %*% solve(Amat)
           [,1]       [,2]
[1,]  0.9509551 -0.1514511
[2,] -0.1514511  2.9838656

### MAIN POINT: THE LOWER-RIGHT ELEMENT IS MUCH CLOSER TO 3.2 !!!

### FURTHER STEPS WE COULD DO TO AMPLIFY THE EXERCISE:

(1) Simulate many batches of n logistic variates, estimate the parameters as
    though the data were normal, and verify that the actual asymptotic
    (large-n) variances are well predicted by the robustified "sandwich"
    formula.

(2) Re-do these steps in other examples of a true density f versus a working
    density g, to see the variety of examples where ML estimators from the
    wrong model can still be very bad, but with variances WELL estimated by
    the robustified formula.

HW11, due Monday 10/23, will do steps (1) and (2) in another example.
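### The sandwich calculation above can be packaged in one function. This is my
### own wrapper, not part of the class code: it uses the closed-form expected
### information for the N(mu, sigsq) working model in place of the numerical
### Hessian from optim (the two agree here up to numerical error).

```r
sandwich.norm = function(X) {
  n = length(X)
  mu = mean(X); sigsq = var(X)*(n-1)/n          # normal-model MLEs
  ## expected per-observation information under N(mu, sigsq):
  Ahat = diag(c(1/sigsq, 1/(2*sigsq^2)))
  ## per-observation score vectors, as in gNvec above:
  g = cbind((X-mu)/sigsq, (-1/sigsq + (X-mu)^2/sigsq^2)/2)
  Bhat = crossprod(g)/n
  list(naive  = solve(Ahat)/n,                      # inverse-information variance
       robust = solve(Ahat) %*% Bhat %*% solve(Ahat)/n)   # sandwich variance
}
```

Applied to Xdat, the [2,2] entry of $robust times 500 should reproduce the
2.98 seen above, versus 1.90 for $naive.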
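### A sketch of step (1) for this same logistic example (the batch count
### B = 1000 and the seed are my choices, not fixed by the exercise):

```r
set.seed(2017)
B = 1000; n = 500
## normal-model MLEs (Xbar, S^2*(n-1)/n) over B batches of logistic data:
est = t(replicate(B, {
  X = rlogis(n, 1, sqrt(3)/pi)
  c(mean(X), var(X)*(n-1)/n)
}))
n * apply(est, 2, var)    # empirical large-sample variances, times n
```

The first entry should come out near 1 (naive and sandwich formulas agree for
Xbar), while the second should be near the sandwich value 3.2, NOT the naive
inverse-information value 1.9.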