Stat 701, Spring '14                                             Eric Slud

HOMEWORK PROBLEM ASSIGNMENTS

Homeworks will generally be due on a Friday by 5pm, and should be submitted
as hard copy. If you will be off-campus, you may submit them electronically,
but in that case you are expected to provide a hard copy at the following
class meeting. Late homeworks will be accepted up to one class meeting after
the due date, but you will lose 20% credit for late submission unless you
have a medical excuse or have obtained an extension from me personally in
advance.

--------------------------------------

PROBLEM SET 1, due Friday, 2/7/14:

Shao text, Ch.1 #117, 122, 127(a),(c),(e), 133, 156; and Ch.2 #103, 110.

In addition: use the Liapunov or Lindeberg CLT to show that if the X_k are
independent Expon(1) random variables for all positive integers k, then both
of the sums $\sum_{k=1}^n \sqrt{k}\, X_k$ and $\sum_{m=1}^n \sum_{k=1}^m X_k$,
after centering and scaling, have nondegenerate asymptotically normal
distributions as n goes to infinity.

--------------------------------------

PROBLEM SET 2, was originally due Friday, 2/21/14 but NOW DUE in class 2/24/14:

Shao text, Ch.2 #114, 117. Ch.3 #108, with the discrete probability mass
function replaced by:

    P(X_1=0) = 1-theta,   P(X_1=1) = theta*(1-theta),   P(X_1=2) = theta^2.

Also do: Shao Ch.3 #109 and Ch.4 #94, 103, 114, 119.

EXTRA-CREDIT PROBLEM: show that for any consistent sequence of estimators
T_n for theta, there exists a sequence a_n of constants, increasing to
infinity with n sufficiently slowly, such that T_n is a_n-consistent for
theta.

--------------------------------------

PROBLEM SET 3, was originally due Friday 3/7/14, but NOW DUE in class 3/10/14.

The first three problems are posted on the web-page, at the link
http://www.math.umd.edu/~evs/s701/HW3S14.pdf .

Additional problems from Shao text: Ch.4 #134, 145, 154, and Ch.5 #95.
NOTE that #134 in Ch.4 contains a misprint.
The second conditional probability used in specifying the model should be
$P(Y_1=1 \mid X_1=1) = \exp(-b\theta)$.

--------------------------------------

PROBLEM SET 4, due Friday 3/28/14.

(1) Suppose that X_1, X_2,...,X_n are i.i.d. continuous real-valued random
variables. For each j=1,...,n, let r_j = #{i=1,...,n: |X_i| < |X_j|}.

 (a) Show for i not equal to j that
       I[X_i+X_j < 0] = I[X_i<0, |X_j| < |X_i|] + I[X_j<0, |X_j| > |X_i|].
 (b) Use (a) to show that the statistic W_n = \sum_{j=1}^n r_j * I[X_j < 0]
     is exactly the same as the U-statistic U_n given as the "one-sample
     U-statistic" in the 3rd displayed equation on p.175 of Shao's book.

(2) Do problems #49, 52 and 54 on p.223 in Chapter 3 of Shao.

(3) Do problem #7 on p.347 (Sec.5.6) of Bickel and Doksum, where the minimum
contrast estimator is restricted to be taken over the compact subset K of
the real line.

(4) Consider the problem of estimating the mean \lambda based on
Poisson(\lambda) observations Z_1,...,Z_n.

 (a) Show directly that the Bayes estimator T_n with prior Gamma(2,2) under
     squared-error loss satisfies the conclusion of Theorem 4.20 in Shao's
     book: that is, the (centered and scaled) posterior distribution is
     close, in the sense of distributional convergence, to a normal
     distribution with mean 0 and asymptotic variance the same as that of
     the MLE.
 (b) What is the order of magnitude (as a power of n) of the bias of T_n ?

---------------------------------------------------------

PROBLEM SET 5, due Friday 4/11/14.

(1) Show that if X_1,...,X_n is a sample from a continuous, everywhere
strictly increasing distribution function, then

    rho(X,theta) = (X-theta)*(I[X >= theta] - .75)

is a legitimate contrast function for the lower quartile.
What additional conditions must be assumed in order that the resulting
minimum contrast estimator satisfy the hypotheses of the theorem covered in
class (from the Bickel and Doksum text, along with a slight extension
mentioned in the two pages handed out in class) guaranteeing asymptotic
normality of minimum contrast estimators ?

(2) [counts as 3 problems] Suppose that the random variables X_i are
Bernoulli(p) and Y_i given X_i are binary with conditional distribution

    P(Y_i=1 | X_i) = 1/(1+exp(-a-b X_i)) = plogis(a+b X_i),

where plogis(x) = e^x/(1+e^x). This is a logistic regression model.

 (a) Find the ML equations, and use the asymptotic theory we derived in
     Ch.4 (Theorem 4.18) to find expressions for the asymptotic variance.
 (b) Also find a general expression for the asymptotic variance matrix of a
     CAN estimator of (a,b) based on the estimating equation

        sum_{i=1}^n g(X_i) ( Y_i - plogis(a+b X_i) ) = 0 .

     Make reasonable assumptions on the 2-vector function g(x) so that the
     moments in this estimating equation exist.
 (c) Using your answer in (b), can you say what would be the best choice
     for the function g(x) ?

(3) Do problems 90 and 91 in Shao's Ch.5, pp.390-391.

---------------------------------------------------------

PROBLEM SET 6, due Friday 4/25/14, 5pm.

(1) [Counts as 2 Problems] Suppose that we observe iid data pairs (X_i,Y_i),
where we believe that X_i ~ N(mu, c^2) and that the conditional distribution
of Y_i given X_i is Y_i ~ Poisson(exp(a+b*X_i)).

 (a) Derive the ML estimating equation for (a,b), a formula for the
     asymptotic variance of the MLE's of (a,b), and a consistent estimator
     for the asymptotic variance, assuming the correctness of this model.
     Which of these --- your MLE equation, the theoretical formula for the
     asymptotic variance, and the consistent estimator of the asymptotic
     variance --- depend on the parameters (mu,c) of the normal
     distribution of X_i ?
 (b) Suppose that the assumed Poisson regression model is questionable, but
     that you still would like to know the variance of the MLE derived in
     part (a) based on the iid pairs (X_i,Y_i), i=1,...,n. Give a
     consistent "robustified" estimator of the asymptotic variance.
     Calculate numerically the MLE's and the asymptotic variance estimators
     in parts (a) and (b) if, for a particular dataset of size n=200, you
     are given the MLE's ahat = -0.9648 and bhat = 0.3982 and the data
     summarized in the following form:

        Group   #elts   sum of X_i's   sum of X_i^2
        Y=0      132      -11.085        123.573
        Y=1       55       12.168         52.689
        Y=2        9        7.658         15.192
        Y=3        4        4.749          8.202

 (c) What do you think is the reasonable thing to report in this data
     problem as the confidence interval for a and for b ?

(2) Shao Ch.5, problems 61, 87, 101.

(3) Shao Ch.6, problems 99, 106.

---------------------------------------------------------

PROBLEM SET 7, due Friday 5/9/14, 5pm.

(1) (a) For the general problem of testing the model p_j = g_j(theta)
within the Multinomial (n_1,...,n_K; p_1,...,p_K) data setting, Section
6.4.3 of Shao's book gives the exact and asymptotic forms of the likelihood
ratio test. Give the form of the Wald and Rao score tests in this setting,
explaining what functions would have to be maximized or minimized to obtain
the necessary estimators, and exactly how you would express the Wald and
Rao score test statistics of the null hypothesis of model adequacy in terms
of them. Simplify your expressions as much as possible.

    (b) Apply your Wald and Rao-score formulas to the setting of
"independence testing" in Example 6.24, pp.439-440.

(2) Shao Ch.6, problem 96.

(3) Shao Chapter 7, problems 72, 74 (with the specific kernel
h(x_1,x_2) = 0.5*(I[x_1 < 2x_2] + I[x_2 < 2x_1]) ), and also 82, 88, and 93.
In problem 88, you are asked to prove that two expressions (one of which you must find) are the "same", but you should understand this to mean that the function h cannot depend on n (although Y_i may depend on $\theta$), and the two expressions are allowed to differ by a quantity converging to 0 in probability as n gets large.
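A remark on checking your work: the identity in Problem Set 4, problem (1)
can be verified numerically before you prove it. The following sketch (in
Python with numpy; not part of any assignment) simulates a continuous sample
and confirms that W_n = sum_j r_j * I[X_j < 0] agrees with the kernel sum
sum_{i<j} I[X_i + X_j < 0]; note that Shao's displayed U_n may carry an
additional binomial-coefficient normalization not included here.

```python
import numpy as np

# Numerical check of PS4 problem (1): with r_j = #{i : |X_i| < |X_j|},
# the statistic W_n = sum_j r_j * I[X_j < 0] equals the kernel sum
# sum_{i<j} I[X_i + X_j < 0] of the one-sample U-statistic
# (up to the binomial normalization in Shao's definition of U_n).

rng = np.random.default_rng(1)
n = 60
x = rng.normal(loc=-0.2, scale=1.0, size=n)  # continuous, so no ties a.s.

# r_j counts strictly smaller absolute values, excluding j itself
absx = np.abs(x)
r = np.array([np.sum(absx < absx[j]) for j in range(n)])
W = int(np.sum(r[x < 0]))  # sum of r_j over indices j with X_j < 0

# unnormalized U-statistic kernel sum over unordered pairs i < j
U_kernel_sum = sum(
    int(x[i] + x[j] < 0) for i in range(n) for j in range(i + 1, n)
)

assert W == U_kernel_sum
```

The equality holds sample-by-sample (not just in expectation): for each pair
i < j, the sign of X_i + X_j is the sign of whichever observation is larger
in absolute value, which is exactly the content of part (a).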