Stat 701, Spring '14                                             Eric Slud

HOMEWORK PROBLEM ASSIGNMENTS

Homeworks will generally be due on a Friday by 5pm, and should be submitted
as hard copy. If you will be off-campus, you may submit them electronically,
but in that case you are expected to provide a hard copy at the following
class meeting. Late homeworks will be accepted up to one class meeting after
the due date, but you will lose 20% credit for late submission unless you
have a medical excuse or have obtained an extension from me personally in
advance.

--------------------------------------

PROBLEM SET 1, due Friday, 2/7/14:

Shao text, Ch.1 #117, 122, 127(a),(c),(e), 133, 156; and Ch.2 #103, 110.

In addition: use the Liapunov or Lindeberg CLT to show that if the X_k are
independent Expon(1) random variables for all positive integers k, then both
of the sums $\sum_{k=1}^n \sqrt{k}\, X_k$ and $\sum_{m=1}^n \sum_{k=1}^m X_k$,
after centering and scaling, have nondegenerate asymptotically normal
distributions as n goes to infinity.

--------------------------------------

PROBLEM SET 2, was originally due Friday, 2/21/14 but NOW DUE in class 2/24/14:

Shao text, Ch.2 #114, 117. Ch.3 #108, with the discrete probability mass
function replaced by:

    P(X_1=0) = 1-theta,   P(X_1=1) = theta*(1-theta),   P(X_1=2) = theta^2.

Also do: Shao Ch.3 #109 and Ch.4 #94, 103, 114, 119.

EXTRA-CREDIT PROBLEM: show that for any consistent sequence of estimators
T_n for theta, there exists a sequence a_n of constants, increasing to
infinity with n sufficiently slowly, such that T_n is a_n-consistent for
theta.

--------------------------------------

PROBLEM SET 3, was originally due Friday 3/7/14, but NOW DUE in class 3/10/14.

The first three problems are posted on the web-page, at the link
http://www.math.umd.edu/~evs/s701/HW3S14.pdf .

Additional problems from Shao text: Ch.4 #134, 145, 154, and Ch.5 #95.
NOTE that #134 in Ch.4 contains a misprint.
The second conditional probability used in specifying the model should be
$P(Y_1=1 \mid X_1=1) = \exp(-b\theta)$.

--------------------------------------

PROBLEM SET 4, due Friday 3/28/14.

(1) Suppose that X_1, X_2,...,X_n are i.i.d. continuous real-valued random
variables. For each j=1,...,n, let r_j = #{i=1,...,n: |X_i| < |X_j|}.

 (a) Show for i not equal to j that
       I[X_i+X_j < 0] = I[X_i<0, |X_j| < |X_i|] + I[X_j<0, |X_j| > |X_i|].
 (b) Use (a) to show that the statistic W_n = \sum_{j=1}^n r_j * I[X_j < 0]
     is exactly the same as the U-statistic U_n given as the "one-sample
     U-statistic" in the 3rd displayed equation on p.175 of Shao's book.

(2) Do problems #49, 52 and 54 on p.223 in Chapter 3 of Shao.

(3) Do problem #7 on p.347 (Sec.5.6) of Bickel and Doksum, where the minimum
contrast estimator is restricted to be taken over the compact subset K of
the real line.

(4) Consider the problem of estimating the mean \lambda based on
Poisson(\lambda) observations Z_1,...,Z_n.

 (a) Show directly that the Bayes estimator T_n with prior Gamma(2,2) under
     squared-error loss satisfies the conclusion of Theorem 4.20 in Shao's
     book: that is, the (centered and scaled) posterior distribution is
     close, in the sense of distributional convergence, to a normal
     distribution with mean 0 and asymptotic variance the same as that of
     the MLE.
 (b) What is the order of magnitude (as a power of n) of the bias of T_n ?

---------------------------------------------------------

PROBLEM SET 5, due Friday 4/11/14.

(1) Show that if X_1,...,X_n is a sample from a continuous, everywhere
strictly increasing distribution function, then

    rho(X,theta) = (X-theta)*(I[X >= theta] - .75)

is a legitimate contrast function for the lower quartile.
What additional conditions must be assumed in order that the resulting
minimum contrast estimator satisfy the hypotheses of the theorem covered in
class (from the Bickel and Doksum text, along with a slight extension
mentioned in the two pages handed out in class) guaranteeing asymptotic
normality of minimum contrast estimators ?

(2) [counts as 3 problems] Suppose that the random variables X_i are
Bernoulli(p) and Y_i given X_i are binary with conditional distribution

    P(Y_i=1 | X_i) = 1/(1+exp(-a-b X_i)) = plogis(a+b X_i),

where plogis(x) = e^x/(1+e^x). This is a logistic regression model.

 (a) Find the ML equations, and use the asymptotic theory we derived in
     Ch.4 (Theorem 4.18) to find expressions for the asymptotic variance.
 (b) Also find a general expression for the asymptotic variance matrix of a
     CAN estimator of (a,b) based on the estimating equation

        sum_{i=1}^n g(X_i) ( Y_i - plogis(a+b X_i) ) = 0 .

     Make reasonable assumptions on the 2-vector function g(x) so that the
     moments in this estimating equation exist.
 (c) Using your answer in (b), can you say what would be the best choice
     for the function g(x) ?

(3) Do problems 90 and 91 in Shao's Ch.5, pp.390-391.

---------------------------------------------------------

PROBLEM SET 6, due Friday 4/25/14, 5pm.

(1) [Counts as 2 Problems] Suppose that we observe iid data pairs (X_i,Y_i),
where we believe that X_i ~ N(mu, c^2) and that the conditional distribution
of Y_i given X_i is Y_i ~ Poisson(exp(a+b*X_i)).

 (a) Derive the ML estimating equation for (a,b), a formula for the
     asymptotic variance of the MLE's of (a,b), and a consistent estimator
     for the asymptotic variance, assuming the correctness of this model.
     Which of these --- your MLE equation, the theoretical formula for the
     asymptotic variance, and the consistent estimator of the asymptotic
     variance --- depend on the parameters (mu,c) of the normal
     distribution of X_i ?
 (b) Suppose that the assumed Poisson regression model is questionable, but
     that you still would like to know the variance of the MLE derived in
     part (a) based on the iid pairs (X_i,Y_i), i=1,...,n. Give a
     consistent "robustified" estimator of the asymptotic variance.
     Calculate numerically the MLE's and the asymptotic variance estimators
     in parts (a) and (b) if, for a particular dataset of size n=200, you
     are given the MLE's ahat = -0.9648 and bhat = 0.3982 and the data
     summarized in the following form:

        Group   #elts   sum of X_i's   sum of X_i^2
        Y=0      132      -11.085        123.573
        Y=1       55       12.168         52.689
        Y=2        9        7.658         15.192
        Y=3        4        4.749          8.202

 (c) What do you think is the reasonable thing to report in this data
     problem as the confidence interval for a and for b ?

(2) Shao Ch.5, problems 61, 87, 101.

(3) Shao Ch.6, problems 99, 106.

---------------------------------------------------------

PROBLEM SET 7, due Friday 5/9/14, 5pm.

(1) (a) For the general problem of testing the model p_j = g_j(theta)
within the Multinomial (n_1,...,n_K; p_1,...,p_K) data setting, Section
6.4.3 of Shao's book gives the exact and asymptotic forms of the likelihood
ratio test. Give the form of the Wald and Rao score tests in this setting,
explaining what functions would have to be maximized or minimized to obtain
the necessary estimators, and exactly how you would express the Wald and
Rao score test statistics of the null hypothesis of model adequacy in terms
of them. Simplify your expressions as much as possible.

    (b) Apply your Wald and Rao-score formulas to the setting of
"independence testing" in Example 6.24, pp.439-440.

(2) Shao Ch.6, problem 96.

(3) Shao Chapter 7, problems 72, 74 (with the specific kernel
h(x_1,x_2) = 0.5*(I[x_1 < 2x_2] + I[x_2 < 2x_1]) ), and also 82, 88, and 93.
In problem 88, you are asked to prove that two expressions (one of which you must find) are the "same", but you should understand this to mean that the function h cannot depend on n (although Y_i may depend on $\theta$), and the two expressions are allowed to differ by a quantity converging to 0 in probability as n gets large.
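A remark on checking your work: the identity in Problem Set 4, problem (1)
can be verified numerically before you prove it. The following sketch (in
Python with numpy; not part of any assignment) simulates a continuous sample
and confirms that W_n = sum_j r_j * I[X_j < 0] agrees with the kernel sum
sum_{i<j} I[X_i + X_j < 0]; note that Shao's displayed U_n may carry an
additional binomial-coefficient normalization not included here.

```python
import numpy as np

# Numerical check of PS4 problem (1): with r_j = #{i : |X_i| < |X_j|},
# the statistic W_n = sum_j r_j * I[X_j < 0] equals the kernel sum
# sum_{i<j} I[X_i + X_j < 0] of the one-sample U-statistic
# (up to the binomial normalization in Shao's definition of U_n).

rng = np.random.default_rng(1)
n = 60
x = rng.normal(loc=-0.2, scale=1.0, size=n)  # continuous, so no ties a.s.

# r_j counts strictly smaller absolute values, excluding j itself
absx = np.abs(x)
r = np.array([np.sum(absx < absx[j]) for j in range(n)])
W = int(np.sum(r[x < 0]))  # sum of r_j over indices j with X_j < 0

# unnormalized U-statistic kernel sum over unordered pairs i < j
U_kernel_sum = sum(
    int(x[i] + x[j] < 0) for i in range(n) for j in range(i + 1, n)
)

assert W == U_kernel_sum
```

The equality holds sample-by-sample (not just in expectation): for each pair
i < j, the sign of X_i + X_j is the sign of whichever observation is larger
in absolute value, which is exactly the content of part (a).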