Statistics 700 Mathematical Statistics I


STUDENTS: Please submit your online evaluation of the course, the text, and the instructor by signing in to the CourseEvalUM page at https://www.CourseEvalUM.umd.edu.


Fall 2022, MWF 9-9:50am, MTH 0103

In-person class with assignments and additional materials on ELMS

Instructor: Professor Eric Slud, Statistics Program, Math Dept., Rm 2314, x5-5469, slud@umd.edu

Office hours: M 1-2, F 10-11 (initially), or email me to make an appointment (can be on Zoom).


Overview: This course introduces mathematical statistics at a theoretical graduate level, using tools of advanced calculus and basic analysis. The framework is to define families of probability models for observed-data structures and explain the sense in which functions of observed-data random variables can give a good idea of which of those probability models governed a particular dataset. The course objective is to treat diverse statistically interesting models for data in a conceptually unified way; to define mathematical properties that good procedures of statistical inference should have; and to prove that some common procedures have them. Aspects of the theoretical results are illustrated using demonstrations with statistical simulation.

Prerequisite: Stat 410 or equivalent. You should be comfortable (after review) with joint densities, (multivariate, Jacobian) changes of variable, moment generating functions, conditional expectation, the Central Limit Theorem and Law of Large Numbers, and mathematical analysis proofs at the level of Math 410-411.

Required Course Text:   P. Bickel & K. Doksum, Mathematical Statistics, vol.I, 2nd ed., Pearson Prentice Hall, 2007.

Recommended Texts:   (i)   George Casella and Roger Berger Statistical Inference,   2nd ed., Duxbury, 2002.
(ii)   V. Rohatgi and A.K. Saleh, An Introduction to Probability and Statistics, 2nd ed., Wiley, 2001.
(iii)   Jun Shao, Mathematical Statistics, 2nd ed., Springer, 2003.
(iv)   P. Billingsley, Probability and Measure, 2nd (1986) or later edition, Wiley.

A sheet of errata in Casella & Berger compiled by Jeffrey Hart of Texas A&M Stat Dept can be found here.
The most important of these errata are the ones on p.288, in Equation (6.2.7) and in line 14 (the last line
of Thm. 6.2.25): what is important as a sufficient condition for completeness is that the set of "natural parameter"
values   (η1,...,ηk) = (w1(θ),w2(θ),...,wk(θ))   fills out an open set in   Rk   as   θ   runs through all of   Θ.
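For intuition, here is a standard counterexample (added for illustration; it is not itself on the errata sheet): in the curved normal family N(θ, θ2) with θ > 0, the natural parameters are (η1, η2) = (1/θ, -1/(2θ2)), which trace out the curve η2 = -η12/2 as θ varies. That curve contains no open subset of R2, and the family is indeed not complete: with n i.i.d. observations,

    Eθ[ 2(∑i Xi)2 - (n+1) ∑i Xi2 ] = 2(nθ2 + n2θ2) - (n+1)·2nθ2 = 0   for all θ > 0,

even though the statistic inside the expectation is not identically zero.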

Course Coverage: STAT 700 and 701 divide roughly as follows: definitions and properties of finite-sample statistics in the Fall (STAT 700), and large-sample limit theory in the Spring (STAT 701). The division is not quite complete, because we will motivate many topics (Point Estimation, Confidence Intervals, identifiability) in terms of the Law of Large Numbers. The coverage in the Bickel & Doksum book for the Fall is roughly Chapters 1-4, along with related reading in Casella & Berger for special topics.
We begin with an overview of statistical data structure, models and the formal definition of statistics in Chapter 1 (Secs. 1.1.1-1.1.3). Succeeding lectures will review standard background material on probability and standard distributions (Appendix A, especially Sections A.10-A.14) in order to set up later material on Exponential Families (Section 1.6). A brief review of basic statistical definitions will be done from the viewpoint of Decision Theory in Section 1.3. Introduction of the Bayesian viewpoint on statistical inference is naturally done in that context, and we will cover some of the Bayesian mechanics in Section 1.2. The other important material in Chapter 1 concerns the notion of "sufficient statistics" and the contrast between "prediction" and "estimation".
Chapter 2 covers the main estimation techniques: (generalized) method of moments and maximum likelihood, together with Estimating Equations as a general framework unifying these two different-seeming methods. Computational topics (algorithms, including numerical maximization and EM) for the solution of Maximum Likelihood and Estimating Equation problems are also covered in Chapter 2. Chapter 3 discusses notions of performance quality and optimality for statistical estimation procedures, while Chapter 4 introduces basic ideas and optimality principles related to hypothesis testing.
Readings in Casella and Berger will be occasional and topic-based. Some introductory Bayesian topics will be covered there, and basics on MCMC may also be discussed as part of Chapters 1 and 2 of Bickel and Doksum augmented by pdf handouts.

Lecture Topics by Date.

Grading: There will be graded homework sets roughly every 1.5-2 weeks (6 or 7 altogether); one in-class test, tentatively on Wed., Nov. 2; and an in-class Final Exam. The course grade will be based 45% on homeworks, 20% on the in-class test, and 35% on the Final Exam.
Homework will generally not be accepted late, and must be handed in as an uploaded pdf on ELMS. (If you scan your handwritten papers or generate them in Word, LaTeX, or another program, convert them to pdf before uploading.)


HONOR CODE

The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This Code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please visit http://www.shc.umd.edu.

To further exhibit your commitment to academic integrity, remember to sign the Honor Pledge on all examinations and assignments:
"I pledge on my honor that I have not given or received any unauthorized assistance on this examination (assignment)."

The guideline for the course on Homeworks is that you may get hints from each other or from me, but that you must write up your solutions completely by yourself, without copying any parts of solutions from each other.


The Fall 2022 Course Syllabus for Stat 700 is linked to this course web-page (and also posted on the ELMS course pages).
Also: messages and updates (such as corrections to errors in stated homework problems or changes in due-dates) will generally be posted here, on this web-page, and also through emails in the course-mail account.

For further information (updated throughout the term) on the timing of individual lectures and tests, click here and see the
Important Dates below. For auxiliary reading in the several useful handouts described and linked below, click here.


Lecture-Topic Handouts

(I)    You can see sample test problems for the 1st in-class test, along with the Fall 2009 In-Class Test and a set of
sample problems for the in-class final. Also see further sample Problems and Topics for the Fall 2014 1st In-Class Test,
and sample Problems and Topics for the Fall 2014 2nd In-Class Test.
For the in-class test in Fall 2022, here are some Topics and Sample Problems for the In-Class Test.
Here are lists of Important Topics and Sample Problems for the Final Exam on Saturday, Dec. 17.

(II)    Old homework problem assignments in Casella and Berger from Fall 2014 can be found here. You can also see most of the solutions to these problems.

(III)    I have a paper on the topic of distributions related to the normal that are or are not uniquely determined by their moments.
The paper uses many of the techniques we review in Chapters 1 and 3.

(IV)   The topic of mixture distributions and densities and their relation to hierarchical specification of a distributional model and to distribution functions of mixed type is elaborated in this handout. Here is an additional handout specifically on the identifiability of 2-component normal mixtures.
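As a quick illustration of the hierarchical specification (a minimal sketch of my own, not excerpted from the handout), a 2-component normal mixture can be simulated in R by first drawing a latent component label and then drawing from the selected component; the mixing weight, means, and standard deviations below are arbitrary choices:

    ## Minimal sketch (illustrative values): hierarchical simulation of a
    ## 2-component normal mixture.
    set.seed(1)
    n <- 1e4
    p <- 0.3                                  # mixing weight of component 1
    Z <- rbinom(n, 1, p)                      # latent component label
    X <- ifelse(Z == 1, rnorm(n, -2, 1), rnorm(n, 2, 1.5))
    ## Marginally, X has the mixture density p*N(-2,1) + (1-p)*N(2, 1.5^2):
    dmix <- function(x) p*dnorm(x, -2, 1) + (1-p)*dnorm(x, 2, 1.5)
    hist(X, breaks = 60, freq = FALSE); curve(dmix, add = TRUE)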

(V)   A paper for a talk I gave at the 2013 Federal Committee on Statistical Methodology includes the idea of following those policies (in this case, for callback of nonrespondents in the American Community Survey) that are on the admissible frontier with respect to a specified set of loss functions. This idea is related to the discussion of admissibility in Section 1.3.3 of Bickel and Doksum.

(VI)   A handout on conjugate priors for a class of exponential family densities and probability mass functions.
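For a concrete instance of conjugacy (my own minimal example, not taken from the handout): a Beta(a,b) prior on a Binomial success probability updates to a Beta(a+x, b+n-x) posterior after x successes in n trials, which is easy to visualize in R:

    ## Beta-Binomial conjugacy sketch (illustrative numbers a, b, n, x):
    a <- 2; b <- 2; n <- 20; x <- 13
    curve(dbeta(p, a + x, b + n - x), xname = "p", from = 0, to = 1,
          ylab = "density")                               # posterior Beta(a+x, b+n-x)
    curve(dbeta(p, a, b), xname = "p", add = TRUE, lty = 2)  # prior Beta(a, b)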

(VII)   Handout on Cramer-Rao (Information) Inequality to supplement what we did in class and what is done in Bickel and Doksum's Section 3.4.2.

(VIII)   Handout on EM Algorithm from STAT 705.

(IX)   Lecture Notes from Stat 705 on Numerical Maximization of Likelihoods.
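In the spirit of those notes (a minimal sketch of my own, not excerpted from them), numerical likelihood maximization in R can be as simple as handing a negative log-likelihood to optim():

    ## Numerical MLE sketch: Normal(mu, sigma) fitted to simulated data;
    ## the second parameter is log(sigma) so the search is unconstrained.
    set.seed(5)
    y <- rnorm(200, mean = 1.5, sd = 2)
    negll <- function(par) -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
    fit <- optim(c(0, 0), negll)
    c(mu.hat = fit$par[1], sigma.hat = exp(fit$par[2]))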


Statistical Computing Handouts

(X) Topics on Statistical Simulation: There are two sorts of handouts on simulation methods and
interpretation. First, under this heading, there are 4 pdf writeups on random number generation,
simulation, and the interpretation of simulation experiments: (i) Pseudo-random number generation,
(ii) Transformation of Random Variables,
(iii) Statistical Simulation,
and (iv) computational speedups in statistical simulations (click here).
Topics (i) and (iv) were taken from my
web-pages for the course STAT 705 on Statistical Computing in R. Additional material on
statistical simulation for Bayesian MCMC is discussed under heading (XI) below.
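To give the flavor of topics (i) and (ii) (a minimal sketch of my own, not excerpted from the writeups): the inverse-CDF transformation turns Uniform(0,1) pseudo-random numbers into draws from any distribution whose quantile function is available, e.g. the Exponential:

    ## Inverse-CDF transformation sketch (illustrative rate parameter):
    set.seed(7)
    U <- runif(1e5)                  # Uniform(0,1) pseudo-random numbers
    X <- -log(1 - U) / 2             # F^{-1}(u) = -log(1-u)/rate for Exp(rate = 2)
    c(empirical.mean = mean(X), theoretical.mean = 1/2)   # quick simulation check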

(XI) Background on Markov Chain Monte Carlo: First see Introduction and application of MCMC
within an EM estimation problem in random-intercept logistic regression. For additional pdf files of
"Mini-Course" Lectures, including computer-generated figures, see Lec.1 on Metropolis-Hastings Algorithm,
and Lec.2 on the Gibbs Sampler, with Figures that can be found in Mini-Course Figure Folders.
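As a bare-bones companion to Lec. 1 (a sketch with target and proposal of my own choosing, not code from the mini-course), a random-walk Metropolis sampler for a univariate target takes only a few lines of R:

    ## Random-walk Metropolis-Hastings sketch (illustrative target Gamma(3,1)):
    set.seed(42)
    log.target <- function(x) dgamma(x, shape = 3, rate = 1, log = TRUE)
    n.iter <- 1e4
    x <- numeric(n.iter); x[1] <- 1
    for (t in 2:n.iter) {
      prop <- x[t-1] + rnorm(1, sd = 0.8)               # symmetric proposal
      log.alpha <- log.target(prop) - log.target(x[t-1])
      x[t] <- if (log(runif(1)) < log.alpha) prop else x[t-1]
    }
    mean(x[-(1:1000)])   # should be near the Gamma(3,1) mean of 3 after burn-in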



Homework: Assignments, including any changes and hints, will continually be posted here. The most current form of the assignment will be posted also on ELMS. You can find old homework assignments cumulatively added to this text-file and selected problem solutions in the directory HWslns/.

HW 1 due Monday Sept.12, 11:59pm (upload to ELMS)
Read Chapter 1, Sec. 1.1, and Appendices A.10-A.14 and B.7 of Bickel and Doksum. In Bickel and Doksum, do problems 1.1.1(d), 1.1.2(b)-(c), 1.1.15, and B.7.10, along with 3 additional problems:

(A) Suppose that i.i.d. real random variables X1,...,Xn are observed and can be assumed to follow one of the densities   f(x,θ)   from a family with real-valued unknown parameter θ. Suppose that there is a function   r(x)   such that   R(θ) = ∫ r(x) f(x,θ) dx   exists, is finite, and is strictly increasing in   θ.   Show that the parameter   θ   is identifiable from the data.

(B) In the setting of problem (A), explain (as constructively as possible) why there is a consistent (in probability) estimator   gn(X1,...,Xn)   of   θ.   Hint: Start from   n-1 ∑1≤j≤n r(Xj),   and assume that   R(θ)   is continuous if you have to. An alternative assumption you may use instead is   ∫ r2(x) f(x,θ) dx < ∞   for all θ.
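Purely as a numerical illustration of the hint (not a substitute for the requested proof; the Exponential density and the choice r(x) = x are arbitrary choices of mine), the sample average of the r(Xj) concentrates near R(θ) as n grows:

    ## LLN illustration: f(x,theta) = Exponential(rate = theta) density and
    ## r(x) = x, so R(theta) = 1/theta, strictly monotone (here decreasing;
    ## the strictly increasing case of problem (A) is analogous).
    set.seed(3)
    theta <- 2
    for (n in c(1e2, 1e4, 1e6))
      print(c(n = n, mean.r = mean(rexp(n, rate = theta)), R.theta = 1/theta))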

(C) In the setting of i.i.d. vector-valued data Y1,...,Yn   with vector-valued parameter   θ ∈ Θ ⊂ ℝk,   suppose that there exists a consistent (in probability) estimator   gn(Y1,...,Yn)   of   θ.   Then show that   θ   is identifiable from the density family   f(y,θ).

All 7 problems are to be handed in (uploaded) Monday Sept. 12 in ELMS.


HW 2, due Tuesday September 27, 11:59pm (7 problems total)

Read Chapter 1 Sections 1.2-1.3 of Bickel and Doksum and continue to review Appendix B.7.

In Bickel and Doksum, do problems # 1.2.2, 1.2.8, 1.2.12, 1.3.2, 1.3.3, 1.3.4(a) plus one additional problem:

(D) (a) Show that if a random K-vector v=(v1,...,vK)   is Dirichlet(α)   distributed, then   v1 ~ Beta(α1, α2+...+αK).
     (b) Suppose that in 100 multinomial trials with 3 outcome categories and unknown category probabilities  (p1, p2, p3)   you observe 37, 42, and 21 outcomes in categories 1, 2, and 3 respectively. Assume that the prior density for the unknown   (p1, p2)   is proportional to   p1 * p2,   and find the prior and posterior probability that   p3 > 0.3.
Hint: the probabilities in (b) are cdf's for the Beta distribution, also called incomplete Beta integrals (which you must divide by a complete Beta function value). You can get them either from Tables (not so easy to find these days) or by a one-line invocation of the Beta distribution function pbeta in R or a similarly named function in your favorite computing language (Matlab, Basic, Python, ...).
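For instance (with generic shape parameters that are deliberately not the ones part (b) leads to), the R call is one line:

    ## pbeta usage (generic illustration; these shape parameters are NOT
    ## the posterior parameters that part (b) produces):
    pbeta(0.3, shape1 = 2, shape2 = 5)                      # P(B <= 0.3) for B ~ Beta(2,5)
    pbeta(0.3, shape1 = 2, shape2 = 5, lower.tail = FALSE)  # P(B > 0.3)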


HW 3, due Wednesday October 12, 11:59pm (7 problems total)

Read Chapter 1 Sections 1.4, 1.5 and 1.6.1 of Bickel and Doksum.

In Bickel and Doksum, do problems # 1.4.4, 1.4.12, 1.4.24, 1.5.4, 1.5.5, 1.5.14, 1.5.16 (and in 1.5.16, prove minimality).

For #1.4.4, to say Z is of "no value" in predicting Y would mean that   P(Y ≥ t | Z)   is free of Z for all t, or equivalently that Y is independent of Z. To solve 1.4.4,
(a) Prove that   sign(U1),   U12 / (U12 + U22)  and   U12 + U22 are jointly independent random variables; and
(b) Show that the best predictor of Y = U1 with respect to mean-square or absolute error loss is 0, but also find a loss function for which the best predictor of Y is a nontrivial function of U1.


HW 4, due Saturday October 29, 11:59pm (8 problems total)

Read Chapter 1 Section 1.6 of Bickel and Doksum thoroughly. Also look at Sections 3.2-3.3 which will round out our coverage of decision theory before the in-class test on November 2.

Do the following problems from Bickel and Doksum, pp. 87-95: # 1.6.2, 1.6.10, 1.6.17, 1.6.28, and 1.6.35. Then also do and hand in the following 3 problems:

(E) For a Poisson(λ) sample, find the UMVUE (Uniformly Minimum Variance Unbiased Estimator) of eλ/2.

(F) For a Poisson(λ) sample X1, ..., Xn with prior π(λ) ~ Gamma(3,1) for the parameter λ, find the Bayes estimator of eλ/2 with respect to mean-squared error loss, and show that the mean-squared errors of both of the estimators found in (E) and (F) (in a frequentist sense, not using the prior) are of order 1/n and differ from each other by an amount of order 1/n2.

(G) Suppose that the sample X1, ..., Xn of nonnegative-integer observations has the probability mass function p(k,θ) = θk (1-θ) I[k ≥ 0] for unknown parameter 0 < θ < 1. Find the UMVUEs of 1/(1-θ) and of θ based on the data sample of size n. Hint: finding an unbiased estimator of each of these functions of θ as a function of a single observation X1 is a matter of identifying the coefficients of a power series in θ. Use the result of Bickel & Doksum problem 1.6.3 to do the conditional expectation calculation you need in this problem.


HW 5, due Friday 11/18/22 11:59pm (7 Problems)

Reading: Chapter 2 through Section 2.3, also Sections 2.4.2-2.4.3 and 3.4.2.

Do problems 2.2.11(b) (counts as 1/2 problem), 2.2.12, 2.2.21, 3.4.11 (counts as 1.5 problems), and 3.4.12, plus the following two extra problems:

(H) Let X1, ..., Xn be an i.i.d. sample from   N(μ,1),   and let ψ(μ) = μ2. (a) Show that the minimum variance for any estimator of μ2 from this sample, according to the Cramer-Rao inequality, is 4 μ2/n. (b) Show that the UMVUE of μ2 is X̄2 - 1/n and that its variance is 4 μ2/n + 2/n2.

(I) Find by direct calculation the likelihood equation solved uniquely by the MLE of α based on a Gamma(α, 2) sample W1,...,Wn, and also show by direct calculation that this is the same equation satisfied by the method-of-moments estimator of α. Why does this follow from Exponential-Family theory?


HW 6, due Monday 12/12/22 11:59pm (7.5 Problems)

Reading: Chapter 4 through Section 4.5.

In the Bickel & Doksum problems for Chapter 4, do 4.1.12 (counts as 1.5 problems), 4.2.2, 4.3.5, 4.3.7, 4.3.8, 4.3.10, 4.4.6.


HW7 is now cancelled. We will do these problems as part of STAT 701 in the Spring term.

Read Sections 4.5 and 4.9, and do Bickel and Doksum problems 4.4.14 and 4.5.3.



Important Dates

Return to my home page.

© Eric V Slud, Jan. 27, 2023.