Stat 730 Home-Page

Course Evaluation Note

        Statistics 730   Time Series Analysis

 Spring 2017                                  MW 5-6:15, Mth 1313

Instructor: Eric Slud, Statistics Program, Math. Dept.    Office:  Mth 2314, x5-5469,     email evs@math.umd.edu

For a set of Sample Problems for the In-Class Test, click here.

Course Text: R. Shumway & D. Stoffer, Time Series Analysis and its Applications, 2nd ed. 2006, Springer.
(This text is free as an e-book to UMCP students through the library: see website with datasets and errata.)

Recommended text: H. Lütkepohl, New Introduction to Multiple Time Series Analysis, 2005, Springer. (Also free as e-book.)

Overview: This course covers the concepts and tools of statistical time series analysis, both from a mathematical 
and a data-analytic viewpoint. Course segments on mathematical tools will be interleaved with segments 
emphasizing model-building, statistical analysis (in R), and simulation. The course introduces methods both 
in the time and frequency domains. The mathematical theorems and proofs are an essential part of the course. 
Students will be required to make further mathematical arguments and extensions in graded homework problems, 
and understanding of the conditions under which the techniques are valid will be tested.

Prerequisite: Stat 700 plus a graduate course in mathematical analysis, plus some computing familiarity.

Course requirements and Grading: there will be 7 or 8 graded homework sets (one every 1½ to 2 weeks) 
which together will count 40% of the course grade. There will also be an in-class test and a final course 
project (or take-home test), each of which will count as 30% of the course grade.

NOTE ON USE OF THEORETICAL MATERIAL.  Both in homeworks and the in-class test, there will be 
theoretical material involving probability theory as needed to apply the law of large numbers and central limit 
theorem, along with the 'delta method' (Taylor linearization), linear algebra, and other manipulations at 
advanced-calculus level, in some cases verging on measure-theoretic probability techniques. (Look at Appendix A of the 
Shumway-Stoffer book to see what I mean.) There will also be some use of Hilbert space methods. The 
theoretical material in the Shumway and Stoffer book is concentrated in the Appendices, but that material will be 
supplemented in class.

Course Coverage: Chapters 1-5 and Appendices A, B, C of the Shumway and Stoffer book, plus material 
from Chapters 6-7 as time permits.

NOTE ON COMPUTING.  Both in the homework-sets and the course project, you will be required to do 
computations on real datasets using a statistical-computing platform such as R or SAS or MATLAB. The book 
and various class demonstrations and scripts on this web-page will be given in R, and that is the only software 
platform that I will use or provide help with. If you are learning one of these packages for the first time, I strongly 
recommend R, and I will provide links to free online materials introducing it. In addition, there is a concise 
introduction to R commands in time series analysis that you should consult.

COMPUTER ACCOUNTS.  Math, Stat, and AMSC graduate students have access to R, MATLAB and SAS 
through their Mathnet and glue accounts.  R is freely available in Unix or PC form through this link.

Getting Started with R

Note: In this course, the book and I will make many references to the R language and statistical programming platform, which is free software. If you are new to R, you should get started as soon as possible. You can use it on your university Glue account in a Linux setting, on a workstation or PC at the University, or on your home computer after downloading the software following the instructions at the R website. For the systematic Introduction to R and the R reference manual distributed with the R software, either download them from the R website or simply invoke the command

> help.start()

from within R. For a quick start, see my own Rbasics handout (originally intended for a Survival Analysis class), and then read more about R objects and syntax in the Venables and Ripley text, in my Stat 705 Lecture Notes, and in the R introduction manual distributed with the R software. A really useful short summary of a lot of R commands can be found here. See also the previously mentioned concise introduction to R commands in time series analysis.
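
As a first taste of the kind of session meant here, the following is a minimal sketch (it is not one of the course R logs). It uses only functions from base R's stats package; the seed, the AR coefficient 0.6, and the monthly start date are arbitrary choices made for illustration.

```r
# Minimal first session: simulate a series, wrap it as a ts object, inspect it.
set.seed(730)                          # arbitrary seed, for reproducibility only
w <- rnorm(200)                        # Gaussian white noise
# AR(1) built directly by recursion: x_t = 0.6 x_{t-1} + w_t
ar1 <- stats::filter(w, filter = 0.6, method = "recursive")
# Treat the values as monthly data starting Jan 2000 (illustrative only)
x <- ts(as.numeric(ar1), start = c(2000, 1), frequency = 12)

print(window(x, end = c(2000, 6)))     # first six months
plot(x, main = "Simulated AR(1)")      # time plot
r <- acf(x, lag.max = 24, plot = FALSE)  # sample autocorrelations, lags 0..24
print(round(r$acf[1:4], 3))            # lag 0 is always 1
```

The same objects (ts, acf) recur throughout the book's examples, so it is worth reading their help pages with ?ts and ?acf.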

R Logs

For R practice logs that will periodically illustrate R commands related to time series
data and exercises, see RLogsS730 Directory.

Assignment 1. (First 1½ weeks of course, HW with 7 Problems due Mon., Feb. 6). 
          Read Chapter 1 through Sec. 1.5, plus Section A.1 (Appendix A). 
          Solve problems #1.3, 1.8, 1.9, 1.13, 1.15, 1.16(b), plus one more problem, below. 
In #1.3, you may use R commands as in Example 1.10 as the 
problem suggests, or you may code the generation of the time-series variables directly. 
Extra Problem, not in text: (i) Prove that if   X(t)  is a stochastic process with finite second moments 
for integer indices t, and for each n ≥ 1,     X_n(t)   is a strictly stationary process in t,   also with finite 
second moments, such that   E(X_n(t)-X(t))² → 0   as n → ∞,   then X(t) is strictly stationary. 
(ii) Prove the same assertion as in (i) with "strictly stationary" replaced by "weakly stationary". 

Assignment 2. (Second 1½ weeks of course, HW with 7 Problems due Wed., Feb. 15). 
           Read Chapter 2 through Example 2.7, plus Sections A.2 and B.1 (Appendices A, B). 
           Solve problems #2.2, 2.4, 2.5, 2.6, 2.8 (counts as 2 problems), plus one more problem, below. 
 Extra Problem, not in text: Suppose that w_t for all integer t is a (0,σ²) White Noise, and that 
X_t = ∑_{j=-2}^{2} w_{t-j},    Y_t = X_{t-1} + X_t + X_{t+1}. (i) Derive   γ_X(h)   and   γ_Y(h).   (ii) Express Y_t as a Moving 
Average of w_t.   (iii) Prove that w_t cannot be expressed as a finite-order moving average of X_t.

Assignment 3. (Weeks 4-5 of course, HW with 7 Problems due Wed., Mar. 1). 
           Read Chapter 3 through Section 3.4, plus Sections B.2 and B.4 (Appendix B). 
           Solve problems #3.2 (counts as 2 problems), 3.3, 3.6, 3.8,  plus two more problems, below. 
 Extra Problems, not in text: (I). Suppose that (X_t, t=1,2,3,4)   are jointly Gaussian with mean 0 and  
4×4  covariance matrix  Σ_{j,k} = r(j-k) where  r(0)=2, r(1)=r(-1) = 1, r(2)=r(-2) = 0.5, r(3) = r(-3) = 0. 
(a) Find the partial correlation of X₁ and X₃ (given X₂). 
(b) Find the partial correlation of X₁ and X₄ (given X₂,  X₃). 
(II). Show that in order for the AR(2) with autoregressive polynomial φ(z) = 1 - c₁ z - c₂ z² 
to be causal, the parameters (c₁, c₂) must lie in the region of pairs 
such that c₁+c₂<1,   c₂-c₁ < 1,   and |c₂| < 1. 
Are these conditions sufficient for causality? 
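
A numerical companion to Problem (II), offered only as a sketch and not as a substitute for the proof: causality of an AR(p) model can be checked for any particular coefficient vector by computing the roots of φ(z) with polyroot and testing whether they all lie strictly outside the unit circle. The helper name is_causal_ar and the two test points below are hypothetical choices, not taken from the text.

```r
# Hypothetical helper: TRUE if phi(z) = 1 - phi[1] z - ... - phi[p] z^p
# has all of its roots strictly outside the unit circle.
is_causal_ar <- function(phi) {
  roots <- polyroot(c(1, -phi))   # coefficients in increasing powers of z
  all(Mod(roots) > 1)
}

is_causal_ar(c(0.5, 0.3))   # roots near 1.17 and -2.84: TRUE
is_causal_ar(c(0.5, 0.6))   # one root near 0.94:        FALSE
```

Evaluating the helper over a grid of (c₁, c₂) values and plotting the result is one way to form a conjecture before writing out the argument.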

Assignment 4. (Weeks 6-7 of course, HW with 7 Problems due Fri., Mar. 17, 6pm). 
           Read Chapter 3, Sections 3.4 through 3.8. 
           Solve problems 3.11, 3.12 (proof of P3.4 only), 3.15, 3.17, 3.23, 3.27, plus one more. 
           Extra Problem, not in text: Suppose that X, Y are scalar r.v.'s and Z a p-vector variable, and denote the 
covariance matrix (assumed finite) of the (p+2)-dimensional random vector (X,Y,Z) as B. Show that 
the partial correlation of X,Y given Z is 0 if and only if the (1,2) entry of B^{-1} is 0.

Assignment 5. (Weeks 8-9 of course, HW with 7 Problems due Wed., April 5, in class). 
           Read Chapter 4, Sections 4.1-4.8. 
           Solve problems 3.40 (Hint: project onto the space spanned by   {w_j - w_0,   j=1,...,n}), 
                 plus 4.4, 4.5, 4.6, 4.10, 4.13, 4.20.

Assignment 6. (Weeks 10-11 of course, HW with 9 Problems due Mon. May 1). 
           Read Chapter 4, Sections 4.5-4.8, 4.10, 4.11. 
           Solve problems  4.8, 4.23, 4.25, 4.28 plus 5 more, immediately following.

 (I).(Counts as 2 problems.) (a) Simulate a long (n ≥ 1000) time series with the stationary 
ARMA(1,2) model   X_t - 0.3 X_{t-1} = (1-0.5B)(1-0.2B)W_t ,   with   W_t   standard normal. 
Verify that your estimates of the parameters   γ(0)   and   γ(1)   agree reasonably closely 
 with the theoretically correct values of these parameters.
        (b) Find an analytical expression for the spectral density of the   X_t   process, and plot it in 
a suitably labeled graph.
       (c) Overplot on the same graph (with a different line-type or color) a smoothed periodogram 
estimator (with no tapering) based on a Daniell kernel with L=21 points (each with weight 1/21). 
        (d) Also overplot on the same graph (again with a different line-type or color) another 
smoothed periodogram estimator of the spectral density which gives greater weight to 
periodogram ordinates near the center of the lag window consisting of 21 points, specifying 
what kernel you used and how you implemented it in the software you used. 
       (e) Make sure in your solution to part (d) that your scaling of the spectral density and 
periodogram are such that the smoothed periodograms are reasonably close to the true spectral density.

 (II). Simulate a long (n ≥ 1000) stationary time series with spectral density very close to 
f(x) = (1 - (x/π)²)   for   -π < x ≤ π. You can find a two-sided MA process   ∑_{j: -a < j ≤ b}  c_j W_{t-j} 
with large positive  a,b to accomplish this. Overplot a graph of this spectral density  f  with a 
smoothed periodogram estimate of the spectral density to show that you did this correctly (and say 
what lag-window smoother you used, and show the computer code that generated your picture).

(III).(Counts as 2 problems.) Simulate a pair of long, dependent, stationary time series X_t, Y_t 
(t=1,...,n,   n ≥ 1000) with the model X_t - 0.9 X_{t-1} = W_t   and Y_t = 0.5 X_{t-3} + 0.5 V_t , where 
W_t and V_t are independent white-noise sequences with Uniform[-1,1] distribution. 
       (a) Find the theoretical form for the cross-covariance   γ_{YX}(h), and show that the form you 
find is reproduced in a plot of the estimated cross-covariance from your simulated pair of 
time-series. 
       (b) Find the theoretical form of the cross-spectral density and 
coherence of X_t and Y_t.

Assignment 7.  Applied Data Analysis HW set, will be due Friday, May 12.
(Note that 2 problems have been deleted because they were previously assigned, and one has been substituted: 
it is just like number (II) from HW6; see HW6Notes in Rlogs for the method.)
           Read Sections 2.3, 3.7-3.9, 4.10, 5.3, 5.5, 5.6, 6.1 and 6.2. 
           Do Problems 3.31, 3.32, plus 3 more, immediately following.

 (A). Consider the SOI series, which we found to have several prominent autocorrelations at 
lags k*12, filtered by the seasonal detrending operator  1-B¹².  

 (i) Show that this series has two spectral peaks, when the periodogram is only very slightly 
smoothed. Do you think they are both real? Try to smooth the periodogram with lag windows 
weighting more heavily toward the center of the window.

 (ii) Follow the stepwise stochastic linear regression steps we previously used for the original SOI 
series on this filtered series. Do you find that the residuals from your fitted models now pass the 
Ljung-Box test for model adequacy? 

 (iii) If not, explain which lags in the residuals contributed most heavily to your Ljung-Box statistic.
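
For the mechanics of parts (ii) and (iii), here is a hedged sketch on simulated data (not the SOI series): it runs the Ljung-Box test with Box.test from base R and then splits the Q statistic into its per-lag terms n(n+2) r_k²/(n-k). The AR(1) model, the seed, the choice of 20 lags, and the variable names are assumptions made only for illustration.

```r
set.seed(2)                                  # arbitrary seed
x   <- arima.sim(list(ar = 0.7), n = 300)    # simulated AR(1), a stand-in series
fit <- arima(x, order = c(1, 0, 0))          # fit the (correct) AR(1) model
res <- residuals(fit)

K  <- 20                                     # number of lags in the test
lb <- Box.test(res, lag = K, type = "Ljung-Box", fitdf = 1)  # 1 fitted AR coef
print(lb)                                    # chi-square df is K - fitdf = 19

# Per-lag contributions to Q: n (n + 2) r_k^2 / (n - k), k = 1..K
n <- length(res)
r <- acf(res, lag.max = K, plot = FALSE)$acf[-1]   # drop lag 0
contrib <- n * (n + 2) * r^2 / (n - seq_len(K))
print(round(contrib, 2))
which.max(contrib)                           # lag with the largest contribution
```

The contributions sum to the reported statistic, so the largest entries of contrib identify the lags that drive a rejection.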

 (B). Using the smoothed bivariate periodogram     tmp = spec.pgram(SOI.Rec, kernel("daniell",4), taper=0)

as in the R Log  TSAdataAnalysis.txt covered in class, find by inverse FFT the weights for the 
optimal linear filter  approximating  Rec[t]  by  ∑_j b_j * SOI[t-j].
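
Before attempting the inverse FFT, it may help to see what spec.pgram returns for a bivariate series. The sketch below uses simulated stand-in data, since SOI.Rec itself is defined in the class R log; the names xy, k, the lag-2 relationship, and all numerical values are assumptions for illustration only, and the sketch stops short of computing the filter weights.

```r
set.seed(1)                                          # arbitrary seed
n <- 480
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))    # stand-in for SOI
# Stand-in for Rec: x delayed by 2 time steps, scaled, plus noise
y <- c(rep(0, 2), 0.8 * x[1:(n - 2)]) + rnorm(n, sd = 0.5)
xy <- ts(cbind(x = x, y = y), frequency = 12)        # monthly, as for SOI/Rec

k   <- kernel("daniell", 4)                          # 2*4 + 1 = 9 equal weights
tmp <- spec.pgram(xy, kernel = k, taper = 0, plot = FALSE)

range(tmp$freq)    # frequencies in cycles per year (because frequency = 12)
dim(tmp$spec)      # one column of smoothed spectrum per series
range(tmp$coh)     # squared coherence between the two series
head(tmp$phase)    # phase of the smoothed cross-spectrum
```

Note that with frequency = 12 the frequency axis runs up to 6 cycles per year, which matters when converting to radians for the filter computation.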

 (C). Simulate a long (n ≥ 1000) stationary time series with spectral density very close to   f(x) = 1 
for   -π/2 < x ≤ π/2   and   = 0   for   x ≤ -π/2 and x > π/2. You can find a one-sided MA process 
 ∑_{j: 0 ≤ j ≤ b}  c_j W_{t-j} with large positive b to accomplish this. Overplot a graph of this spectral density  f 
with a smoothed periodogram estimate of the spectral density to show that you did this correctly (and 
say what lag-window smoother you used, and show the computer code that generated your picture).

SYLLABUS for Stat 730

I. Definitions and Constructions of Time Series Models. (2 weeks, Ch. 1 & Appendix A)
          A. White Noise, AR, MA, Random Sinusoids
                   i. R basics and time series commands
          B. Autocovariance and autocorrelation functions.
          C. Strong and Weak Stationarity
          D. Review of Multivariate Normal, Convergence of RVs and Distributions, and Limit Theorems (leading to Thm A.2).

II. Exploratory Data Analysis for Time Series. (2 weeks, Ch. 2)
          A. Regression and ANOVA (Gaussian case)
                   i. Information Criteria and Model Building
                   ii. Differencing
          B. Autocorrelation and Spectrum Estimation (Periodogram)
          C. Kernel and Spline Smoothing

III. Autoregressive Integrated Moving Average (ARIMA) Models. (4½ weeks, Ch. 3 & Appendices A,B)
          A. Definitions, Relation to Difference Eq'ns
                   i. Autocorrelation and Partial Autocorrelation
                   ii. Prediction; Nonstationary Models
          B. Estimation, Model-building
          C. Decomposition into Signal, Noise, and Seasonal Components

IV. Spectral (Fourier) Analysis & Periodogram. (4 weeks, Ch. 4 & Appendix C)
          A. Filtered Series, Periodogram & Discrete Fourier Transform
          B. Nonparametric vs. Parametric Spectral Estimation
          C. Fourier Analysis vs. Wavelets
          D. Estimation, Prediction, & Filtering
          E. Extensions to Multiple (Vector) Time Series

V. Miscellaneous Topics. (2-3 weeks, Ch. 5 & 6)
          A. GARCH, Long-memory and ARMAX Models
          B. State Space Models & Methods
          C. Likelihoods in Time-Domain and Spectral Forms, Maximum Likelihood, Missing Data, Structural Models

Project Ideas -- for a list of Project paper Guidelines, click here.
Suggestions for ideas and papers which might be used as the basis for a final report or project will be added
here from time to time. The Final Project will be due by 5pm Fri., May 19.

(1) Time series methods are sometimes used in connection with repeatedly collected survey data. Two technical
reports that provide good exposition of how sample survey theory and time series ideas combine are
Bell & Hillmer 1987 and Bell & Hillmer 1989, and there are many later references to sample-survey data with
a history of using time-series methods, such as the Current Population Survey monthly employment numbers.

(2) Various kinds of signals or trends are identified and removed from time series in order to identify the
stationary-residual structure and forecast on the basis of it. This approach is especially prominent in econometric
time series, under the heading of "seasonal adjustment" -- the idea is to separate longer-term trends and aspects
of the business cycle from the stationary time series residuals. One of the papers that started all this off is
          Cleveland, W., Tiao, G. (1976). Decomposition of seasonal time series: A model for the Census X-11
                    program. Jour. Amer. Statist. Assoc. 71:581–587.

(3) A recent review paper surveying techniques of trend removal and analysis of the residuals is Alexandrov et al. 2012.

(4) Another possible topic is the careful choice of lag windows and spectral windows for their specific properties,
which is covered in many well-known books and papers, and also in recent papers emphasizing specific methods
for the choice of good smoothers, e.g.
P. Stoica and T. Sundin (1999) Optimally Smoothed Periodogram, Signal Processing Volume 78(3), pp. 253–264,
http://doi.org/10.1016/S0165-1684(99)00066-3.

(5) One source of nonstationarity for time series is a single-time occurrence (like a change in measuring instrument,
or a war or market-crash) that causes a dislocation of a previously stationary series in a way that decays
over further time and can be modeled. A famous and seminal paper on this idea is
          Box, G. and Tiao, G. (1975), Intervention analysis with applications to economic and environmental
                   problems, Jour. Amer. Statist. Assoc. vol.70, pp.70-79.

(6) Shumway and Stoffer briefly discuss the assessment of goodness of fit of stationary time series models with
the Box-Ljung-Pierce Q statistic. The Box-Ljung and Pierce papers or a chapter on this topic in some other time
series book could form a very good topic for an expository term project, possibly augmented with real or
simulated-data examples.

(7) Bootstrapping of time series is somewhat different from other bootstrapping applications you may have seen.
There are parametric-bootstrapping methods (which require specifying the White-Noise error distribution), or
methods based on bootstrapping residuals from fitted models (which do not require specifying error distributions),
or nonparametric methods involving bootstrapping of blocks. There are various papers you might use, especially
one of Politis-Paparoditis cited in Shumway and Stoffer.

Additional Computing Resources. There are many publicly available datasets for practice data-analyses.
Many of them are taken from journal articles and/or textbooks and documented or interpreted. A good
place to start is Statlib. Datasets needed in the course will either be posted to the course web-page,
or indicated by links which will be provided here.

To begin, here are a few time-series websites:
Time Series Data Library
Economic Indicators and Time Series (BLS)
What is a Time Series?

The Campus Course Evaluation Website https://www.courseEvalum.umd.edu is open through May 12 for you to submit your evaluation of this course. Please take this opportunity to evaluate me and the course during this period!

CourseEvalUM main page: https://www.CourseEvalUM.umd.edu (top button)

Important Dates

First Class: Wed., January 25, 2017

Spring Break: Mon., Mar. 20 -- Fri., Mar. 24, NO CLASS

Mon., April 10, 2017: In-class test

Last Class: Wed., May 10, 2017

Term Project Due: Fri., May 19, 2017 by 5pm.