STAT/SURV 440:  SAMPLING THEORY

FALL  2010                     MW 5-6:15pm, MTH 0101

For the Fall 2010 Final Exam, click here. It is a take-home,
with no restrictions on the material you can use while you take it,
and must be handed in by 5pm, Friday, Dec. 17 at the latest.

For the current Homework Assignment, click here.

Instructor: Eric V. Slud, Statistics Program

See below for:
Sample Problems for Fall 2007 Exam
Take-Home Makeup/Extra-Credit for In-Class Test

Getting Started in R
Course topics
Requirements, Grading & Policies
Course handouts
Homework Assignments
Other links

Course text:  Wayne Fuller (2009),  Sampling Statistics, Wiley.

Recommended:  Lohr, S. L. (1999).  Sampling: Design and Analysis.  Pacific Grove, CA: Duxbury.
       There is also a second (2009) edition.

Prerequisite:  A semester of statistics at the level of STAT 401 or 420.

Course Description:

Sampling refers to the statistical techniques used in political polls, marketing surveys, federal data gathering
and many areas of social science and public health.

This course provides an introduction to methods of sampling and analyzing data from finite populations from
both a theoretical and applied perspective. It is intended for Statistics and Mathematics students interested in
applications and for students in the Applied Statistics track of the Survey Methodology program, as well as
students in disciplines such as business, life science or social science who need sampling in their research.

The Fuller text emphasizes both mathematical theory and real data applications, especially those with a regression
flavor. The recommended Lohr text is easier reading, with many simpler applications. The course material
requires that you understand basic statistical concepts such as point estimation, confidence limits, and the
central limit theorem. More advanced theoretical topics in Fuller's book will be covered, gently, emphasizing
statements (and in some cases, alternative versions) and interpretations of theorems rather than proofs.

STAT 440 is part of the required material for the MATH/STAT/AMSC
MA and PhD Written Examinations in Applied Statistics.


Topics:  for the departmental syllabus, click here.

Coverage in Fuller's book:   Chapter 1 (with lighter coverage of theory in Sec. 1.3),
       Chapters 2-3 (all), and selections from Chapter 4 (Secs. 4.3-4.4) and Chapter 5 (Sec. 5.1).

Coverage in Lohr's book:   Chapters 1-8 plus topics from Chapters 9 and 11.

References:

Cochran, W. G. (1977).  Sampling Techniques  (3rd ed.). New York: J. Wiley.

Sarndal, C.-E., Swensson, B., and Wretman, J. (1992).  Model Assisted Survey Sampling.
       New York: Springer.

Course Requirements and Grading:

There will be an in-class midterm and a final exam on Thursday, Dec. 16 from 4-6 p.m.
There will be frequent homework assignments, 7-8 in all, including both theoretical and
applied problems. Grades will be based on the midterm (25%), homework (40%), and
the in-class final exam (35%).


Course Policies:

(i) As part of the applied homework assignments, students will be expected to do arithmetic
calculations on the computer, which will sometimes involve a small amount of programming.
Students may choose the language or platform, anything from spreadsheets to SAS to
R or Splus. However, all computational illustrations in the course and all computer help
offered in an office-hour setting will be restricted to R.


For the systematic manual "An Introduction to R" and the R reference manual distributed with the
R software, either download them from the R website or simply invoke the command

> help.start()

from within R. For slightly less extensive introductory tutorials in R, click CUNY or Illinois State.

(ii) Late homework will be accepted, but its grade will always be reduced.

(iii) All homeworks for students taking the course on campus should be handed in as
hard-copy on or before the due date.

Homework Assignments.

Homework solutions, including numerical answers, some discussion, and R scripts, can be found here.

HW1 due Mon. Sep. 13:   Fuller (pp. 76-81), Exercises Sec. 1.6: #2, 4, 9, 12, 13, 14.

HW2 due Wed. Oct. 6:   Lohr first edition, Exercises: #1, 6, 12 in Sec. 2.10 (pp. 50 ff.) and
              #5, 6 in Sec. 3.6 (pp. 88 ff.) and 4.9.5, pp. 120-1. Also: do the problem assigned in class,
              verifying that the first general unbiased estimator of the variance of the Horvitz-Thompson
              estimator, under SRS of n out of N units, agrees precisely with the Sen-Yates-Grundy estimator.
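The HW2 class problem can be sanity-checked numerically before doing the algebra. The R sketch below draws one simple random sample and evaluates both variance estimators using the SRS inclusion probabilities $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$. The simulated population, the seed, and the helper names `ht_var` and `syg_var` are hypothetical choices for illustration. Agreement in one sample (with each other and with the familiar $N^2(1-n/N)s^2/n$) illustrates the identity but does not prove it.

```r
# Hypothetical check: Horvitz-Thompson (HT) vs Sen-Yates-Grundy (SYG)
# variance estimators for the HT total under SRS of n out of N.

# HT-type estimator:
#   sum_i (1 - pi_i)/pi_i^2 y_i^2
#   + sum_{i != j} (pi_ij - pi_i pi_j)/(pi_ij pi_i pi_j) y_i y_j
ht_var <- function(y, pik, pikl) {
  pp <- outer(pik, pik)
  d  <- (pikl - pp) / (pikl * pp)
  diag(d) <- 0                                # keep only pairs i != j
  sum((1 - pik) / pik^2 * y^2) + sum(d * outer(y, y))
}

# SYG estimator: (1/2) sum_{i != j} (pi_i pi_j - pi_ij)/pi_ij (y_i/pi_i - y_j/pi_j)^2
syg_var <- function(y, pik, pikl) {
  w <- (outer(pik, pik) - pikl) / pikl
  diag(w) <- 0
  z <- y / pik
  0.5 * sum(w * outer(z, z, "-")^2)
}

set.seed(440)
N  <- 200
n  <- 15
yU <- rgamma(N, shape = 2, scale = 3)         # hypothetical population values
s  <- sample(N, n)                            # SRS without replacement
y  <- yU[s]

pik  <- rep(n / N, n)                                   # pi_i under SRS
pikl <- matrix(n * (n - 1) / (N * (N - 1)), n, n)       # pi_ij for i != j
diag(pikl) <- n / N                                     # pi_ii = pi_i (not used)

v_ht  <- ht_var(y, pik, pikl)
v_syg <- syg_var(y, pik, pikl)
v_srs <- N^2 * (1 - n / N) * var(y) / n       # textbook SRS variance estimator
c(HT = v_ht, SYG = v_syg, SRS = v_srs)
```

The same two helpers accept any fixed-size design's $\pi_i$ and $\pi_{ij}$, which is where the two estimators can differ.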

HW3 due Mon. Dec. 6:   Lohr first edition, Exercises: #4, 9, 12, 15 in Sec. 5.9 (pp. 170 ff.)
             and Fuller Chapter 2, problems #7, 10, pp. 168-170.

HW4 due Mon. Nov. 24:   five problems in all. From Lohr first edition:
Chapter 7, #9 and 16 (pp. 251-2), and Chapter 8, #2 and 8;
plus one additional problem assigned in class:

(I) Suppose that $n$ units from a frame population $U$ are sampled according to
some probability design $\pi(s)$ with single inclusion probabilities $\pi_i$, which can be assumed
uniformly bounded between 0.1 and 0.3 (the exact bounds do not matter much), and that individuals
respond independently of each other and of the sample choice, all with the same (unknown)
probability $P(r_i = 1) = a$. Suppose that the adjustment factor for weights is either
$\left(\sum_{i \in S} r_i/\pi_i\right)/N$  or  $\sum_{i \in S} r_i/\pi_i \big/ \sum_{i \in S} 1/\pi_i$.
Which of these choices has smaller variance? Justify your answer.
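Problem (I) leaves the design unspecified, so any simulation has to pick one. The R sketch below assumes randomized systematic PPS sampling (implemented by the hypothetical helper `sys_pps`), a made-up size measure giving inclusion probabilities of roughly 0.1 to 0.3, and a response probability of 0.7. It only estimates the Monte Carlo means and variances of the two adjustment factors and is no substitute for the justification the problem asks for.

```r
# Hypothetical Monte Carlo sketch for Problem (I):
#   A1 = (sum_{i in S} r_i / pi_i) / N
#   A2 = (sum_{i in S} r_i / pi_i) / (sum_{i in S} 1 / pi_i)
# Assumption: the fixed-size design is randomized systematic PPS.

sys_pps <- function(pik) {
  # pik should sum to an integer n, with every element < 1
  perm <- sample(length(pik))                 # random ordering of the frame
  cs   <- cumsum(pik[perm])
  u    <- runif(1)                            # random start in (0, 1)
  hit  <- diff(floor(c(0, cs) - u)) == 1      # unit whose interval contains u + k
  sort(perm[hit])
}

set.seed(2010)
N   <- 300
n   <- 60
x   <- runif(N, 1, 3)                         # hypothetical size measure
pik <- n * x / sum(x)                         # sums to n; roughly 0.1 to 0.3
a   <- 0.7                                    # hypothetical response probability

B <- 5000
A <- replicate(B, {
  s <- sys_pps(pik)
  r <- rbinom(length(s), 1, a)                # independent response indicators
  c(A1 = sum(r / pik[s]) / N,
    A2 = sum(r / pik[s]) / sum(1 / pik[s]))
})

rowMeans(A)         # Monte Carlo means of the two factors
apply(A, 1, var)    # Monte Carlo variances of the two factors
```

Changing the spread of `x` (and hence of the $\pi_i$) is a quick way to see how the comparison depends on the design.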

For the remaining reading in the course, look next at the Lohr sections on Variance Estimation.

A second problem that I said would be included in this HW will instead become part of the
Take-Home Exam, to be handed in Thursday December 16, 4pm.

(II) Variance of a survey estimator in a hierarchical design.
A complex survey is designed as follows, based on a hierarchically structured frame population. The population
is arranged geographically in 50 counties, with known populations $N_c$, $c = 1, \ldots, 50$, of which 8 are selected
randomly with replacement and with probability proportional to size. Within the selected counties, two
different plans are followed: for counties 1-25, a stratified simple random sample of 50 individuals from each of
two strata (Men and Women) is taken, while for counties 26-50 a simple random sample of 100 individuals is taken.
The attribute $y_i$ in this problem is personal income (measured in units of $10,000).
Given the background and summary data from the survey (in the attached linked text file),

(a) find the survey estimator of the total income in the population, combining the with-replacement PPS formulas
at the county level with the SRS and stratified-sample Horvitz-Thompson estimators within counties, and

(b) find an unbiased estimate for the variance of your estimator.
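The with-replacement structure is what keeps Problem (II) manageable. If each of the $m = 8$ draws (a repeated county included) gets its own independent within-county subsample, then the values $z_k = \hat T_{c_k}/p_{c_k}$, with $p_c = N_c/\sum_{c'} N_{c'}$, are i.i.d. with mean equal to the population total; their average is therefore unbiased for the total, and their sample variance divided by $m$ is unbiased for its variance. The R sketch below encodes this. Every number in it is an invented placeholder, not the linked survey data, and the helper name `pps_wr_total` is hypothetical.

```r
# Hypothetical sketch for Problem (II). All inputs are invented placeholders.

# Hansen-Hurwitz (PPS with replacement) estimator of a total, with the
# with-replacement variance estimator.
#   t_hat: unbiased within-county estimates of county totals, one per draw
#   p:     single-draw selection probabilities N_c / sum(N_c) for those draws
pps_wr_total <- function(t_hat, p) {
  z   <- t_hat / p
  m   <- length(z)
  est <- mean(z)
  v   <- sum((z - est)^2) / (m * (m - 1))     # equals var(z) / m
  c(total = est, var = v, se = sqrt(v))
}

N_total <- 2600000                            # hypothetical sum of N_c over all 50 counties

# One row per PPS draw; county 17 is drawn twice, with two independent subsamples.
draws <- data.frame(
  county     = c(3, 17, 17, 22, 31, 38, 44, 49),
  N_c        = c(52000, 81000, 81000, 46000, 67000, 39000, 73000, 58000),
  N_men      = c(25500, 39800, 39800, 22700, NA, NA, NA, NA),
  N_women    = c(26500, 41200, 41200, 23300, NA, NA, NA, NA),
  ybar_men   = c(4.1, 3.8, 3.6, 4.4, NA, NA, NA, NA),  # stratum sample means, units of $10,000
  ybar_women = c(3.2, 3.0, 3.1, 3.5, NA, NA, NA, NA),
  ybar       = c(NA, NA, NA, NA, 3.9, 3.4, 4.2, 3.7)   # SRS sample means
)

# Within-county HT estimates: stratified expansion for counties 1-25,
# N_c * ybar for the SRS counties 26-50.
t_hat <- with(draws, ifelse(county <= 25,
                            N_men * ybar_men + N_women * ybar_women,
                            N_c * ybar))
p <- draws$N_c / N_total

res <- pps_wr_total(t_hat, p)
res
```

Note that the within-county sampling variances never have to be estimated separately: they are already reflected in the spread of the $z_k$.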


Course Handouts including HW Solutions:

  • Handout on PPS Sampling Using R.

  • Handout on Probability Sampling using R based on Example in Chapter 2.

  • R Script and Directory of Pictures for Classroom Demo on CLT for SRS Sampling.

  • Handout on Ratio Estimation in the Mu281 Dataset of Sarndal et al., which is
    just the Mu284 dataset linked below omitting the records for the three largest cities
    (numbers 16, 114, 137).

  • Handout illustrating Ratio and Regression estimation on the dataset counties.dat
    from the Lohr text, consisting of summary variables from an SRS of n=100 out of the
    N=3141 US Counties.

  • Handout illustrating Regression versus Stratified-Sample estimation on
    a simulated dataset with binary attributes and 5 strata.

  • Handout illustrating Regression Estimation within a Stratified Design
    and comparison with other estimators.

  • Handout on Stratified-Sample Estimation relating to Example 4.3, Table 4.2
    in the Lohr book.

  • Handout on biased estimation of variance in two-stage cluster sampling.

  • Sample Problems for In-Class Test.

  • Fall 2005 In-Class Test and Solutions to Fall 2005 In-Class Test.

  • If you want to see a brief R script for doing the raking example covered in
    the book (Sec. 8.5.2.2), click here.

  • Click here to see Sample Problems for the Stat 440 In-Class Final Exam, along
    with brief Solutions (to all problems except 9b and 10). Another PDF
    handout of sample problems for the exam can be found here; its solutions are
    at the end of the same file of Sample Final problem solutions.
  • Datasets

    MU284 Dataset of Sarndal et al.:
    "The MU284 Population" from Appendix B of the book "Model Assisted Survey
    Sampling" by Sarndal, Swensson and Wretman.

    Boston housing price dataset used for some exercises/demos.
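Several handouts above compare ratio and regression estimation with the plain expansion estimator under SRS. The R sketch below computes all three for a population total, with the usual linearization standard errors (ratio: residuals $e_i = y_i - \hat B x_i$ with divisor $n-1$; regression: least-squares residuals with divisor $n-2$). The data are simulated, not the counties.dat or MU284 values; only the sizes $N = 3141$ and $n = 100$ are borrowed from the counties.dat description.

```r
# Hypothetical illustration: expansion, ratio, and regression estimators of a
# population total t_y under SRS, with auxiliary x whose total t_x is known.
set.seed(284)
N  <- 3141
n  <- 100                                     # sizes mirror the counties.dat handout
xU <- rlnorm(N, meanlog = 10, sdlog = 1)      # simulated auxiliary variable
yU <- 0.4 * xU + rnorm(N, sd = 0.1 * xU)      # simulated y, roughly proportional to x
tx <- sum(xU)                                 # known auxiliary total

s <- sample(N, n)                             # SRS without replacement
x <- xU[s]
y <- yU[s]
f <- n / N                                    # sampling fraction

# Expansion estimator
t_srs  <- N * mean(y)
se_srs <- N * sqrt((1 - f) * var(y) / n)

# Ratio estimator: B_hat = ybar / xbar, t_hat_r = B_hat * t_x
B_hat  <- mean(y) / mean(x)
t_rat  <- B_hat * tx
e_rat  <- y - B_hat * x                       # residuals; their mean is exactly 0
se_rat <- N * sqrt((1 - f) * var(e_rat) / n)

# Regression estimator: t_hat_reg = N * (ybar + b1 * (xbar_U - xbar))
fit    <- lm(y ~ x)
b1     <- unname(coef(fit)[2])
t_reg  <- N * (mean(y) + b1 * (tx / N - mean(x)))
se_reg <- N * sqrt((1 - f) * sum(resid(fit)^2) / (n - 2) / n)

rbind(total = c(SRS = t_srs, ratio = t_rat, regression = t_reg),
      se    = c(se_srs, se_rat, se_reg))
```

With $y$ nearly proportional to $x$, as simulated here, the ratio and regression standard errors should come out well below the SRS one; real data need not behave so conveniently.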


    Important Dates

    • First Class: August 30 (Mon.)
    • Labor Day: September 6 (Mon., No Class).
    • Mid-Term Exam: In class.
    • Thanksgiving: November 25
    • Review Session: in regular classroom.
    • Final Exam: hand in by Friday December 17, 5pm (at my office).


    Other Links

    R website, from which you can freely download the R software (very
    similar to Splus), including miscellaneous packages and datasets.

    StatLib dataset and software archives

    ASA Information on "What is a Survey?"

    Return to Eric Slud home page.

    Main departmental page.

    Statistics Program page.

    Campus Statistics Consortium page.

    © Eric V Slud, December 13, 2010.