Biased Sampling RIT Page, Fall 2010

Biased Sampling in Surveys and Biostatistics

Meeting 2-3:15pm, Thursdays (regular room MTH 1313), Fall 2010

Eric Slud, Statistics Program, Math Department

Interested participants should email: evs@math.umd.edu

RIT Focus: Biased Sampling generally refers to the statistical analysis of data in which the population
on which we observe data differs (in ways that we either know or model) from the target population. This
topic is closely related to the unequal probability sampling strategies in Sample Surveys, and to the still
more unequal probabilities with which selected units in the population respond (i.e., provide data). This
kind of differentially missing data is in turn closely related to notions of `censoring' in biostatistical
studies. Unequal probabilities of sampling in biostatistical contexts arise in connection with `prevalent
cohort' and other epidemiologic cross-sectional sampling strategies. When biostatistical studies have
entry criteria related to the previous occurrence of some symptoms or other biological condition (such
as being `infected' or having a disease advanced to a specified stage), we have biased sampling.
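
As a small illustration of the idea above, the following sketch simulates length-biased sampling, in which a unit is selected with probability proportional to its own value. It is only a toy example under stated assumptions: the population, the seed, and the function names (length_biased_sample, naive_mean, corrected_mean) are all hypothetical and are not taken from any of the papers listed below. It shows that the plain sample mean overstates the population mean, while weighting each observation by 1/x (which gives the harmonic mean of the sample) approximately recovers it.

```python
import random

def length_biased_sample(population, n, rng):
    """Draw n units with replacement, with probability proportional to each unit's value."""
    return rng.choices(population, weights=population, k=n)

def naive_mean(sample):
    """Unweighted sample mean; biased upward under length-biased selection."""
    return sum(sample) / len(sample)

def corrected_mean(sample):
    """Weight each draw by 1/x to undo selection proportional to x.

    The resulting estimator is the harmonic mean of the sample.
    """
    return len(sample) / sum(1.0 / x for x in sample)

# Hypothetical target population of positive "lengths" (shifted exponential,
# so that no value is close to zero).
rng = random.Random(2010)
population = [rng.expovariate(1.0) + 0.1 for _ in range(5000)]
true_mean = sum(population) / len(population)

sample = length_biased_sample(population, 20000, rng)

print("target population mean:", round(true_mean, 3))
print("naive sample mean:     ", round(naive_mean(sample), 3))
print("1/x-weighted mean:     ", round(corrected_mean(sample), 3))
```

The correction works here only because the selection mechanism is known exactly (probability proportional to x); when the mechanism must be modeled or estimated, the identifiability questions discussed in this RIT arise.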

We will read papers and background texts concerning sampling designs with unequal mechanisms
of selection, unequal probabilities of response, parametric and nonparametric identifiability and analysis
of data. The statistical machinery will involve some discussion of Estimating Equations, semiparametric
statistics, and some historical discussion of the attempts that have been made to connect survey data to
the Likelihood concept.

Prerequisites: Participants should have had a course in Mathematical Statistics (at the level of
Stat 700-701 or higher) and some introduction to survey or biostatistical (survival) data.

Topics by Keyword:

Biased selection, length-biased-sampling, size-biased sampling

Nonparametric estimation of distribution under biased sampling

Prevalent cohort, ascertainment bias

Nonresponse or missing data mechanism

Noninformative (MCAR, MAR) versus informative missing-data mechanism

Inverse Probability of Selection Weighting (estimating equation idea related
to the Horvitz-Thompson survey estimator)

Propensity scores in survey sampling.

Empirical Likelihood methods in semiparametric estimation.

Superpopulation and `pseudo-likelihood' based estimation in surveys.
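
To make the inverse-probability-weighting keyword above concrete, here is a minimal sketch of the Horvitz-Thompson estimator of a population total, together with the ratio-type (Hajek) weighted mean, under Poisson sampling with known inclusion probabilities. Everything specific in it is an assumption made for illustration: the simulated population, the inclusion-probability formula 0.05 + 0.04*y, and the function names are hypothetical and do not come from the readings.

```python
import random

def poisson_sample(values, incl_probs, rng):
    """Include unit i independently with probability incl_probs[i]."""
    return [(y, p) for y, p in zip(values, incl_probs) if rng.random() < p]

def horvitz_thompson_total(sample):
    """Estimate the population total by weighting each sampled y by 1/pi."""
    return sum(y / p for y, p in sample)

def hajek_mean(sample):
    """Weighted mean: Horvitz-Thompson total divided by the estimated population size."""
    return sum(y / p for y, p in sample) / sum(1.0 / p for _, p in sample)

def unweighted_mean(sample):
    """Plain mean of the sampled y values, ignoring the design."""
    return sum(y for y, _ in sample) / len(sample)

# Hypothetical finite population in which larger y values are more likely
# to be selected (an informative design).
rng = random.Random(2010)
values = [rng.uniform(0.0, 10.0) for _ in range(50000)]
incl_probs = [0.05 + 0.04 * y for y in values]  # ranges from 0.05 to 0.45

true_total = sum(values)
true_mean = true_total / len(values)

sample = poisson_sample(values, incl_probs, rng)

print("true total:", round(true_total, 1))
print("HT total:  ", round(horvitz_thompson_total(sample), 1))
print("true mean:      ", round(true_mean, 3))
print("unweighted mean:", round(unweighted_mean(sample), 3))
print("Hajek mean:     ", round(hajek_mean(sample), 3))
```

In this sketch the inclusion probabilities are known by design. With nonresponse, the corresponding response probabilities are unknown and must be modeled, which is where propensity scores and the missing-data mechanisms listed above enter.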

Reading List (Still under construction)

Books

Fitzmaurice, G., Davidian, M., Verbeke, G. and Molenberghs, G. eds. (2008) Longitudinal Data Analysis,
Handbooks of Modern Statistical Methods, Chapman & Hall/CRC.

Korn, E. and Graubard, B. (1999) Analysis of Health Surveys, Wiley.

Little, R. and Rubin, D. (2002, 2nd ed.) Statistical Analysis with Missing Data, Wiley.

Tsiatis, A. (2006) Semiparametric Theory and Missing Data (Springer Series in Statistics).

For a current list of very useful references related to sample survey theory,
compiled by Mikhail Sverchkov of Bureau of Labor Statistics, click here.

Miscellaneous Papers & Reports

Addona, V. and Wolfson, DB. (2006). A formal test for the stationarity of the incidence rate using data
from a prevalent cohort study with follow-up. Lifetime Data Analysis.

Asgharian, M., Wolfson, DB. and Zhang, X. (2006). Checking stationarity of the incidence rate using
prevalent cohort survival data. Statistics in Medicine.

Chen, Jinbo and Norman Breslow (2004), Semiparametric efficient estimation for the auxiliary
outcome problem with the conditional mean model, Canad. Jour. Statist. 32, 1-14. Click here for pdf.

Gilbert, Peter B. (2000) Large sample theory of maximum likelihood estimates in semiparametric
biased sampling models. Ann. Statist. 28, 151--194.

Huang, Y. and Wang, MC. (1995), Estimating the occurrence rate for prevalent survival data in competing
risks models. Journal of the American Statistical Association 90, 1406-1415.

Kang, J. and Schafer, J.L. (2007), Demystifying Double Robustness: A Comparison of
Alternative Strategies for Estimating a Population Mean from Incomplete Data, Statist. Sci. 22, 523-539.

Korn, E. and Graubard, B. (2003) Estimating variance components by using survey data,
J. R. Stat. Soc. Ser. B 65, 175--190.

Mandel, M. and Fluss, R. (2009) Nonparametric estimation of the probability of illness in the
illness-death model under cross-sectional sampling. Biometrika 96, 861-872.

Patil, G. P. and Rao, C. R. (1978). Weighted distributions and size-biased sampling with applications
to wildlife populations and human families. Biometrics 34 179-189.

Pfeffermann, D. and Sverchkov, M. work on survey data with semiparametrically modelled
informative nonresponse.

Qin, J. (1994ff) Ann. Statist. papers on empirical likelihood.

Rao, JNK and Wu, C. (2010), Bayesian pseudo-empirical-likelihood intervals for complex surveys,
J. R. Stat. Soc. Ser. B 72, 533--544.

Rotnitzky and Robins papers (some with other co-authors) on inverse-probability weighted estimating
equations for longitudinal studies (e.g., AIDS) with informative dropout patterns.

Donald Rubin papers (with P. Rosenbaum and others) on Propensity Scores.

Yehuda Vardi papers (referenced in Gilbert paper above) on nonparametric estimation of an
underlying distribution function in a biased-sampling setting.

Schedule of Talks

Thursday Sept. 16: Eric Slud introduced the topic and the papers.

Thursday Sept. 23: Jin Yan (assisted by Eric Slud) talked about the Vardi 1982
Annals of Statistics paper, on nonparametric estimation under length-biased sampling.

Thursday Sept. 30: Doug Galagate spoke about a 2004 Biometrika paper by Henmi and Eguchi
on "A paradox concerning nuisance parameters and projected estimating functions" which is
related to ratio estimation in survey sampling but is primarily about estimating equations.

Thursday Oct. 7: Benjamin Kedem spoke about Empirical Likelihood (see the A. Owen book
or home page of the same name) and how to use it in biased sampling problems.

Thursday Oct. 14: Paul Smith spoke on biased sampling problems arising in health screening programs.

Thursday Oct. 21: Neung Soo Ha presented a 1999 Canadian J. Stat. paper of C. Wu and JNK Rao
on empirical likelihoods in survey sampling.

Thursday Oct. 28: Vladislav Beresovsky will speak on sample survey weighting when
sampling is noninformative (i.e., not dependent on the measured attribute of interest).

Thursday Nov. 4: Julie Gershunskaya will speak on survey sampling methodology within
the area of `informative' sampling, using papers of J. Beaumont (2008) and Sverchkov and
Pfeffermann (2004). [For precise references, see the bibliography document on
Survey Sampling linked within the Reading List above.]

NOTE: on Thursday Nov. 4 at 3pm, the Seminar Visitor, Dr. Paul Albert of NIH, will give
a 20-minute presentation at the RIT in MTH 1313 on research problems and opportunities
for collaboration in his NIH Branch.
This presentation will immediately precede Dr. Albert's 3:30pm Statistics Seminar.

Thursday Nov. 11: Jiraphan Suntornchost will speak on "Estimating the Occurrence Rate for
Prevalent Survival Data in Competing Risks Models."

Thursday Nov. 18: Ran Ji will talk about the Mandel and Fluss (2009) paper, on the topic of
estimation in the illness-death model from prevalent cohorts.

Thursday Dec. 2: Joyce Hsiao will speak on biased sampling topics related to testing the stationarity
in time of prevalent cohorts, from papers (listed above) of Addona and Wolfson (2006) and
Asgharian, Wolfson, and Zhang (2006).

Last updated November 1, 2010.