Biased Sampling in Surveys and Biostatistics

Meeting 2-3:15pm, Thursdays (regular room MTH1313)                                  Fall 2010

Eric Slud, Statistics Program, Math Department

Interested participants should email to:

Reading list

Schedule of Talks

RIT Focus:   Biased Sampling generally refers to the statistical analysis of data in which the population
on which we observe data differs (in ways that we either know or model) from the target population. This
topic is closely related to the unequal probability sampling strategies in Sample Surveys, and to the still
more unequal probabilities with which selected units in the population respond (i.e., provide data). This
kind of differentially missing data is in turn closely related to notions of `censoring' in biostatistical
studies. Unequal probabilities of sampling in biostatistical contexts arise in connection with `prevalent
cohort' and other epidemiologic cross-sectional sampling strategies. When biostatistical studies have
entry criteria related to the previous occurrence of some symptoms or other biological condition (such
as being `infected' or having a disease advanced to a specified stage), we have biased sampling.
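
One simple instance of the known-bias setting described above is length-biased (size-biased) sampling, where a unit's chance of being observed is proportional to its own size. The following is a minimal simulation sketch, not taken from any of the readings: the Gamma(3, 1) target population, the sample sizes, and the function names are all hypothetical choices made for illustration. If the target density is f with mean mu, the observed density is x f(x) / mu, so the naive sample mean overestimates mu, while the harmonic mean of the observed values is consistent for mu because E[1/X_obs] = 1/mu.

```python
import random

def length_biased_sample(population, k, rng):
    """Draw k values with probability proportional to size (with replacement)."""
    return rng.choices(population, weights=population, k=k)

def mean(values):
    return sum(values) / len(values)

def harmonic_mean(values):
    """Inverse of the average of 1/x; undoes selection proportional to x."""
    return len(values) / sum(1.0 / v for v in values)

rng = random.Random(0)

# Hypothetical target population: Gamma(shape=3, scale=1), mean about 3.
population = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]

# Observed data: units enter the sample with probability proportional to size.
observed = length_biased_sample(population, 5000, rng)

target_mean = mean(population)
naive_estimate = mean(observed)               # biased upward (about 4 here)
corrected_estimate = harmonic_mean(observed)  # close to the target mean

print(f"target mean        : {target_mean:.3f}")
print(f"naive sample mean  : {naive_estimate:.3f}")
print(f"harmonic-mean fix  : {corrected_estimate:.3f}")
```

The harmonic-mean correction is only the simplest special case; it relies on the selection weight being exactly proportional to x and on the data being bounded away from zero often enough for 1/x to have finite variance.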

       We will read papers and background texts concerning sampling designs with unequal mechanisms
of selection, unequal probabilities of response, parametric and nonparametric identifiability and analysis
of data. The statistical machinery will involve some discussion of Estimating Equations, semiparametric
statistics, and some historical discussion of the attempts that have been made to connect survey data to
the Likelihood concept.

Prerequisites:   Participants should have had a course in Mathematical Statistics (at the level of
Stat 700-701 or higher) and some introduction to survey or biostatistical (survival) data.

Topics by Keyword:

  • Biased selection, length-biased-sampling, size-biased sampling
  • Nonparametric estimation of distribution under biased sampling
  • Prevalent cohort, ascertainment bias
  • Nonresponse or missing data mechanism
  • Noninformative (MCAR, MAR) versus informative missing-data mechanism
  • Inverse Probability of Selection Weighting (estimating equation idea related
              to the Horvitz-Thompson survey estimator)
  • Propensity scores in survey sampling
  • Empirical Likelihood methods in semiparametric estimation
  • Superpopulation and `pseudo-likelihood' based estimation in surveys
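
To make the inverse-probability-weighting keyword above concrete, here is a small hedged sketch in Python. It assumes Poisson sampling with known inclusion probabilities pi_i that increase with the response y_i (a hypothetical informative design, not one drawn from the readings), and all names are illustrative. The Horvitz-Thompson total is the sum of y_i / pi_i over sampled units; the weighted mean in ratio (Hájek) form is the solution mu of the estimating equation sum_i (y_i - mu) / pi_i = 0.

```python
import random

def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of the population total: sum of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))

def ipw_mean(y, pi):
    """Solve sum_i (y_i - mu) / pi_i = 0 for mu (the Hajek ratio estimator)."""
    weight_sum = sum(1.0 / p for p in pi)
    return horvitz_thompson_total(y, pi) / weight_sum

rng = random.Random(1)
N = 20000

# Hypothetical finite population and an inclusion probability that grows with y.
pop_y = [rng.gammavariate(3.0, 1.0) for _ in range(N)]
pop_pi = [min(0.9, 0.02 + 0.03 * yi) for yi in pop_y]

# Poisson sampling: each unit is selected independently with probability pi_i.
selected = [i for i in range(N) if rng.random() < pop_pi[i]]
y = [pop_y[i] for i in selected]
pi = [pop_pi[i] for i in selected]

true_mean = sum(pop_y) / N
unweighted_mean = sum(y) / len(y)              # ignores the design, biased upward
ht_mean = horvitz_thompson_total(y, pi) / N    # uses the known population size N
hajek_mean = ipw_mean(y, pi)                   # estimating-equation solution

print(f"true mean       : {true_mean:.3f}")
print(f"unweighted mean : {unweighted_mean:.3f}")
print(f"HT mean         : {ht_mean:.3f}")
print(f"Hajek mean      : {hajek_mean:.3f}")
```

In this sketch the inclusion probabilities are known by construction; when they must be estimated, as with response propensities, the same estimating equation is used with fitted probabilities in place of pi_i.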

    Reading List   (Still under construction)


    Fitzmaurice, G., Davidian, M., Verbeke, G. and Molenberghs, G. eds. (2008) Longitudinal Data Analysis,
           Handbooks of Modern Statistical Methods, Chapman & Hall/CRC.

    Korn, E. and Graubard, B. (1999) Analysis of Health Surveys, Wiley.

    Little, R. and Rubin, D. (2002, 2nd ed.) Statistical Analysis with Missing Data, Wiley.

    Tsiatis, A. (2006) Semiparametric Theory and Missing Data (Springer Series in Statistics).

    For a current list of very useful references related to sample survey theory,
    compiled by Mikhail Sverchkov of the Bureau of Labor Statistics, see the linked bibliography document.

    Miscellaneous Papers & Reports

    Addona, V. and Wolfson, DB. (2006). A formal test for the stationarity of the incidence rate using data
           from a prevalent cohort study with follow-up. Lifetime Data Analysis.

    Asgharian, M., Wolfson, DB. and Zhang, X. (2006). Checking stationarity of the incidence rate using
           prevalent cohort survival data. Statistics in Medicine.

    Chen, Jinbo and Norman Breslow (2004), Semiparametric efficient estimation for the auxiliary
           outcome problem with the conditional mean model, Canad. Jour. Statist. 32, 1-14.

    Gilbert, Peter B. (2000) Large sample theory of maximum likelihood estimates in semiparametric
           biased sampling models. Ann. Statist. 28, 151-194.

    Huang, Y. and Wang, MC. (1995), Estimating the occurrence rate for prevalent survival data in competing
           risks models. Journal of the American Statistical Association 90, 1406-1415.

    Kang, J. and Schafer, J.L. (2007), Demystifying Double Robustness: A Comparison of
           Alternative Strategies for Estimating a Population Mean from Incomplete Data,
           Statist. Sci. 22, 523-539.

    Korn, E. and Graubard, B. (2003) Estimating variance components by using survey data,
           J. R. Stat. Soc. Ser. B 65, 175-190.

    Mandel, M. and Fluss, R. (2009) Nonparametric estimation of the probability of illness in the
           illness-death model under cross-sectional sampling. Biometrika 96, 861-872.

    Patil, G. P. and Rao, C. R. (1978). Weighted distributions and size-biased sampling with applications
           to wildlife populations and human families. Biometrics 34, 179-189.

    Pfeffermann, D. and Sverchkov, M. work on survey data with semiparametrically modelled
           informative nonresponse.

    Qin, J. (1994ff) Ann. Statist. papers on empirical likelihood.

    Rao, JNK and Wu, C. (2010), Bayesian pseudo-empirical-likelihood intervals for complex surveys,
           J. R. Stat. Soc. Ser. B 72, 533-544.

    Rotnitzky and Robins papers (some with other co-authors) on inverse-probability weighted estimating
           equations for longitudinal studies (e.g., AIDS) with informative dropout patterns.

    Donald Rubin papers (with P. Rosenbaum and others) on Propensity Scores.

    Yehuda Vardi papers (referenced in Gilbert paper above) on nonparametric estimation of an
           underlying distribution function in a biased-sampling setting.

    Schedule of Talks

  • Thursday Sept. 16: Eric Slud introduced the topic and the papers.
  • Thursday Sept. 23: Jin Yan (assisted by Eric Slud) talked about the Vardi 1982
                 Annals of Statistics paper, on nonparametric estimation under length-biased sampling.
  • Thursday Sept. 30: Doug Galagate spoke about a 2004 Biometrika paper by Henmi and Eguchi
                 on "A paradox concerning nuisance parameters and projected estimating functions" which is
                 related to ratio estimation in survey sampling but is primarily about estimating equations.
  • Thursday Oct. 7: Benjamin Kedem spoke about Empirical Likelihood (see the home page of
                 A. Owen's book of the same name) and how to use it in biased sampling problems.
  • Thursday Oct. 14: Paul Smith spoke on biased sampling problems arising in health screening programs.
  • Thursday Oct. 21: Neung Soo Ha presented a 1999 Canadian J. Stat. paper of C. Wu and JNK Rao
                 on empirical likelihoods in survey sampling.
  • Thursday Oct. 28: Vladislav Beresovsky will speak on sample survey weighting when
                 sampling is noninformative (i.e., not dependent on the measured attribute of interest).

  • Thursday Nov. 4: Julie Gershunskaya will speak on survey sampling methodology within
                 the area of `informative' sampling, using papers of J. Beaumont (2008) and Sverchkov and
                 Pfeffermann (2004). [For precise references, see the bibliography document on
                 Survey Sampling linked within the Reading List above.]

  • NOTE: on Thursday Nov. 4 at 3pm, the Seminar Visitor, Dr. Paul Albert of NIH, will give
                 at the RIT in MTH 1313 a 20-minute presentation on research problems and opportunities
                 for collaboration in his NIH Branch.

                 This presentation will immediately precede Dr. Albert's 3:30pm Statistics Seminar.

  • Thursday Nov. 11: Jiraphan Suntornchost will speak on "Estimating the Occurrence Rate for
                 Prevalent Survival Data in Competing Risks Models."

  • Thursday Nov. 18: Ran Ji will talk about the Mandel and Fluss (2009) paper, on the topic of
                 estimation in the illness-death model from prevalent cohorts.

  • Thursday Dec. 2: Joyce Hsiao will speak on biased sampling topics related to testing the stationarity
                 in time of prevalent cohorts, from papers (listed above) of Addona and Wolfson (2006) and
                 Asgharian, Wolfson, and Zhang (2006).

  • Last updated November 1, 2010.