DEPARTMENT OF MATHEMATICS

Old Statistics Seminar Schedules

Seminar Talks 2008-2009

Fall 2008 Talks

(Fall 2008, Seminar No. 1)

SPEAKER: Prof. Hosam M. Mahmoud
The George Washington University,
Washington, D.C. 20052, U.S.A.

TITLE: The Polya Process and Applications

TIME AND PLACE:  Thursday, September 18, 2008, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: We investigate the Polya process, which underlies an urn of white and blue balls growing in real time. A partial differential equation governs the evolution of the process. Some special cases are amenable to exact and asymptotic solution: they include the (forward or backward) diagonal processes, and the Ehrenfest process.

Applications of standard (discrete) urns and their analogue when embedded in real time include several classes of random trees that have applications in computer science, epidemiology and philology. We shall present some of these applications.




TIME AND PLACE:  Thursday, September 12, 2008, 3:30pm
            

NO Talk: AMSC celebration.



(Seminar No. 2)

SPEAKER: Anastasia Voulgaraki, M.Sc.
University of Maryland
College Park, MD 20742, U.S.A.

TITLE: Estimation of Death Rates in US States With Small Subpopulations

TIME AND PLACE:  Thursday, October 2, 2008, 3:30pm
            Room 1313, Math Bldg

ABSTRACT: The National Center for Health Statistics (NCHS) uses observed mortality data to publish race-gender specific life tables for individual states decennially. At ages over 85 years, the reliability of death rates based on these data is compromised to some extent by age misreporting. The eight-parameter Heligman-Pollard parametric model is then used to smooth the data and obtain estimates/extrapolation of mortality rates for advanced ages. In states with small sub-populations the observed mortality rates are often zero, particularly among young ages. The presence of zero death rates makes the fitting of the Heligman-Pollard model difficult and at times outright impossible. In addition, since death rates are reported on a log scale, zero mortality rates are problematic. To overcome observed zero death rates, appropriate probability models are used. Using these models, observed zero mortality rates are replaced by the corresponding expected values. This enables using logarithmic transformations, and the fitting of the Heligman-Pollard model to produce mortality estimates for ages 0-130 years.



(Seminar No. 3)

SPEAKER: Prof. Ali Arab
Georgetown University
Washington, D.C. 20057, U.S.A.

TITLE: Efficient Parameterization of PDE-Based Dynamics for Spatio-Temporal Processes

TIME AND PLACE:  Thursday, October 16, 2008, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: Spatio-temporal dynamical processes in the physical and environmental sciences are often described by partial differential equations (PDEs). The inherent complexity of such processes due to high-dimensionality and multiple scales of spatial and temporal variability is often intensified by characteristics such as sparsity of data, complicated boundaries and irregular geometrical spatial domains, among others. In addition, uncertainties in the appropriateness of any given PDE for a real-world process, as well as uncertainties in the parameters associated with the PDEs are typically present. These issues necessitate the incorporation of efficient parameterizations of spatio-temporal models that are capable of addressing such characteristics. A hierarchical Bayesian model characterized by the PDE-based dynamics for spatio-temporal processes based on their Galerkin finite element method (FEM) representations is developed and discussed. As an example, spatio-temporal models based on advection-diffusion processes are considered. Finally, an application of the hierarchical Bayesian modeling approach is presented which considers the analysis of tracking data obtained from DST (data storage devices) sensors to mimic the pre-spawning upstream migration process of the declining shovelnose sturgeon.



(Seminar No. 4)

SPEAKER: Prof. Sandra Cerrai
University of Maryland
College Park, MD 20742, U.S.A.

TITLE: A central limit theorem for some reaction-diffusion equations with fast oscillating perturbation

TIME AND PLACE:  Thursday, October 23, 2008, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: We study the normalized difference between the solution $u_\epsilon$ of a reaction-diffusion equation in a bounded interval $[0,L]$ perturbed by a fast oscillating term, arising as the solution of a stochastic reaction-diffusion equation with a strong mixing behavior, and the solution $\bar{u}$ of the corresponding averaged equation. We assume the smoothness of the reaction coefficient and we prove that a central limit type theorem holds. Namely, we show that the normalized difference $(u_\epsilon-\bar{u})/\sqrt{\epsilon}$ converges weakly in $C([0,T];L^2(0,L))$ to the solution of the linearized equation where an extra Gaussian term appears.



(Seminar No. 5)

SPEAKER: Prof. Edward J. Wegman
George Mason University
Fairfax, VA 22030, U.S.A.

TITLE: Mixture Models for Document Clustering

TIME AND PLACE:  Thursday, October 30, 2008, 3:30pm
            Colloquium Room 3206, Math Bldg (not the usual room)
Talk sponsored by Math Stat and the Stat Consortium. There will be a reception following the talk in the Math Lounge 3201.

ABSTRACT: Automatic clustering and classification of documents within corpora is a challenging task. This is often done by comparing word usage within the corpus, the so-called bag-of-words methodology. The lexicon for a corpus can indeed be very large. For the example of 503 documents that we consider, there are more than 7000 distinct terms and more than 91,000 bigrams. This means that a term vector characterizing a document will be approximately 7000 dimensional. In this talk, we use an adaptation of normal mixture models with 7000 dimensional data to locate centroids of clusters. The algorithm works surprisingly well and is linear in all the size metrics.



(Seminar No. 6)

SPEAKER: Dr. Michail Sverchkov
BAE Systems IT and Bureau of Labor Statistics
Washington, DC 20212-0001, U.S.A.

TITLE: On Estimation of Response Probabilities when Missing Data are Not Missing at Random

TIME AND PLACE:  Thursday, November 6, 2008, 3:30pm
            Room 1313, Math Bldg

ABSTRACT: Most methods that deal with the estimation of response probabilities assume either explicitly or implicitly that the missing data are 'missing at random' (MAR). However, in many practical situations this assumption is not valid, since the probability to respond often depends directly on the outcome value. The case where the missing data are not MAR (NMAR) can be treated by postulating a parametric model for the distribution of the outcomes before non-response and a model for the response mechanism. The two models define a parametric model for the joint distribution of the outcomes and response indicators, and therefore the parameters of these models can be estimated by maximization of the likelihood corresponding to this distribution. Modeling the distribution of the outcomes before non-response, however, can be problematic since no data are available from this distribution.

In this talk we propose an alternative approach that allows estimation of the parameters of the response model without modeling the distribution of the outcomes before non-response. The approach utilizes relationships between the population, the sample and the sample complement distributions derived in Pfeffermann and Sverchkov (1999, 2003) and Sverchkov and Pfeffermann (2004).

Key words: sample distribution, complement-sample distribution, prediction under informative sampling or non-response, estimating equations, missing information principle, non-parametric estimation



(Seminar No. 7)

SPEAKER: Prof. Malay Ghosh
University of Florida
Gainesville, FL 32611-8545, U.S.A.

TITLE: Bayesian Benchmarking in Small Area Estimation

TIME AND PLACE:  Thursday, November 13, 2008, 3:30pm
            Room 1313, Math Bldg

ABSTRACT: Abstract



(Seminar No. 8: Special Tuesday Seminar)

SPEAKER: Prof. Gauri S. Datta
University of Georgia
Athens, GA 30602, U.S.A.

TITLE: Estimation of Small Area Means under Measurement Error Models

TIME AND PLACE:  Tuesday, November 18, 2008, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: In recent years demand for reliable estimates for characteristics of small domains (small areas) has greatly increased worldwide due to growing use of such estimates in formulating policies and programs, allocating government funds, planning regional development, and making marketing decisions at the local level. However, due to cost and operational considerations, it is seldom possible to procure a large enough overall sample size to support direct estimates of adequate precision for all domains of interest. It is often necessary to employ indirect estimates for small areas that can increase the effective domain sample size by borrowing strength from related areas through linking models, using census and administrative data and other auxiliary data associated with the small areas. To this end, the nested error regression model for unit-level data and the Fay-Herriot model for the area-level data have been widely used in small area estimation. These models usually assume that the explanatory variables are measured without error. However, explanatory variables are often subject to measurement error. Both functional and structural measurement error models have been recently proposed by researchers in small area estimation to deal with this issue. In this talk, we consider both functional and structural measurement error models in discussing empirical Bayes (equivalently, empirical BLUP) estimation of small area means.



(Seminar No. 9)

SPEAKER: Dr. Gang Zheng
Office of Biostatistics Research, National Heart, Lung and Blood Institute
6701 Rockledge Drive, Bethesda, MD 20892-7913, U.S.A.

TITLE: On Robust Tests for Case-Control Genetic Association Studies

TIME AND PLACE:  Thursday, November 20, 2008, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: When testing association between a single marker and the disease using case-control samples, the data are presented in a 2x3 table. Pearson's Chi-square test (2 df) and the trend test (1 df) are commonly used. Usually one does not know which of them to choose, because the choice depends on the unknown genetic model underlying the data. So one could either choose the maximum (MAX) of a family of trend tests over all possible genetic models (Davies, 1977, 1987) or take the smaller of the p-values (MIN2) of Pearson's test and the trend test (Wellcome Trust Case-Control Consortium, 2007).

We show that Pearson's test, the trend test and MAX are all trend tests with different types of scores: data-driven or prespecified and restricted or not restricted. The results provide insight into the properties that MAX is always more powerful than Pearson's test when the genetic model is restricted and that Pearson's test is more robust when the model is not restricted. For the MIN2 of WTCCC (2007), we show that its null distribution can be derived, so the p-value of MIN2 can be obtained. Simulation is used to compare the above four tests. We apply MIN2 to the result obtained by The SEARCH Collaborative Group (NEJM, August 21, 2008) who used MIN2 to detect a SNP in a genome-wide association study, but could not report the p-value for that SNP when MIN2 was used.
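As a rough illustration of the statistics named above, the following Python sketch computes Pearson's chi-square statistic (2 df), a trend statistic (1 df), and the smaller of their two p-values for a 2x3 case-control table. The function names, the additive scores (0, 1, 2), and the example counts are assumptions made for illustration and are not taken from the talk. The minimum returned here is only a test statistic: as the abstract notes, a valid p-value for MIN2 requires its null distribution, which this sketch does not derive.

```python
import math

def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for a 2x3 table (assumes no empty column)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for r, s in zip(cases, controls):
        n = r + s  # column total for this genotype
        for observed, row_total in ((r, R), (s, S)):
            expected = row_total * n / N
            stat += (observed - expected) ** 2 / expected
    return stat

def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Trend statistic (1 df) with prespecified scores; additive scores assumed."""
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]
    sx_r = sum(x * r for x, r in zip(scores, cases))
    sx_n = sum(x * n for x, n in zip(scores, col))
    sxx_n = sum(x * x * n for x, n in zip(scores, col))
    return N * (N * sx_r - R * sx_n) ** 2 / (R * S * (N * sxx_n - sx_n ** 2))

def min2(cases, controls):
    """Return (MIN2 statistic, Pearson p-value, trend p-value)."""
    # Chi-square survival functions in closed form for 2 df and 1 df.
    p_pearson = math.exp(-pearson_chi2(cases, controls) / 2.0)
    p_trend = math.erfc(math.sqrt(trend_chi2(cases, controls) / 2.0))
    return min(p_pearson, p_trend), p_pearson, p_trend

# Hypothetical genotype counts (three genotype columns per group).
example = min2([10, 20, 30], [30, 20, 10])
```

In this made-up table the case counts rise linearly across genotypes, so the two statistics coincide and the 1 df trend test yields the smaller p-value.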

References:

1. Joo J, Kwak M, Ahn K and Zheng G. A robust genome-wide scan statistic of the Wellcome Trust Case-Control Consortium. Biometrics (to appear).
2. Zheng G, Joo J and Yang Y. Pearson's test, trend test, and MAX are all trend tests with different type of scores. Unpublished manuscript. See Slides.



(Seminar No. 10: This Seminar is on a Tuesday)

SPEAKER: Dr. Yair Goldberg
Hebrew University of Jerusalem,
Mt. Scopus, Jerusalem, Israel

TITLE: Manifold learning: The price of normalization

TIME AND PLACE:  Tuesday, November 25, 2008, 3:30pm
             Room 1313, Math Bldg (room number may change)

ABSTRACT: The problem of finding a compact representation for high-dimensional data is encountered in many areas of science and has motivated the development of various dimension-reducing algorithms. The Laplacian EigenMap dimension-reducing algorithm (Belkin & Niyogi, 2003), widely used for its intuitive approach and computational simplicity, claims to reveal the underlying non-linear structure of high-dimensional data.

We present a general class of examples in which the Laplacian EigenMap fails to generate a reasonable reconstruction of the data given to it. We both prove our results analytically and show them empirically. This phenomenon is then explained with an analysis of the limit-case behavior of the Laplacian EigenMap algorithm both using asymptotics and the continuous Laplacian operator. We also discuss the relevance of these findings to the algorithms Locally Linear Embedding (Roweis and Saul, 2000), Local Tangent Space Alignment (Zhang and Zha, 2004), Hessian Eigenmap (Donoho and Grimes, 2004), and Diffusion Maps (Coifman and Lafon, 2006).



(Seminar No. 11: DISTINGUISHED STATISTICS CONSORTIUM LECTURE
This Seminar is on a Friday)

SPEAKER: Mitchell H. Gail, M.D., Ph.D.
Senior Investigator
Biostatistics Branch, Div. Cancer Epidemiology & Genetics, National Cancer Institute,
Rockville, MD, 20852, U.S.A.

TITLE: Absolute Risk: Clinical Applications and Controversies

DATE/TIME:  Friday, December 5, 2008, 3:15--5:00pm

PLACE: Engineering Building Lecture Hall EGR 1202             

Immediately following the talk there will be a formal 25-minute Discussion, with a Reception to follow that.

ABSTRACT: Absolute risk is the probability that a disease will develop in a defined age interval in a person with specific risk factors. Sometimes absolute risk is called "crude" risk to distinguish it from the cumulative "pure" risk that might arise in the absence of competing causes of mortality. After defining absolute risk, I shall present a model for absolute breast cancer risk and illustrate its clinical applications. I will also describe the kinds of data and approaches that are used to estimate models of absolute risk and two criteria, calibration and discriminatory accuracy, that are used to evaluate absolute risk models. In particular, I will address whether well calibrated models with limited discriminatory accuracy can be useful.

Dr. Mitchell Gail received an M.D. from Harvard Medical School in 1968 and a Ph.D. in statistics from George Washington University in 1977. He joined NCI in 1969, and served as chief of the Biostatistics Branch from 1994 to 2008. Dr. Gail is a Fellow and former President of the American Statistical Association, a Fellow of the American Association for the Advancement of Science, an elected member of the American Society for Clinical Investigation, and an elected member of the Institute of Medicine of the National Academy of Sciences. He has received the Spiegelman Gold Medal for Health Statistics, the Snedecor Award for applied statistical research, the Howard Temin Award for AIDS Research, the NIH Director's Award, and the PHS Distinguished Service Medal.

Discussant: Professor Bilal Ayyub
Department of Civil & Environmental Engineering, UMCP
College Park, MD, 20742, U.S.A.


Discussion, 4:15pm: Engineering perspectives on Risk


Professor Ayyub is a Professor of Civil and Environmental Engineering at the University of Maryland College Park and Director of the Center for Technology and Systems Management. He is a Fellow of the ASCE, ASME, and SNAME.



(Seminar No. 12)

SPEAKER: Dr. Janice Lent
Energy Information Administration
Washington, DC 20585, U.S.A.

TITLE: Some Properties of Price Index Formulas

TIME AND PLACE:  Thursday, December 11, 2008, 3:30pm
            Room 1313, Math Bldg

ABSTRACT: Price indexes are important statistics that move large amounts of money in the U.S. economy. In order to adjust monetary figures for inflation/deflation, we must develop methods of using sample data to estimate changes in the value of a currency. A vast array of target price index formulas are discussed in the economics literature. In this seminar, we will present some of the formulas that are widely used by government statistical agencies as targets for price index estimation. We will examine and compare some of the properties of these formulas, including underlying economic assumptions, ease of estimation, and sensitivity to extreme values.



Spring 2009 Talks

(Seminar No. 13)

SPEAKER: Dr. Zhe Lin
Institute for Advanced Computer Studies, University of Maryland
College Park, MD 20742, U.S.A.

TITLE: Recognizing Actions by Shape-Motion Prototypes

TIME AND PLACE:  Thursday, February 12, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: In this talk, I will introduce our recent work on gesture or action recognition based on shape-motion prototypes. During training, a set of action prototypes is learned in a joint shape and motion space via k-means clustering. During testing, humans are tracked while a frame-to-prototype correspondence is established by nearest neighbor search, and then actions are recognized using dynamic prototype sequence matching. Similarity matrices used for sequence matching are efficiently obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distance. Our approach enables robust action matching in very challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences by dynamic time warping. Experimental results demonstrate that our approach achieves over 91% recognition rate on a large gesture dataset containing 294 video clips of 14 different gestures, and 100% on the Weizmann action dataset.
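The matching pipeline described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the speaker's implementation: prototypes are taken as already learned (the k-means step is omitted), frames are plain numeric vectors rather than joint shape-motion descriptors, and all function names are hypothetical. The sketch maps each frame to its nearest prototype, precomputes a prototype-to-prototype distance table, and aligns two prototype sequences by dynamic time warping using only table lookups.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_prototype(frame, prototypes):
    """Index of the prototype closest to a frame descriptor."""
    return min(range(len(prototypes)), key=lambda i: euclid(frame, prototypes[i]))

def build_lookup(prototypes):
    """Precompute all prototype-to-prototype distances once."""
    return [[euclid(p, q) for q in prototypes] for p in prototypes]

def dtw_distance(seq_a, seq_b, table):
    """Dynamic time warping cost between two sequences of prototype indices."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = table[seq_a[i - 1]][seq_b[j - 1]]  # table lookup, no recomputation
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical prototypes and a short sequence of frames.
prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
table = build_lookup(prototypes)
frames = [(0.1, 0.0), (0.0, 0.1), (0.9, 0.1), (1.1, 0.0), (2.0, 0.1)]
sequence = [nearest_prototype(f, prototypes) for f in frames]
```

Because frame distances are read from the small prototype table, the cost of filling the alignment matrix no longer depends on the dimension of the frame descriptors.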



(Seminar No. 14)

SPEAKER: Prof. Refik Soyer
George Washington University
Washington, DC 20052, U.S.A.

TITLE: Information Importance of Predictors

TIME AND PLACE:  Thursday, February 19, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: The importance of predictors is characterized by the extent to which their use reduces uncertainty about predicting the response variable, namely their information importance. Shannon entropy is used to operationalize the concept. For nonstochastic predictors, maximum entropy characterization of probability distributions provides measures of information importance. For stochastic predictors, the expected entropy difference gives measures of information importance, which are invariant under one-to-one transformations of the variables. Applications to various data types lead to familiar statistical quantities for various models, yet with the unified interpretation of uncertainty reduction. Bayesian inference procedures for the importance and relative importance of predictors are developed. Three examples show applications to normal regression, contingency table, and logit analyses.
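For a contingency table, one reading of the expected entropy difference for a stochastic predictor is H(Y) minus the average of H(Y | X = x) over x, which equals the mutual information between predictor and response. The Python sketch below computes that quantity from a joint probability table. The function names and example tables are assumptions for illustration, and the measure shown is only this one interpretation of the abstract, not necessarily the exact estimator used in the talk.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def information_importance(joint):
    """Expected entropy reduction H(Y) - E_X[H(Y | X)].

    joint[i][j] is assumed to hold P(X = i, Y = j), with entries summing to 1.
    """
    n_cols = len(joint[0])
    p_y = [sum(row[j] for row in joint) for j in range(n_cols)]
    h_y = entropy(p_y)
    h_y_given_x = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            h_y_given_x += p_x * entropy([p / p_x for p in row])
    return h_y - h_y_given_x

# Hypothetical tables: an uninformative predictor and a perfectly informative one.
independent = [[0.25, 0.25], [0.25, 0.25]]
deterministic = [[0.5, 0.0], [0.0, 0.5]]
```

Relabeling the categories of X or Y only permutes rows or columns of the table, so the value is unchanged, which matches the invariance under one-to-one transformations mentioned in the abstract.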



(Seminar No. 15)

SPEAKER: Lior Noy
Harvard Medical School
Boston, MA 02115, U.S.A.

TITLE: Studying Eye Movements in Movement Imitation

TIME AND PLACE:  Thursday, February 26, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: People, animals and robots can learn new actions from observation. In order to do so, they need to transform the visual input to motor output. What is the nature of this transformation? What are the visual features that are extracted and used by the imitator? A possible route for answering these questions is to analyze imitator eye movements during imitation. We monitored eye movements of human subjects while they were watching simple, one-arm movements in two conditions. In the watch-only condition the observers were instructed only to watch the movements. In the imitate condition the observers were instructed to watch and then to imitate each movement. Gaze trajectories were compared between the two conditions. In addition, we compared the human behavior to the predictions of the Itti-Koch saliency-map model [1]. To determine the similarity among gaze trajectories of different observers we developed a novel comparison method, based on semi-parametric statistics. We compared this method to the more standard usage of cross-correlation scores and show the advantages of this method, in particular its ability to state that two gaze trajectories are either different or similar in a statistically significant way. Our results indicate that: (1) Subjects fixate at both the joints and the end-effectors of the observed moving arms, in contrast to previous reports [2]. (2) The Itti-Koch saliency-map model does not fully account for the human gaze trajectories. (3) Eye movements in movement imitation are similar to each other in the watch-only versus the imitate conditions.

Joint work with: Benjamin Kedem & Ritaja Sur, University of Maryland, and Tamar Flash, Weizmann Institute of Science.

References

[1] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40:1489-1506, 2000.
[2] M. J. Mataric and M. Pomplun. Fixation behavior in observation and imitation of human movement. Cognitive Brain Research, 7(2):191-202, 1998.



(Seminar No. 16)

SPEAKER: Prof. Yasmin H. Said
George Mason University
Fairfax, Virginia 22030, U.S.A.

TITLE: Microsimulation of an Alcohol System

TIME AND PLACE:  Thursday, March 5, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: Users of alcohol are incorporated into a societal system, which for many purposes resembles an ecological system. An understanding of how this ecological alcohol system works provides an opportunity to evaluate effectiveness of interventions. I use a hybrid directed graph social network model calibrated with conditional probabilities derived from actual data with the idea of reproducing the experience of acute outcomes reflecting undesirable individual and societal outcomes. In the present model, I also approximate geospatial effects related to transportation as well as temporal effects. Drinking behaviors among underage users can be particularly harmful from both a societal and individual perspective. Using the model based on data from experiences in Fairfax County, Virginia, I am able to reproduce the multinomial probability distribution of acute outcomes with high accuracy using a microsimulation of all residents of Fairfax, approximately 1,000,000 agents simulated. By adjusting conditional probabilities corresponding to interventions, I am able to simulate the effects of those interventions. This methodology provides an effective tool for investigating the impact of interventions and thus provides guidance for public policy related to alcohol use.



(Seminar No. 17)

SPEAKER: Dr. Philip Rosenberg
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH
Rockville, MD 20852-4910, U.S.A.

TITLE: Proportional Hazards Models and Age-Period-Cohort Analysis of Cancer Rates

TIME AND PLACE:  Thursday, March 12, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: Age-period-cohort (APC) analysis is widely used in cancer epidemiology to model trends in cancer rates. We develop methods for comparative APC analysis of two independent cause-specific hazard rates assuming that an APC model holds for each one. We construct linear hypothesis tests to determine whether the two hazards are absolutely proportional, or proportional after stratification by cohort, period, or age. When a given proportional hazards model appears adequate, we derive simple expressions for the relative hazards using identifiable APC parameters. We also construct a linear hypothesis test to assess whether the logarithms of the fitted age-at-event curves are parallel after adjusting for possibly heterogeneous period and cohort effects, a relationship that can hold even when the expected hazard rates are not proportional. To assess the utility of these new methods, we surveyed cancer incidence rates in Blacks versus Whites for the leading cancers in the United States, using data from the National Cancer Institute's Surveillance, Epidemiology, and End Results Program. Our comparative survey identified cancers with parallel and crossing age-at-onset curves, cancers with rates that were proportional after stratification by cohort, period, or age, and cancers with rates that were absolutely proportional. Proportional hazards models provide a useful statistical framework for comparative APC analysis.



(Seminar No. 18)

SPEAKER: Dr. Hormuzd Katki
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS
Rockville, MD 20852-4910, U.S.A.

TITLE: Insights into p-values and Bayes Factors from False Positive and False Negative Bayes Factors

TIME AND PLACE:  Thursday, March 26, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: The Bayes Factor has stronger theoretical justification than p-values for quantifying statistical evidence, but when the goal is hypothesis testing, the Bayes Factor yields no insight about false positive vs. false negative results. I introduce the False Positive Bayes Factor (FPBF) and the False Negative Bayes Factor (FNBF) and show that they are approximately the two components of the Bayes Factor. In analogy to diagnostic testing, the FPBF and FNBF provide additional insight not obvious from the Bayes Factor. FPBF & FNBF require only the p-value and the power under an alternative hypothesis, forging a new link of p-values to Bayes Factors. This link can be exploited to understand differences in inferences drawn by Bayes Factors versus p-values. In a genome-wide association study of prostate cancer, FPBF & FNBF help reveal the two SNP mutations declared positive by p-values and Bayes Factors that with future data turned out to be false positives.



(Seminar No. 20)

SPEAKER: Dr. Hiro Hikawa
Department of Statistics, George Washington University
Washington, DC 20052, U.S.A.

TITLE: Robust Peters-Belson Type Estimators of Measures of Disparity and their Applications in Employment Discrimination Cases

TIME AND PLACE:  Thursday, April 16, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: In discrimination cases concerning equal pay, the Peters-Belson (PB) regression method is used to estimate the pay disparities between minority and majority employees after accounting for major covariates (e.g., seniority, education). Unlike the standard approach, which uses a dummy variable to indicate protected group status, the PB method first fits a linear regression model for the majority group. The resulting regression equation is then used to predict the salary of each minority employee by using their individual covariates in the equation. The difference between the actual and the predicted salaries of each minority employee estimates the pay differential for that minority employee, which takes into account legitimate job-related factors. The average difference estimates a measure of pay disparity. In practice, however, a linear regression model may not be sufficient to capture the actual pay-setting practices of the employer. Therefore, we use a locally weighted regression model in the PB approach as a specific functional form of the relationship between pay and relevant covariates is no longer needed. The statistical properties of the new procedure are developed and compared to those of the standard methods. The method also extends to the case with a binary (1-0) response, e.g., hiring or promotion. Both simulation studies and re-analysis of actual data show that, in general, the locally weighted PB regression method reflects the true mean function more accurately than the linear model, especially when the true function is not a linear or logit (for a 1-0 response) model. Moreover, only a small loss of efficiency is incurred when the true relation follows a linear or logit model.
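The classical Peters-Belson steps described above (fit on the majority group, predict for each minority employee, average the differences) can be sketched as follows. This is a minimal illustration assuming a single covariate and ordinary least squares; it shows the linear PB baseline, not the locally weighted version proposed in the talk. The function names and salary figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def peters_belson_disparity(maj_x, maj_y, min_x, min_y):
    """Average of (actual - predicted) pay over minority employees.

    The prediction uses the regression fitted on the majority group only.
    """
    intercept, slope = fit_line(maj_x, maj_y)
    diffs = [y - (intercept + slope * x) for x, y in zip(min_x, min_y)]
    return sum(diffs) / len(diffs), diffs

# Hypothetical data: seniority in years, salary in thousands of dollars.
majority_seniority = [1, 2, 3, 4, 5]
majority_salary = [32, 34, 36, 38, 40]
minority_seniority = [2, 4]
minority_salary = [29, 33]
disparity, per_employee = peters_belson_disparity(
    majority_seniority, majority_salary, minority_seniority, minority_salary
)
```

A negative average indicates that minority employees are paid less than majority employees with the same covariate values would be expected to earn under the fitted model.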



(Seminar No. 21)

SPEAKER: Dr. Tsong Yi
Division of Biometrics VI, OB/OTS/CDER, FDA
Silver Spring, MD 20993, U.S.A.

TITLE: Multiple Testing Issues in Thorough QTc Clinical Trials

TIME AND PLACE:  Thursday, April 23, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: Clinical trial endpoints are often measured repeatedly at multiple time points, with the objective of showing either that the test treatment is more effective than the control treatment at at least one time point or that it is more effective than the control treatment at all time points. Either objective involves multiple comparisons and the issue of type I error rate control. We illustrate the problem with the example of thorough QT clinical trials. ICH E14 (2005) defined drug-induced prolongation of the QT interval as evidenced by an upper bound of the 95% confidence interval around the mean effect on QTc of 10 ms. Furthermore, it defined a negative thorough QT/QTc study as one in which the upper bound of the 95% one-sided confidence interval for the largest time-matched mean effect of the drug on the QTc interval excludes 10 ms. This leads to the requirement of showing non-inferiority of the test treatment to placebo at multiple time points. Conventionally, this is carried out by testing multiple hypotheses at a 5% type I error rate each. The multiple comparison concern with this analysis is conservativeness when the number of tests is large. On the other hand, when the study result is negative, ICH E14 recommends validating the negative result by showing that the study population is sensitive enough to show at least a 5 ms prolongation of the QTc interval for a carefully selected positive control. The validation test is often carried out by demonstrating that the mean difference between positive control and placebo is greater than 5 ms at at least one of a few selected time points. The multiple comparison nature of the validation test leads to concerns about type I error rate inflation. Both multiple comparison issues can be represented by the bias of using the maximum of the estimates of treatment difference as the estimate of the maximum of the expected differences. We will discuss a few proposed approaches to address the problem.
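The bias mentioned at the end of the abstract can be seen in a small simulation. The Python sketch below draws independent normal estimates at several time points and compares the average of their maximum with the maximum of the true effects. The independence assumption, the common standard error, the function name, and the chosen values are all illustrative assumptions; estimates in a real thorough QT study are correlated across time points and the approaches discussed in the talk are not reproduced here.

```python
import random

def bias_of_max(true_effects, se, n_sims=2000, seed=0):
    """Monte Carlo estimate of E[max of estimates] - max of true effects.

    Each estimate is drawn independently as Normal(true effect, se).
    """
    rng = random.Random(seed)
    target = max(true_effects)
    total = 0.0
    for _ in range(n_sims):
        estimates = [rng.gauss(mu, se) for mu in true_effects]
        total += max(estimates)
    return total / n_sims - target

# Hypothetical setting: no true effect at any of 10 time points, unit standard error.
bias_ten_points = bias_of_max([0.0] * 10, se=1.0)
bias_one_point = bias_of_max([0.0], se=1.0)
```

With a single time point the maximum is just the estimate itself and the bias is near zero, while with ten time points the maximum of the estimates is clearly positive on average even though every true effect is zero.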



(Seminar No. 22)

SPEAKER: Dr. Alan Dorfman
Bureau of Labor Statistics, U.S. Department of Labor
NE Washington, DC 20212-0001, U.S.A.

TITLE: Nonparametric Regression and the Two Sample Problem

TIME AND PLACE:  Thursday, April 30, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT: The two sample problem: two distinct surveys gather information on a variable y of interest from a single frame, differing perhaps in sample design and sample size, but with common auxiliary information x. How should we combine the data from the surveys to get a single estimate? Nonparametric regression: Models are often used in survey sampling to sharpen inference on y based on more complete knowledge of an auxiliary variable x. Because of the tentativeness of models in most circumstances, samplers typically buttress their model-based inference by embedding it in a design-based framework ("model assisted" estimation). An alternate approach is to use very weak models and nonparametric regression. A simple two sample problem is described, along with several approaches to handling it. A simple, somewhat disguised version of nonparametric regression provides a nice solution. Some problematic and controversial aspects of nonparametric regression in survey sampling are discussed.



(Seminar No. 23)

SPEAKER: Prof. Andrew J. Waters
Uniformed Services University of the Health Sciences
Bethesda, MD 20814, U.S.A.

TITLE: Using ecological momentary assessment to study relapse in addiction

TIME AND PLACE:  Thursday, May 7, 2009, 3:30pm
             Room 1313, Math Bldg

ABSTRACT:

Rationale: There has been growing interest in the use of handheld computers (PDAs) to collect behavioral data in a naturalistic or Ecological Momentary Assessment (EMA) setting. In many EMA studies, participants carry around a PDA with them as they go about their daily lives. They are beeped at random times on 4 or 5 occasions per day. When beeped, they complete items assessing subjective and contextual variables. Because each participant typically completes a fairly large number of assessments, EMA studies can generate large and complex datasets. The talk will first provide an overview of how EMA methods have been used to study addiction. I will also discuss a number of studies in which implicit cognitive assessments (reaction time tasks) have been administered on a PDA in an EMA setting. In an initial study, 22 smokers and 22 non-smokers carried around a PDA for 1 week (Waters & Li, 2008). They were beeped at random times on 4 occasions per day (RAs). At each assessment, participants responded to items assessing subjective, pharmacological, and contextual variables. They subsequently completed a Stroop task. In a second study, 30 participants completed an Implicit Association Test (IAT) at each assessment. In a third study, 68 heroin abusers undergoing drug detoxification in a detoxification clinic completed implicit/explicit cognitive assessments at each assessment. In a fourth study, 81 participants wishing to quit smoking carried around a PDA for 1 week after their quit date. The talk will address: 1) The feasibility of assessing implicit/explicit cognitions on PDAs in an EMA setting; 2) The statistical methods that have been employed to analyze the EMA data; and 3) The unique associations between implicit/explicit cognitions and temptations/relapse that have been revealed in EMA data.



Seminar Talks 2006-2007

Spring 2007 Talks

SPEAKER: Professor Leonid Koralov
Mathematics Department, UMCP

TITLE: Averaging of Hamiltonian Flows with an Ergodic Component

TIME AND PLACE:  Thurs., Feb. 8, 2007, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: We consider a process which consists of fast motion along the stream lines of an incompressible periodic vector field, perturbed by white noise. Together with D. Dolgopyat we showed that for almost all rotation numbers of the unperturbed flow, the perturbed flow converges to an effective, "averaged" Markov process.


SPEAKER: Professor Donald Martin
Mathematics Department, Howard University & Census Bureau Stat. Resch. Div.

TITLE: Distributions of patterns and statistics in higher-order Markovian sequences

TIME AND PLACE:  Thurs., Feb. 15, 2007, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: In this talk we discuss a method for computing distributions associated with general patterns and statistics in higher-order Markovian sequences. An auxiliary Markov chain is associated with the original sequence and probabilities are computed through the auxiliary chain, simplifying computations that are intractable using combinatorial or other approaches. Three distinct examples of computations are given: (1) sooner or later waiting time distributions for collections of compound patterns that must occur pattern-specific numbers of times, using either overlapping counting or two types of non-overlapping counting; (2) the joint distribution of the total number of successes in success runs of length at least , and the distance between the beginning of the first such success run and the end of the last one; (3) the distribution of patterns in underlying variables of a hidden Markov model. Applications to missing and noisy data and to bioinformatics are given to illustrate the usefulness of the computations.
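The auxiliary-chain idea can be illustrated with a minimal sketch. It is not the speaker's method: it assumes only a first-order Markov source and a single simple pattern, and the function names (`kmp_step`, `waiting_time_pmf`) are hypothetical. The auxiliary state is the pair (last symbol, number of pattern symbols currently matched), and the waiting-time distribution is read off as the probability mass absorbed at each step.

```python
def kmp_step(pattern, matched, symbol):
    """Length of the longest prefix of `pattern` that ends the text after reading `symbol`."""
    candidate = pattern[:matched] + symbol
    for k in range(min(len(pattern), len(candidate)), 0, -1):
        if candidate.endswith(pattern[:k]):
            return k
    return 0


def waiting_time_pmf(pattern, init, trans, n_max):
    """Return pmf where pmf[n] = P(pattern is first completed at position n), n = 1..n_max.

    init:  dict symbol -> probability of the first symbol
    trans: dict symbol -> dict next_symbol -> transition probability
    """
    pmf = [0.0] * (n_max + 1)
    dist = {}  # auxiliary chain: (last_symbol, matched) -> probability, not yet absorbed
    for s, p in init.items():
        m = kmp_step(pattern, 0, s)
        if m == len(pattern):
            pmf[1] += p
        else:
            dist[(s, m)] = dist.get((s, m), 0.0) + p
    for n in range(2, n_max + 1):
        new_dist = {}
        for (last, m), p in dist.items():
            for s, q in trans[last].items():
                m2 = kmp_step(pattern, m, s)
                if m2 == len(pattern):
                    pmf[n] += p * q  # pattern completed: mass is absorbed
                else:
                    new_dist[(s, m2)] = new_dist.get((s, m2), 0.0) + p * q
        dist = new_dist
    return pmf
```

For a fair coin (an independent sequence written as a Markov chain with identical rows) and the pattern "HH", this gives P(T = 2) = 1/4 and P(T = 3) = P(T = 4) = 1/8. Higher-order sources and compound patterns, as in the talk, would need a richer state space than this sketch uses.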


SPEAKER: Professor Alexander S. Cherny
Moscow State University

TITLE: Coherent Risk Measures

TIME AND PLACE:  Tues., Feb. 20, 2007, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: The notion of a coherent risk measure was introduced by Artzner, Delbaen, Eber, and Heath in 1997, and by now this theory has become a considerable and very rapidly evolving branch of modern mathematical finance. The talk will be aimed at describing basic results of this theory, including the basic representation theorem of Artzner, Delbaen, Eber, and Heath as well as the characterization of law invariant risk measures obtained by Kusuoka. It will also include some recent results obtained by the author, related to the strict diversification property and to the characterization of dilatation monotone coherent risks.


SPEAKER: Dr. Siamak Sorooshyari
Lucent Technologies -- Bell Laboratories

TITLE: A Multivariate Statistical Approach to Performance Analysis of Wireless Communication Systems

TIME AND PLACE:  Thurs., Mar. 1, 2007, 3:30pm
          Room 1313, Math Bldg

NOTE: this seminar is presented jointly with the Norbert Wiener Center.

ABSTRACT: The explosive growth of wireless communication technologies has placed paramount importance on accurate performance analysis of the fidelity of a service offered by a system to a user. Unlike the channels of wireline systems, a wireless medium subjects a user to time-varying detriments such as multipath fading, cochannel interference, and thermal receiver noise. As a countermeasure, structured redundancy in the form of diversity has been instrumental in ensuring reliable wireless communication characterized by a low bit error probability (BEP). In the performance analysis of diversity systems the common assumption of uncorrelated fading among distinct branches of system diversity tends to exaggerate diversity gain resulting in an overly optimistic view of performance. A limited number of works take into account the problem of statistical dependence. This is primarily due to the mathematical complication brought on by relaxing the unrealistic assumption of independent fading among degrees of system diversity. We present a multivariate statistical approach to the performance analysis of wireless communication systems employing diversity. We show how such a framework allows for the statistical modeling of the correlated fading among the diversity branches of the system users. Analytical results are derived for the performance of maximal-ratio combining (MRC) over correlated Gaussian vector channels. Generality is maintained by assuming arbitrary power users and no specific form for the covariance matrices of the received faded signals. The analysis and results are applicable to binary signaling over a multiuser single-input multiple-output (SIMO) channel. In the second half of the presentation, attention is given to the performance analysis of a frequency diversity system known as multicarrier code-division multiple-access (MC-CDMA). 
With the promising prospects of MC-CDMA as a predominant wireless technology, analytical results are presented for the performance of MC-CDMA in the presence of correlated Rayleigh fading. In general, the empirical results presented in our work show the effects of correlated fading to be non-negligible, and most pronounced for lightly-loaded communication systems.


SPEAKER: Professor Harry Tamvakis
Mathematics Department, UMCP

TITLE: The Dominance Order

TIME AND PLACE:  Thurs., Mar. 8, 2007, 3:30pm
          Room 1313, Math Bldg

Abstract: The dominance or majorization order has its origins in the theory of inequalities, but actually appears in many strikingly disparate areas of mathematics. We will give a selection of results where this partial order appears, going from inequalities to representations of the symmetric group, families of vector bundles, orbits of nilpotent matrices, and finally describe some recent links between them.

NOTE: The topic of this talk is related to the following problem being studied in the RIT of Prof. Abram Kagan:
Consider a round robin tournament with n players (every pair of players meets in exactly one game; the winner gets one point, the loser zero). The outcome of the tournament is a set of n integers, a1 >= a2 >= ... >= an, where a1 is the total score of the tournament winner(s), a2 the score of the second-place finisher, etc. Not all such sets are possible outcomes, but all the possible outcomes can be described. A number of interesting probability problems arise here. E.g., assume that the n players are equally strong, i.e., the probability that player i beats player j is 1/2 for all i, j. The expected score of each player in the tournament is (n-1)/2. But what is the expected score (or the distribution of the score) of the winner(s)? At the moment the answer is unknown even in the asymptotic formulation (i.e., for large n).
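The tournament problem in the note above is easy to explore numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the check of which score sets are possible uses Landau's criterion, which the note does not name. It simulates equally strong players and estimates the expected score of the winner(s) by Monte Carlo.

```python
import random
from itertools import combinations


def play_round_robin(n, rng):
    """Scores of n equally strong players; each pair plays once, the winner gets 1 point."""
    scores = [0] * n
    for i, j in combinations(range(n), 2):
        winner = i if rng.random() < 0.5 else j
        scores[winner] += 1
    return scores


def is_score_sequence(scores):
    """Landau's criterion: the k lowest scores sum to at least k(k-1)/2, with equality at k = n."""
    ordered = sorted(scores)
    total = 0
    for k, s in enumerate(ordered, start=1):
        total += s
        if total < k * (k - 1) // 2:
            return False
    n = len(ordered)
    return total == n * (n - 1) // 2


def mean_winner_score(n, trials, seed=0):
    """Monte Carlo estimate of the expected top score a1."""
    rng = random.Random(seed)
    return sum(max(play_round_robin(n, rng)) for _ in range(trials)) / trials
```

Calling `mean_winner_score(n, trials)` for growing n and comparing the result with (n-1)/2 gives a rough empirical picture of how far the winner's score sits above the average score. It says nothing about the exact or asymptotic answer.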


SPEAKER: Zhibiao Zhao
Statistics Department, University of Chicago

TITLE: Confidence Bands in Nonparametric Time Series Regression

TIME AND PLACE:  Tues., March 27, 2007, 3:30pm           NOTE special seminar time.
          Room 1313, Math Bldg

Abstract: Nonparametric model validation under dependence has been a difficult problem. Fan and Yao (Nonlinear Time Series: Nonparametric and Parametric Methods, 2003, page 406) pointed out that there has been virtually no theoretical development on nonparametric model validation under dependence, despite the importance of the problem, since dependence is an intrinsic characteristic of time series. In this talk, we consider nonparametric estimation and inference of mean regression and volatility functions in nonlinear stochastic regression models. Simultaneous confidence bands are constructed and the coverage probabilities are shown to be asymptotically correct. The imposed dependence structure allows applications in many nonlinear autoregressive processes and linear processes, including both short-range dependent and long-range dependent processes. The results are applied to the S&P 500 Index data. Interestingly, the constructed simultaneous confidence bands suggest that we can accept the two null hypotheses that the regression function is linear and the volatility function is quadratic.


SPEAKER: Dr. Ram Tiwari
National Cancer Institute, NIH

TITLE: Two-sample problems in ranked set sampling

TIME AND PLACE:  Thurs., March 29, 2007, 3:30pm
          Room 1313, Math Bldg

Abstract: In many practical problems, the variable of interest is difficult/expensive to measure but the sampling units can be easily ranked based on another related variable. For example, in studies of obesity, the variable of interest may be the amount of body fat, which is measured by Dual Energy X-Ray Absorptiometry --- a costly procedure. The surrogate variable of body mass index is much easier to work with. Ranked set sampling is a procedure of improving the efficiency of an experiment whereby one selects certain sampling units (based on their surrogate values) that are then measured on the variable of interest. In this talk, we will first discuss some results on two-sample problems based on ranked set samples. Several nonparametric tests will be developed based on the vertical and horizontal shift functions. It will be shown that the new methods are more powerful compared to procedures based on simple random samples of the same size.

When the measurement of the surrogate variable is moderately expensive, in the presence of a fixed total cost of sampling, one may resort to a generalized sampling procedure called k-tuple ranked set sampling, whereby k(>1) measurements are made on each ranked set. In the second part of this talk, we will show how one can use such data to estimate the underlying distribution function or the population mean. The special case of the extreme ranked set sample, where the data consist of multiple copies of maxima and minima, will be discussed in detail due to its practical importance. Finally, we will briefly discuss the effect of incorrect ranking and provide an illustration using data on conifer trees.
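The basic ranked set sampling procedure described in this abstract can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the normal population, the noisy surrogate, and the function names. In each of k sets, the units are ranked on the cheap surrogate, and only the unit holding one designated rank is measured on the variable of interest.

```python
import random
import statistics


def ranked_set_sample(k, draw_unit, rng):
    """One cycle of ranked set sampling with set size k.

    draw_unit(rng) returns (surrogate, value). In set number `rank`, only the
    unit with that rank on the surrogate has its value measured.
    """
    measured = []
    for rank in range(k):
        units = sorted((draw_unit(rng) for _ in range(k)), key=lambda u: u[0])
        measured.append(units[rank][1])
    return measured


def draw_unit(rng):
    """Hypothetical population: value ~ N(0, 1), surrogate = value + N(0, 0.5^2) noise."""
    value = rng.gauss(0.0, 1.0)
    surrogate = value + rng.gauss(0.0, 0.5)
    return surrogate, value


def compare_variances(k=3, reps=4000, seed=0):
    """Variance of the sample mean under RSS and under simple random sampling, same size k."""
    rng = random.Random(seed)
    rss_means = [statistics.fmean(ranked_set_sample(k, draw_unit, rng)) for _ in range(reps)]
    srs_means = [statistics.fmean(draw_unit(rng)[1] for _ in range(k)) for _ in range(reps)]
    return statistics.variance(rss_means), statistics.variance(srs_means)
```

In this toy setting the ranked set sample mean should show a visibly smaller variance than the simple random sample mean with the same number of measured units. This matches the efficiency gain the abstract refers to, although the sketch covers only estimation of the mean, not the two-sample tests of the talk.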


SPEAKER: Guanhua Lu
Statistics Program, UMCP

TITLE: Asymptotic Theory in Multiple-Sample Semiparametric Density Ratio Models

TIME AND PLACE:  Thurs., April 5, 2007, 3:30pm
          Room 1313, Math Bldg

Abstract: A multiple-sample semiparametric density ratio model can be constructed by multiplicative exponential distortions of the reference distribution. Distortion functions are assumed to be nonnegative and of a known finite-dimensional parametric form, and the reference distribution is left nonparametric. The combined data from all the samples are used in the semiparametric large sample problem of estimating each distortion and the reference distribution. The large sample behavior for both the parameters and the unknown reference distribution are studied. The estimated reference distribution has been proved to converge weakly to a zero-mean Gaussian process.


SPEAKER: Dr. Gabor Szekely
NSF and Bowling Green State University

TITLE: Measuring and Testing Dependence by Correlation of Distances

TIME AND PLACE:  Thurs., April 12, 2007, 3:30pm
          Room 1313, Math Bldg

Abstract: We introduce a simple new measure of dependence between random vectors. Distance covariance (dCov) and distance correlation (dCor) are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, dCor = 0 characterizes independence for the general case. The empirical dCov and dCor are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Definitions can be extended to metric-space-valued observations where the random vectors could even be in different metric spaces. Asymptotic properties and applications in testing independence will also be discussed. A new universally consistent test of multivariate independence is developed. Distance correlation can also be applied to prove a CLT for strongly stationary sequences.
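A minimal sketch of the empirical distance correlation for one-dimensional samples is given below. It follows the usual double-centering construction on pairwise distances. The function names are hypothetical, and the restriction to scalar observations is an assumption made to keep the example short. A Pearson correlation is included for contrast.

```python
import math


def _centered_distances(xs):
    """Pairwise |x_i - x_j| matrix with row, column, and grand means removed."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so the column means equal the row means
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def _dcov2(a, b):
    """Squared sample distance covariance from two centered distance matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(xs, ys):
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    denom = math.sqrt(_dcov2(a, a) * _dcov2(b, b))
    if denom == 0:
        return 0.0
    return math.sqrt(max(_dcov2(a, b), 0.0) / denom)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With xs = [-2, -1, 0, 1, 2] and ys equal to their squares, the Pearson correlation is zero while the distance correlation is clearly positive. This illustrates the point that dCor detects dependence that the classical correlation misses.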



Distinguished JPSM Lecture co-Sponsored by Statistics Consortium

SPEAKER: Professor Roderick J. Little
Departments of Biostatistics and Statistics and Institute for Social Research, University of Michigan

TITLE: Wait! Should We Use the Survey Weights to Weight?

TIME AND PLACE:  Friday, April 13, 2007, 3:30pm
          Room 2205, Lefrak Hall

Two discussants will speak following Professor Little's talk:
John Eltinge of Bureau of Labor Statistics and Richard Valliant from JPSM.



SPEAKER: Dr. Song Yang
Office of Biostatistics Research, National Heart Lung and Blood Institute, NIH

TITLE: Some versatile tests of treatment effect using adaptively weighted log rank statistics

TIME AND PLACE:  Thurs., April 19, 2007, 3:30pm
          Room 1313, Math Bldg

Abstract: For testing treatment effect with time to event data, the log rank test is the most popular choice and is optimal for proportional hazards alternatives. When a range of possibly nonproportional alternatives are possible, combinations of several tests are often used. Currently available methods inevitably sacrifice power at proportional alternatives and may also be computationally demanding. We introduce some versatile tests that use adaptively weighted log rank statistics. Extensive numerical studies show that these new tests almost uniformly improve the tests that they modify, and are optimal or nearly so for proportional alternatives. In particular, one of the new tests maintains optimality at the proportional alternatives and also has very good power at a wide range of nonproportional alternatives, thus is the test we recommend when flexibility in the treatment effect is desired. The adaptive weights are based on the model of Yang and Prentice (2005).



Statistics Consortium Lecture co-Sponsored by JPSM and MPRC

SPEAKER: Professor Bruce Spencer
Statistics Department & Faculty Fellow, Institute for Policy Research, Northwestern University

TITLE: Statistical Prediction of Demographic Forecast Accuracy

TIME AND PLACE:  Friday, April 27, 2007, 3:15pm
          Room 2205, Lefrak Hall

ABSTRACT: Anticipation of future population change affects public policy deliberations on (i) investment for health care and pensions, (ii) effects of immigration policy on the economy, (iii) future competitiveness of the U.S. economy, to name just three. In this talk, we review some statistical approaches used to predict the accuracy of demographic forecasts and functional forecasts underlying the policy discussions. A functional population forecast is one that is a function of the population vector as well as other components, for example a forecast of the future balance of a pension fund. No background in demography will be assumed, and the necessary demographic concepts will be introduced from the statistical point of view. The talk is based on material in Statistical Demography and Forecasting by J. M. Alho and B. D. Spencer (2005, Springer) and reflects joint work by the authors.

Following Professor Spencer's talk, there will be a formal Discussion, by Dr. Peter Johnson of the International Programs Center of the Census Bureau and Dr. Jeffrey Passel of the Pew Hispanic Center. Following the formal and floor discussion, there will be a reception including refreshments.


SPEAKER: Professor Dennis Healy
Mathematics Department, UMCP

TITLE: TBA

TIME AND PLACE:  Postponed
    


NOTE: this seminar will be presented jointly with the Norbert Wiener Center.


Fall 2005 Talks


SPEAKER: Prof. Ross Pinsky
Mathematics Department, Technion, Israel

TITLE: Law of Large Numbers for Increasing Subsequences of Random Permutations

TIME AND PLACE:  Tues., August 23, 2005, 2pm
          Room 1313, Math Bldg

ABSTRACT: click here.


SPEAKER: Prof. Paul Smith
Statistics Program, Mathematics Department, UMCP

TITLE: Statistical Analysis of Ultrasound Images of Tongue Contours
               during Speech


TIME AND PLACE:  Thurs., September 15, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: The shape and movement of the tongue are critical in the formation of human speech. Modern imaging techniques allow scientists to study tongue shape and movement without interfering with speech. This presentation describes statistical issues arising from ultrasound imaging of tongue contour data.

There are many sources of variability in tongue image data, including speaker to speaker differences, intraspeaker differences, noise in the images, and other measurement problems. To make matters worse, the tongue is supported entirely by soft tissue, so no fixed co-ordinate system is available. Statistical methods to deal with these problems are presented.

The goal of the research is to associate tongue shapes and sound production. Principal component analysis is used to reduce contours. Combinations of two basic shapes accurately represent tongue contours. The results are physiologically meaningful and correspond well to actual speech activity. The methods are applied to a sample of 16 subjects, each producing four vowel sounds. It was found that principal components clearly distinguish vowels based on tongue contours.

We also investigate whether speakers fall into distinct groups on the basis of their tongue contours. Cluster analysis is used to identify possible groupings, but many variants of this technique are possible and the results are sometimes conflicting. Methods to compare multiple cluster analyses are suggested and applied to tongue contour data to assess the meaning of apparent speaker clusters.


SPEAKER: Prof. Benjamin Kedem
Statistics Program, Mathematics Department, UMCP

TITLE: A Semiparametric Approach to Time Series Prediction

TIME AND PLACE:  Thurs., September 22, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: Given m time series regression models, linear or not, with additive noise components, it is shown how to estimate the predictive probability distribution of all the time series conditional on the observed and covariate data at the time of prediction. This is done by a certain synergy argument, assuming that the distributions of the noise components associated with the regression models are tilted versions of a reference distribution. Point predictors are obtained from the predictive distribution as a byproduct. An application to US mortality rates prediction will be discussed.


A former student of our Statistics Program, Dean Foster of the Statistics Department at the Wharton School, University of Pennsylvania, will be visiting the Business School on Friday 9/23/05 and giving a seminar entitled "Learning Nash equilibria via public calibration" from 3-4:15 pm in Van Munching Hall Rm 1206.

You can see an abstract of the talk by clicking here.


SPEAKER: Professor Steven Martin
Department of Sociology, University of Maryland College Park

TITLE: Reassessing delayed and forgone marriage in the United States

TIME AND PLACE:   Wed., September 28, 2005, 3:30pm
          Room 1313, Math Bldg
         NOTE UNUSUAL TIME !

ABSTRACT: Do recent decreases in marriage rates mean that more women are forgoing marriage, or that women are simply marrying at later ages? Recently published demographic projections from standard nuptiality models suggest that changes in marriage rates have different implications for women of different social classes, producing an "education crossover" in which women who are four-year college graduates have become more likely to marry than other women in the US, instead of less likely as has been the case for at least a century. To test these findings, I develop a new projection technique that predicts the proportion of women marrying by age 45 under flexible assumptions about trends in age-specific marriage rates and effects of unmeasured heterogeneity. Results from the 1996 and 2001 Surveys of Income and Program Participation suggest that the "crossover" in marriage by educational attainment is either not happening or is taking much longer than predicted. Also, recent trends are broadly consistent with an ongoing slow decline in proportions of women ever marrying, although that decline is less pronounced in the last decade than in previous decades.


SPEAKER: Professor Rick Valliant
Joint Program in Survey Methodology, Univ. of Michigan & UMCP

TITLE: Balanced Sampling with Applications to Accounting Populations

TIME AND PLACE:  Thurs., October 6, 2005, 3:30pm
         Room 1313, Math Bldg

ABSTRACT: Weighted balanced sampling is a way of restricting the configuration of sample units that can be selected from a finite population. This method can be extremely efficient under certain types of structural models that are reasonable in some accounting problems. We review theoretical results that support weighted balancing, compare different methods of selecting weighted balanced samples, and give some practical examples. Where appropriate, balancing can meet precision goals with small samples and can be robust to some types of model misspecification. The variance that can be achieved is closely related to the Godambe-Joshi lower bound from design-based theory.

One of the methods of selecting these samples is restricted randomization in which "off-balance" samples are rejected if selected. Another is deep stratification in which strata are formed based on a function of a single auxiliary and one or two units are selected with equal probability from each stratum. For both methods, inclusion probabilities can be computed and design-based inference done if desired.

Simulation results will be presented to compare results from balanced samples with ones selected in more traditional ways.


SPEAKER: Professor Wolfgang Jank
Department of Decision & Information Technologies
The Robert H. Smith School of Business, UMCP

TITLE: Stochastic Variants of EM: Monte Carlo, Quasi-Monte Carlo, and More

TIME AND PLACE:  Thurs., October 20, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: We review recent advances in stochastic implementations of the EM algorithm. We review the Ascent-based Monte Carlo EM algorithm, a new automated version of Monte Carlo EM based on EM's likelihood ascent property. We discuss more efficient implementations via quasi-Monte Carlo sampling. We also re-visit a new implementation of the old stochastic approximation version for EM. We illustrate some of the methods on a geostatistical model of online purchases.

The slides for Professor Jank's presentation are linked here.


SPEAKER: Professor Ciprian Crainiceanu
Johns Hopkins Biostatistics Department, School of Public Health

TITLE: Structured Estimation under Adjustment Uncertainty

TIME AND PLACE:  Thurs., October 27, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: Population health research is increasingly focused on identifying small risks by use of large databases containing millions of observations and hundreds or thousands of covariates. As a result, there is an increasing need to develop statistical methods to estimate these risks and properly account for all their sources of uncertainty. An example is the estimation of the health effects associated with short-term exposure to air pollution, where the goal is to estimate the association between daily changes in ambient levels of air pollution and daily changes in the number of deaths or hospital admissions accounting for many confounders, such as other pollutants, weather, seasonality, and influenza epidemics.

Regression models are commonly used to estimate the effect of an exposure on an outcome, while controlling for confounders. The selection of confounders and of their functional form generally affects the exposure effect estimate. In practice, there is often substantial uncertainty about this selection, which we define here as "adjustment uncertainty".

In this paper, we propose a general statistical framework to account for adjustment uncertainty in risk estimation called "Structured Estimation under Adjustment Uncertainty (STEADy)". We consider the situation in which a rich set of potential confounders is available and there exists a model such that every model nesting it provides the correctly adjusted exposure effect estimate. Our approach is based on a structured search of the model space that sequentially identifies among all the potential confounders the ones that are good predictors of the exposure and of the outcome, respectively.

Through theoretical results and simulation studies, we compare "adjustment uncertainty" implemented with STEADy versus "model uncertainty" implemented with Bayesian Model Averaging (BMA) for exposure effect estimation. We found that BMA, by averaging parameter estimates adjusted by different sets of confounders, estimates a quantity that is not the scientific focus of the investigation and can over- or underestimate statistical variability. Another potential limitation of BMA in this context is the strong dependence of posterior model probabilities on prior distributions. We show that using the BIC approximation of posterior model probabilities favors models more parsimonious than the true model, and that BIC is not consistent under assumptions relevant for moderate size signals.

Finally we apply our methods to time series data on air pollution and health to estimate health risks accounting for adjustment uncertainty. We also compare our results with a BMA analysis of the same data set. The open source R package STEADy implementing this methodology for Generalized Linear Models (GLMs) will be available at the R website.

You can see the paper on which this talk is based here.


     No Seminar Thursday 11/3.
     But NOTE special seminar at unusual time on Monday 11/7, below.



SPEAKER: Professor Lise Getoor
Department of Computer Science, UMCP

TITLE: Learning Statistical Models from Relational Data

TIME AND PLACE:   Mon., November 7, 2005, 4-5pm
          Room 1313, Math Bldg
         NOTE UNUSUAL TIME !

ABSTRACT:
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby losing much of the relational structure present in the data and potentially introducing statistical skew. These drawbacks severely limit the ability of current methods to mine relational databases.

In this talk I will review recent work on probabilistic models, including Bayesian networks (BNs) and Markov Networks (MNs) and their relational counterparts, Probabilistic Relational Models (PRMs) and Relational Markov Networks (RMNs). I'll briefly describe the development of techniques for automatically inducing PRMs directly from structured data stored in a relational or object-oriented database. These algorithms provide the necessary tools to discover patterns in structured data, and provide new techniques for mining relational data. As we go along, I'll present experimental results in several domains, including a biological domain describing tuberculosis epidemiology, a database of scientific paper author and citation information, and Web data. PowerPoint slides for an extended tutorial related to Professor Getoor's talk can be found here.
Additional related research can be found at her home-page.


SPEAKER: Professor Victor de Oliveira
Department of Mathematical Sciences, University of Arkansas

TITLE: Bayesian Analysis of Spatial Data: Some Theoretical Issues and Applications in the Earth Sciences

TIME AND PLACE:   Thurs., November 10, 2005, 4:00pm
          Room 3206, Math Bldg

          NOTE change to unusual 4-5pm time-slot and unusual location!!

ABSTRACT: Random fields are useful mathematical tools for modeling spatially varying phenomena. This talk will focus on Bayesian analysis of geostatistical data based on Gaussian random fields (or models derived from these), which have been extensively used for the modeling and analysis of spatial data in most earth sciences, and are usually the default model (possibly after a transformation of the data).

The Bayesian approach for the analysis of spatial data has seen in recent years an upsurge in interest and popularity, mainly due to the fact that it is particularly well suited for inferential problems that involve prediction. Yet, implementation of the Bayesian approach faces several methodological and computational challenges, most notably:

(1) The likelihood behavior of covariance parameters is not well understood, with the possibility of ill behavior. In addition, there is a lack of automatic or default prior distributions for the parameters of these models, such as Jeffreys and reference priors.

(2) There are substantial computational difficulties for the implementation of Markov chain Monte Carlo methods required for carrying out Bayesian inference and prediction based on moderate or large spatial datasets.

This talk presents recent advances in the formulation of default prior distributions as well as some properties, Bayesian and frequentist, of inferences based on these priors. We illustrate some of the issues and problems involved using simulated data, and apply the methods to the solution of several inferential problems based on two spatial datasets: one dealing with pollution by nitrogen in the Chesapeake Bay, and the other dealing with depths of a geologic horizon based on censored data.

If time permits, a new computational algorithm is described that can substantially reduce the computational burden mentioned in (2). Finally, we describe some challenges and open problems whose solution would make the Bayesian approach more appealing.


NO STATISTICS SEMINAR Thursday, November 17, 2005.

BUT NOTE THAT ON FRIDAY, NOVEMBER 18, 2005, THERE IS A PAIR OF TALKS
in the Distinguished Lecture Series at the University of Maryland co-sponsored
by the Joint Program in Survey Methodology and the University of Maryland
Statistics Consortium.
The first talk is by Alastair Scott, titled
"The Design and Analysis of Retrospective Health Surveys." The second, titled
"The Interplay Between Sample Survey Theory and Practice: An Appraisal," is by
J. N. K. Rao. Click here for additional details about the speakers and talks.

Dr. Scott's talk will begin at 1:00 pm and will be discussed by Barry Graubard
from the National Cancer Institute and Graham Kalton from Westat and JPSM.

Dr. Rao's talk will begin at 3:00 pm and will be discussed by Phil Kott from the
National Agricultural Statistics Service and Mike Brick from Westat and JPSM.

Both talks will be held in 2205 LeFrak Hall.
There will be a reception immediately afterwards at 4:45.



SPEAKER: Professor Michael Cummings
Center for Bioinformatics and Computational Biology, UMCP

TITLE: Analysis of Genotype-Phenotype Relationships: Machine Learning/Statistical Methods

TIME AND PLACE:  Thurs., December 8, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: Understanding the relationship of genotype to phenotype is a fundamental problem in modern genetics research. However, significant analytical challenges exist in the study of genotype-phenotype relationships. These challenges include genotype data in the form of unordered categorical values (e.g., nucleotides, amino acids, SNPs), numerous levels of variables, mixture of variable types (categorical and numerical), and potential for non-additive interactions between variables (epistasis). These challenges can be dealt with through use of machine learning/statistical approaches such as tree-based statistical models and random forests. These methods recursively partition a data set in two (binary split) based on values of a single predictor variable to best achieve homogeneous subsets of a categorical response variable (classification) or to best separate low and high values of a continuous response variable (regression). These methods are very well suited for the analysis of genotype-phenotype relationships and have been shown to provide outstanding results. Examples to be presented include identifying amino acids important in spectral tuning in color vision and nucleotide sequence changes important in some growth characteristics in maize.
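The binary split at the heart of tree-based methods can be sketched for unordered categorical predictors as follows. This is an illustration only, not the speaker's software: it performs a single split rather than the full recursion, it uses Gini impurity as the homogeneity measure (an assumption), and the predictor names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(rows, labels):
    """Find the split `row[name] == value` versus `row[name] != value` with lowest impurity.

    rows:   list of dicts mapping predictor name -> categorical value
    labels: categorical response, one per row
    Returns (weighted_impurity, name, value), or None if no split separates the data.
    """
    best = None
    n = len(rows)
    for name in rows[0]:
        for value in sorted({row[name] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[name] == value]
            right = [y for row, y in zip(rows, labels) if row[name] != value]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, name, value)
    return best


# Hypothetical toy data: nucleotides at two sites and a categorical phenotype.
toy_rows = [
    {"site1": "A", "site2": "C"},
    {"site1": "A", "site2": "G"},
    {"site1": "G", "site2": "C"},
    {"site1": "T", "site2": "G"},
]
toy_labels = ["red", "red", "green", "green"]
```

On the toy data, `best_binary_split(toy_rows, toy_labels)` selects the split on `site1 == "A"`, which separates the two phenotype classes completely. A full tree would apply the same search recursively to each resulting subset, and a random forest would repeat this over resampled data and random subsets of predictors.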


SPEAKER: Dr. Myron Katzoff
National Center for Health Statistics/ Centers for Disease Control

TITLE: Statistical Methods for Decontamination Sampling

TIME AND PLACE:  Thurs., December 15, 2005, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: This talk will be about an adaptive sampling procedure applicable to microparticle removal and a methodology for validating a computational fluid dynamics (CFD) model that is believed to be useful in refining such a procedure. The adaptive sampling procedure has many features in common with current field practices; its importance is that it would enable valid statistical inferences. The CFD model validation methodology described employs statistical techniques used in the frequency domain analysis of spatio-temporal data. Seminar attendees will be encouraged to contribute their thoughts on alternative proposals for analyses of experimental data for CFD model validation.

Slides from the talk can be viewed here.

Spring 2006 Talks


SPEAKER: Dr. Mokshay Madiman
Statistics Department, Yale

TITLE: Statistical Data Compression with Distortion

TIME AND PLACE:  Tues., January 31, 2006, 3:30pm    Note unusual day !
          Room 1313, Math Bldg

ABSTRACT: Motivated by the powerful and fruitful connection between information-theoretic ideas and statistical model selection, we consider the problem of "lossy" data compression ("lossy" meaning that a certain amount of distortion is allowed in the decompressed data) as a statistical problem. After recalling the classical information-theoretic development of Rissanen's celebrated Minimum Description Length (MDL) principle for model selection, we introduce and develop a new theoretical framework for _code selection_ in data compression. First we describe a precise correspondence between compression algorithms (or codes) and probability distributions, and use it to interpret arbitrary families of codes as statistical models. We then introduce "lossy" versions of several familiar statistical notions (such as maximum likelihood estimation and MDL model selection criteria), and we propose new principles for building good codes. In particular, we show that in certain cases, our "lossy MDL estimator" has the following optimality property: not only does it converge to the best available code (as the amount of data grows), but it also identifies the right class of codes in finite time with probability one.

[Joint work with Ioannis Kontoyiannis and Matthew Harrison.]

This talk is by Invitation of the Hiring Committee.


SPEAKER: Lang Withers
MITRE Signal Processing Center

TITLE: The Bernoulli-trials Distribution and Wavelet
     This talk is jointly sponsored with the Harmonic Analysis Seminar this week.


TIME AND PLACE:  Thurs., February 2, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: This talk is about a probability distribution function for Bernoulli ("coin-toss") sequences. We use the Haar wavelet to analyze it, and find that this function just maps binary numbers in [0,1] into general p-binary numbers in [0,1]. Next we see that this function obeys a two-scale dilation equation and use it to construct a family of wavelets. This family contains the Haar wavelet and the piecewise-linear wavelet as special cases. What is striking here is how naturally probability and wavelets interact: the Haar wavelet sheds light on the meaning of a distribution; the distribution happens to obey a two-scale dilation equation and lets us make it into a wavelet.

We take up the more general case of the distribution function for multi-valued Bernoulli trials. A special case of this for three-valued trials is the Cantor function. Again we find that it just maps ternary numbers into generalized ternary numbers. I hope to develop the Cantor wavelet as well in time for the talk.
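As a rough illustration of the two-scale structure mentioned above, the following sketch evaluates the distribution function of a random binary expansion. It assumes the convention that the digits X_i are independent with P(X_i = 1) = p and that F(x) = P(sum_i X_i 2^-i <= x); the speaker's convention may differ, and the function name and the `depth` truncation are illustrative. Under this convention F satisfies F(x) = (1 - p) F(2x) on [0, 1/2] and F(x) = (1 - p) + p F(2x - 1) on [1/2, 1].

```python
def bernoulli_cdf(x, p, depth=64):
    """Approximate F(x) = P(sum_i X_i * 2**-i <= x) for 0 <= x <= 1,
    where the X_i are independent with P(X_i = 1) = p (assumed convention).

    Walks the binary digits of x; `depth` digits are used, so the
    truncation error is at most max(p, 1 - p) ** depth.
    """
    total, weight = 0.0, 1.0
    for _ in range(depth):
        x *= 2
        if x >= 1:
            # Current digit of x is 1: every sequence whose digit here is 0
            # (probability 1 - p, given agreement so far) lies below x.
            total += weight * (1 - p)
            weight *= p
            x -= 1
        else:
            # Current digit of x is 0: only sequences with digit 0 can stay below.
            weight *= 1 - p
    return total


# With p = 1/2 the distribution is uniform, so F(x) = x.
print(bernoulli_cdf(0.375, 0.5))
# For a biased coin the binary digits of x are re-weighted by p and 1 - p.
print(bernoulli_cdf(0.5, 0.3))
```

The digit-by-digit re-weighting is one way to read the abstract's remark that the function maps binary numbers into "p-binary" numbers.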

Audience: advanced undergrad and up; some familiarity with wavelets and measure theory is helpful.

Click here to see a current draft of the speaker's paper on the subject of the talk.


SPEAKER: Hyejin Shin
Department of Statistics, Texas A&M University

TITLE: An RKHS Formulation of Discrimination and Classification for Stochastic Processes

TIME AND PLACE:  Thurs., February 9, 2006, 12:30-1:45pm
          Room 3206, Math Bldg

Note unusual time and place for this seminar !

ABSTRACT:   Modern data collection methods are now frequently returning observations that should be viewed as the result of digitized recording or sampling from stochastic processes rather than vectors of finite length. In spite of great demands, only a few classification methodologies for such data have been suggested and supporting theory is quite limited. Our focus is on discrimination and classification in the infinite dimensional setting. The methodology and theory we develop are based on the abstract canonical correlation concept in Eubank and Hsing (2005) and motivated by the fact that Fisher's discriminant analysis method is intimately tied to canonical correlation analysis. Specifically, we have developed a theoretical framework for discrimination and classification of sample paths from stochastic processes through use of the Loève-Parzen isometric mapping that connects a second order process to the reproducing kernel Hilbert space generated by its covariance kernel. This approach provides a seamless transition between finite and infinite dimensional settings and lends itself well to computation via smoothing and regularization.

This talk is by Invitation of the Mathematics Department Hiring Committee.


SPEAKER: Professor Jae-Kwang Kim
Dept. of Applied Statistics, Yonsei University, Korea

TITLE: Regression fractional hot deck imputation

TIME AND PLACE:  Thurs., February 16, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   Imputation using a regression model is a method to preserve the correlation among variables and to provide imputed point estimators. We discuss the implementation of regression imputation using fractional imputation. By a suitable choice of fractional weights, the fractional regression imputation can take the form of hot deck fractional imputation, so that no artificial values are constructed after the imputation. A variance estimator, which extends the method of Kim and Fuller (2004, Biometrika), is also proposed. By a suitable choice of imputation cells, the proposed estimators can be made robust against the failure of the assumed regression imputation model. Comparisons based on simulations are presented.

Professor Kim has made the slides for his talk available here.


SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department

TITLE: Model selection and inference in regression when the number
of explanatory variables is of the same order as sample size.


TIME AND PLACE:  Thurs., February 23, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   Some of the most challenging problems in modern econometrics and statistics feature a large number of possibly important factors or variables, and a comparatively small sample size. Examples include portfolio selection, detection of fraudulent customers of credit card or telephone companies, micro-array analysis, or proteomics.

I consider one problem of that kind: Regression with random design, where the number of explanatory variables is of the same order as sample size. The focus is on selecting a model with small predictive risk.

Traditional model selection procedures, including AIC, BIC, FPE or MDL, perform poorly in this setting. The models selected by these procedures can be anything from mildly suboptimal to completely unreasonable, depending on unknown parameters. In addition, inference procedures based on the selected model, like tests or confidence sets, are invalid, irrespective of whether a good model has been chosen or not.

I propose a new approach to the model selection problem in this setting that explicitly acknowledges the fact that the number of explanatory variables is of the same order as sample size. This approach has several attractive features:

1) It will select the best predictive model asymptotically, irrespective of unknown parameters (under minimal conditions).

2) It allows for inference procedures like tests or confidence sets based on the selected model that are asymptotically valid.

3) Simulations suggest that the asymptotics in 1 and 2 above 'kick in' pretty soon, e.g., in a problem with 1000 parameters and 1600 observations.

These results are currently work in progress.

Professor Leeb will also give a second, more general talk for the campus statistical
community which is jointly sponsored by the Stat Program in the Math Department
along with the campus Statistics Consortium. Details for the second talk are as follows:


SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department

TITLE: Model Selection and Inference: Facts and Fiction

TIME AND PLACE:  Friday, February 24, 2006, 3:00pm
          Lefrak Building Room 2205

ABSTRACT:   Model selection has an important impact on subsequent inference. Ignoring the model selection step leads to invalid inference. We discuss some intricate aspects of data-driven model selection that do not seem to have been widely appreciated in the literature. We debunk some myths about model selection, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically. We also discuss an 'impossibility' result regarding the estimation of the finite-sample distribution of post-model-selection estimators.

A paper of Professor Leeb covering most of the issues in the second talk can be found here.

This talk is jointly sponsored by the Statistics Consortium and the Statistics Program in the Mathematics Department. The talk will be followed by refreshments at 4:30pm.


SPEAKER: Guoxing (Greg) Soon, Ph.D.
Office of Biostatistics, CDER, Food & Drug Administration

TITLE: Statistical Applications in FDA

TIME AND PLACE:  Thurs., March 2, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   This talk will be divided into three parts. I will begin by briefly describing the kind of work FDA statisticians do, and then I will discuss two topics: "From Intermediate endpoint to final endpoint: a conditional power approach for accelerated approval and interim analysis" and "Computer Intensive and Re-randomization Tests in Clinical Trials".

1. Statistical Issues in FDA

Statistics plays an important role in the FDA's decision making process. Statistical inputs are critical for the design, conduct, analysis and interpretation of clinical trials. The statistical issues we deal with include, but are not limited to, the following: appropriateness of randomization procedure, determination of analysis population, blinding, potential design flaws that may lead to biases, quality of endpoint assessment, interim analysis, information handling, missing values, discontinuations, decision rule, analysis methods, and interpretation. In this talk I will describe the type of work we do with a few examples.

2. From Intermediate endpoint to final endpoint: a conditional power approach for accelerated approval and interim analysis

For chronic and life threatening diseases, the clinical trials required for final FDA approval may take a long time. It is therefore sometimes necessary to approve the drug temporarily (accelerated approval) based on early surrogate endpoints. Traditionally such approvals were based on similar requirements on the surrogate endpoints as if they were final endpoints, regardless of the quality of the surrogacy. However, in this case the longer term information on some patients is ignored, and the risk of eventual failure at final approval is not considered.

In contrast, in typical group sequential trials, only information on the final endpoint for a fraction of patients is used, and short-term endpoints on other patients are ignored. This reduces the efficiency of inferences and will also fail to account for potential shift of population over the course of the trial.

In this talk I will propose an approach that utilizes both short-term surrogate and long-term final endpoint at interim or intermediate analyses, and the decision for terminating trial early, or granting temporary approval, will be based on the likelihood of seeing a successful trial were the trial to be completed. Issues on Type I error control as well as efficiency of the procedure will be discussed.

3. Computer Intensive and Re-randomization Tests in Clinical Trials

Quite often clinicians are concerned about balancing important covariates at baseline. Allocation methods designed to achieve deliberate balance on baseline covariates, commonly called dynamic allocation or minimization, are used for this purpose. This non-standard allocation poses a challenge for the common statistical analysis. In this talk I will examine robustness of level and power of common tests with deliberately balanced assignments when the assumed distribution of responses is not correct.

There are two methods of testing with such allocations: computer intensive and model based. I will review some of the common mistaken attitudes about the goals of randomization. And I will discuss some simulations that attempt to explore the operating characteristics of re-randomization and model based analyses when model assumptions are violated.

Click here to see the slides for Dr. Soon's talk.


SPEAKER: Professor Lee K. Jones
Department of Mathematical Sciences, University of Massachusetts Lowell

TITLE: On local minimax estimation with some consequences for
ridge regression, tree learning and reproducing kernel methods

     This talk is jointly sponsored with the Harmonic Analysis Seminar this week.


TIME AND PLACE:  Thurs., March 9, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   Local learning is the process of determining the value of an unknown function at only one fixed query point based on information about the values of the function at other points. We propose an optimal methodology (local minimax estimation) for local learning of functions with band-limited ranges which differs from (and is demonstrated in many interesting cases to be superior to) several popular local and global learning methods. In this theory the objective is to minimize the (maximum) prediction error at the query point only - rather than minimize some average performance over the entire domain of the function. Since different compute-intensive procedures are required for each different query, local learning algorithms have only recently become feasible due to the advances in computer availability, capability and parallelizability of the last two decades.

In this talk we first apply local minimax estimation to linear functions. A rotationally invariant approach yields ridge regression, the ridge parameter and optimal finite sample error bounds. A scale invariant approach similarly yields best error bounds but is fundamentally different from either ridge or lasso regression. The error bounds are given in a general form which is valid for approximately linear target functions.

Using these bounds an optimal local aggregate estimator is derived from the trees in a Breiman (random) forest or a deterministic forest. Finding the estimator requires the solution to a challenging large dimensional non-differentiable convex optimization problem. Some approximate solutions to the forest optimization are given for classification using micro-array data.

Finally the theory is applied to reproducing kernel Hilbert space and an improved Tikhonov estimator for probability of correct classification is presented along with a proposal for local determination of optimal kernel shape without cross validation.

To see a copy of the paper on which the talk is based, click here.


SPEAKER: Professor Reza Modarres
George Washington University, Department of Statistics

TITLE: Upper Level Set Scan Statistic for Detection of Disease and Crime Hotspots

TIME AND PLACE:  Thurs., March 16, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   The upper level set (ULS) scan statistic, its theory, implementation, and extension to the bivariate data are discussed. The ULS-Hotspot algorithm that obtains the response rates, maintains a list of connected components at each level of the rate function and yields the ULS tree is described. The tree is grown in the immediate successor list, which provides a computationally efficient method for likelihood evaluation, visualization and storage. An example shows how the zones are formed and the likelihood function is developed for each candidate zone. Bivariate hotspot detection is discussed, including the bivariate binomial model, the multivariate exceedance approach, and the bivariate Poisson distribution. The Intersection method is recommended as it is simple to implement, using univariate hotspot detection methods. Applications to mapping of crime hotspots and disease clusters are presented.

Joint work with G.P. Patil.


SPEAKER: Professor Robert Mislevy
Department of Educational Measurement & Statistics (EDMS), UMCP

TITLE: A Bayesian perspective on structured mixtures of IRT models:
Interplay among psychology, evidentiary arguments, probability-based reasoning


TIME AND PLACE:  Thurs., March 30, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   (Joint paper with Roy Levy, Marc Kroopnick, and Daisy Wise, all of EDMS.)

Structured mixtures of item response theory (IRT) models are used in educational assessment for so-called cognitive diagnosis, that is, supporting inferences about the knowledge, procedures, and strategies students use to solve problems. These models arise from developments in cognitive psychology, task design, and psychometric models. We trace their evolution from the perspective of Bayesian inference, highlighting the interplay among scientific modeling, evidentiary argument, and probability-based reasoning about uncertainty.

This work draws in part on the first author's contributions to the National Research Council's (2002) monograph, available online:
Knowing what students know, J. Pellegrino, N. Chudowsky, & R. Glaser (Eds.), Washington, D.C.: National Academy Press.


On Friday, April 7, 2006, JPSM is sponsoring a Distinguished Lecture:

SPEAKER: Nora Cate Schaeffer

TITLE: Conversational Practices with a Purpose:
Interaction within the Standardized Interview


TIME AND PLACE:  Friday, April 7, 2006, 3:30pm
          Room 2205 Lefrak Hall

There will be a reception immediately afterwards.

ABSTRACT: The lecture will discuss interactions in survey interviews and standardization as it is actually practiced. An early view of the survey interview characterized it as a "conversation with a purpose," and this view was later echoed in the description of survey interviews as "conversations at random." In contrast to these informal characterizations of the survey interview stand the formal rules and constraints of standardization as they have developed over several decades. Someplace in between a "conversation with a purpose" and a perfectly implemented standardized interview are the actual practices of interviewers and respondents as they go about their tasks. Most examinations of interaction in the survey interview have used standardization as a starting point and focused on how successfully standardization has been implemented, for example by examining whether interviewers read questions as worded. However, as researchers have looked more closely at what interviewers and respondents do, they have described how the participants import into the survey interview conversational practices learned in other contexts. As such observations have accumulated, they provide a vehicle for considering how conversational practices might support or undermine the goals of measurement within the survey interview. Our examination of recorded interviews from the Wisconsin Longitudinal Study provides a set of observations to use in discussing the relationship among interactional practices, standardization, and measurement.


SPEAKER: Prof. Jiuzhou Song
Department of Animal Sciences, UMCP

TITLE: The Systematic Analysis for Temporal Gene Expression Analysis

TIME AND PLACE:  Thurs., April 13, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   In temporal gene expression analysis, we propose a strategy to explore the use of gene and treatment effect information, and build a synthetic genetic network. Assuming that variations of gene expression are caused by different conditions, we classified all experimental conditions into several subgroups via clustering analysis, which groups conditions based on the similarity of temporal gene expression profiles. This procedure is useful because it allows us to combine more diverse gene expression data sets as they become available. Setting a reference gene, as we describe, places the genetic regulatory networks on a concrete biological foundation. We also visualized the gene activation process via starting point and ending point, and combined all of the information to describe genetic regulatory relationships and obtain a consensus gene activation order. The estimation of activation points and building of a synthetic genetic network may result in important new insights in the ongoing endeavor to understand the complex network of gene regulations.


On Thursday, April 20, 2006, 4:15-6:45pm, there will be a Statistics Consortium
Sponsored Statistics Day event, involving a Distinguished Lecture and a
Discussion at Physics Building Room 1410.

DISTINGUISHED SPEAKER: Professor Peter Bickel
Statistics Department, University of California, Berkeley

TITLE: Using Comparative Genomics to Assess the Function of Noncoding Sequences

TIME AND PLACE:  Thursday, April 20, 2006, 4:15-6:00 pm
          Room 1410, Physics Building

ABSTRACT:   We have studied 2094 NCS of length 150-200bp from Edward Rubin's laboratory. These sequences are conserved at high homology between human, mouse, and fugu. Given the degree of homology with fugu, it seems plausible that all or part of most of these sequences is functional and, in fact, there is already some experimental validation of this conjecture. Our goal is to construct predictors of regulation (or potential irrelevance) by the NCS of nearby genes and, further, to use binding sites and the transcription factors that bind to them to deduce some pathway information. One approach is to collect covariates such as features of nearest genes, physical clustering indices, etc., and use statistical methods to identify covariates, select among these for importance, relate these to each other and use them to create stochastic descriptions of the NCS which can be used for NCS clustering and NCS and gene function prediction singly and jointly. Of particular importance so far has been GO term annotation and tissue expression of downstream genes as well as the presence of blocks of binding sites known from the TRANSFAC database in some of the NCS. Our results so far are consistent with those of recent papers engaged in related explorations such as Woolfe et al. (2004), Bejerano et al. (2005) and others but also suggest new conclusions of biological interest.

DISCUSSANT:   Dr. Steven Salzberg
Director, Center for Bioinformatics and Computational Biology, and
Professor, Department of Computer Science, University of Maryland

The Lecture and Discussion will be followed by a reception (6:00-6:45pm)
in the Rotunda of the Mathematics Building.


SPEAKER: Dr. Neal Jeffries
National Institute of Neurological Disorders and Stroke

TITLE: Multiple Comparisons Distortions of Parameter Estimates

TIME AND PLACE:  Thurs., April 27, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   In experiments involving many variables investigators typically use multiple comparisons procedures to determine differences that are unlikely to be the result of chance. However, investigators rarely consider how the magnitude of the greatest observed effect sizes may have been subject to bias resulting from multiple testing. These questions of bias become important to the extent investigators focus on the magnitude of the observed effects. As an example, such bias can lead to problems in attempting to validate results if a biased effect size is used to power a follow-up study. Further, such factors may give rise to conflicting findings in comparing two independent samples -- e.g. the variables with strongest effects in one study may predictably appear much less so in a second study. An associated important consequence is that confidence intervals constructed using standard distributions may be badly biased. A bootstrap approach is used to estimate and correct the bias in the effect sizes of those variables showing strongest differences. This bias is not always present; some principles showing what factors may lead to greater bias are given and a proof of the convergence of the bootstrap distribution is provided.

Key words: Effect size, bootstrap, multiple comparisons
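The selection effect described in the abstract can be seen in a small simulation. The sketch below is a generic bootstrap bias estimate for the largest observed effect, not necessarily the estimator presented in the talk; the data are simulated with every true effect equal to zero, and all names and sample sizes are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_obs, n_vars, n_boot = 20, 50, 200

# Hypothetical data: rows are subjects, columns are variables,
# and every variable has true effect (mean) 0.
data = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]


def column_means(rows):
    """Mean of each variable across the given rows."""
    return [mean(col) for col in zip(*rows)]


obs_means = column_means(data)
top = max(range(n_vars), key=lambda j: obs_means[j])
observed_max = obs_means[top]  # largest observed effect; biased upward by selection

# Bootstrap: resample subjects, pick the winning variable in each resample,
# and record how far its resampled mean exceeds its mean in the original data.
excess = []
for _ in range(n_boot):
    sample = [data[rng.randrange(n_obs)] for _ in range(n_obs)]
    boot_means = column_means(sample)
    j = max(range(n_vars), key=lambda k: boot_means[k])
    excess.append(boot_means[j] - obs_means[j])

bias_hat = mean(excess)
corrected_max = observed_max - bias_hat

print("largest observed effect:", round(observed_max, 3))
print("bootstrap bias estimate:", round(bias_hat, 3))
print("bias-corrected effect:  ", round(corrected_max, 3))
```

Although every true effect is zero, the largest observed mean is clearly positive, which is the kind of distortion that would mislead the powering of a follow-up study.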


SPEAKER: Professor Bing Li
Department of Statistics, Penn State University

TITLE: A Method for Sufficient Dimension Reduction in Large-p-Small-n Regressions

TIME AND PLACE:  Thurs., May 4, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   Large-p-small-n data, in which the number of recorded variables (p) exceeds the number of independent observational units (n), are becoming the norm in a variety of scientific fields. Sufficient dimension reduction provides a meaningful and theoretically motivated way to handle large-p-small-n regressions, by restricting attention to d < n linear combinations of the original p predictors. However, standard sufficient dimension reduction techniques are themselves designed to work for n > p, because they rely on the inversion of the predictor sample covariance matrix. In this article we propose an iterative method that eliminates the need for such inversion, using instead powers of the covariance matrix. We illustrate our method with a genomics application: the discrimination of human regulatory elements from a background of "non-functional" DNA, based on their alignment patterns with the genomes of other mammalian species. We also investigate the performance of the iterative method by simulation, obtaining excellent results when n < p or n ≈ p. We speculate that powers of the covariance matrix may allow us to effectively exploit available information on the predictor structure in identifying directions relevant to the regression.


SPEAKER: Professor Biao Zhang
Mathematics Department, University of Toledo

TITLE: Semiparametric ROC Curve Analysis under Density Ratio Models

TIME AND PLACE:  Thurs., May 11, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT:   Receiver operating characteristic (ROC) curves are commonly used to measure the accuracy of diagnostic tests in discriminating disease and nondisease. In this talk, we discuss semiparametric statistical inferences for ROC curves under a density ratio model for disease and nondisease densities. This model has a natural connection to the logistic regression model. We explore semiparametric inference procedures for the area under the ROC curve (AUC), semiparametric kernel estimation of the ROC curve and its AUC, and comparison of the accuracy of two diagnostic tests. We demonstrate that statistical inferences based on a semiparametric density ratio model are more robust than a fully parametric approach and are more efficient than a fully nonparametric approach.
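For reference, the fully nonparametric baseline mentioned in the abstract can be sketched in a few lines: the empirical AUC is the proportion of (diseased, nondiseased) pairs in which the diseased subject has the higher test result, with ties counted as one half. This is the standard Mann-Whitney form, not the semiparametric density ratio estimator of the talk, and the test values below are hypothetical.

```python
def empirical_auc(diseased, nondiseased):
    """Nonparametric AUC: estimate of P(D > N) + 0.5 * P(D == N),
    taken over all (diseased, nondiseased) pairs of test results."""
    wins = 0.0
    for d in diseased:
        for h in nondiseased:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))


# Hypothetical diagnostic test results.
diseased = [3, 4, 5]
nondiseased = [1, 2, 3]
auc = empirical_auc(diseased, nondiseased)
print(auc)
```

An AUC of 0.5 corresponds to a test with no discriminating ability, and 1.0 to perfect separation of the two groups.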

Fall 2006 Talks


SPEAKER: Prof. Eric Slud
Mathematics Department, UMCP

TITLE: "General position" results on uniqueness of optimal nonrandomized Group-sequential decision procedures in Clinical Trials

TIME AND PLACE:  Thurs., Oct. 26, 2006, 3:30pm
          Room 1313, Math Bldg

ABSTRACT: This talk will first give some background on group- or batch-sequential hypothesis tests for treatment effectiveness in two-group clinical trials. Such tests are based on a test statistic like the logrank, repeatedly calculated at a finite number of "interim looks" at the developing clinical trial survival data, where the timing of each look can in principle depend on all previously available data. The focus of this talk will be on a decision-theoretic formulation of the problem of designing such trials, when, as is true in large trials, the data can be viewed as observations of a Brownian motion with drift, and the drift parameter quantifies the difference in survival distributions between the treatment and control groups. The new results presented in the talk concern existence and uniqueness of nonrandomized optimal designs, subject to constraints on type I and II error probability, under fairly general loss functions when the cost functions are slightly perturbed, randomly, as functions of time. The proof techniques are related to old results on level-crossings for continuous time random processes.

This work is joint with Eric Leifer, a UMCP PhD of several years ago, now at the Heart, Lung and Blood Institute at NIH.

To see a copy of the slides for the talk, click here.