Statistics Seminars from
Previous Terms
FALL 2007 SEMINAR TALKS:
SPEAKER: Professor Abram Kagan
Mathematics Department, Statistics Program, UMCP
TITLE: Bivariate distributions with arbitrary
marginals and Gaussian-like dependence structure
TIME AND PLACE:
Thurs., September 20, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Some results will be presented which are obtained
as by-products in attempts to describe random vectors
X = (X_1, ..., X_m, X_{m+1}, ..., X_n) possessing the
following property: any pair of uncorrelated linear combinations
L_1 = a_1 X_1 + ... + a_m X_m and
L_2 = a_{m+1} X_{m+1} + ... + a_n X_n are independent.
Since L_1 and L_2 involve
disjoint sets of the components of X, the above
condition imposes no constraint on the marginal distributions of
(X_1, ..., X_m) and (X_{m+1}, ..., X_n)
but affects only the dependence structure between the
groups.
SPEAKER: Dr. Nadarajasundaram Ganesh
USDA National Agricultural Statistics Service, on ASA/USDA
Fellowship, recently graduated from
Statistics Program, UMCP
TITLE: Spatial Modeling and Prediction of
County-level Employment-growth Data
TIME AND PLACE:
Thurs., September 27, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: For spatially correlated data we propose a
linear model with covariance matrix in which observations are
grouped into blocks by a similarity measure based on spatial locations and
covariates. We briefly give an overview of asymptotics for spatial
data, and discuss the proposed asymptotic framework; our approach to
"blocking"; estimation methods; computational experiences; and
parameter combinations for which prediction using the proposed model can be shown to
improve over predictors that ignore correlations between residuals.
The proposed model is implemented for estimation and prediction
within a county-level employment growth-rate data set.
To see the slides for the talk, click here.
SPEAKER: Professors Douglas Oard (1) and
Philip Resnik (2)
(1) Associate Dean for Research & Associate Professor,
College for Information Studies, UMCP
(2) Associate Professor, Linguistics Dept. & UMIACS, UMCP
TITLE: Two for the Price of One:
Statistics in Natural Language Processing
and Information Retrieval
TIME AND PLACE:
Thurs., October 4, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Interesting problems in statistics arise in
several areas of natural language processing and information
retrieval. Broadly, we might divide these into (1) estimating useful
distributions for language use and (2) designing insightful and
affordable evaluation methods. In this talk, we will provide a broad
overview of these two closely related fields, focusing
first on the consequences of what has been called the "evaluation
guided research paradigm" that now dominates both fields. We'll
then each drill down to describe one or two problems from our recent
work where it seems to us that our worlds and yours [the
statisticians'] might intersect. Our goal in this seminar is to start
a discussion about the kinds of problems we
might productively work on together.
NO SEMINAR: Thurs., October 11, 2007.
SPEAKER: Professor Radu Balan
Mathematics Department and CSCAMM, UMCP
TITLE: Sparse Component Analysis: Use of
Statistical Methods and Sparse Signal
Representations in Convolutive
Blind Source Separation Problems
This talk is jointly
sponsored by the Statistics Seminar and the Norbert Wiener
Center.
TIME AND PLACE:
Thurs., October 18, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Sparse Component Analysis represents an overlap of
two problems (and methods) of Statistics/
Computer Science/Electrical Engineering/Applied Mathematics:
Independent Component Analysis (ICA), and Sparse Representations.
In its original form, the ICA problem seeks to decompose a random
d-vector into a linear combination of exactly d independent random
variables: x = A s, where A is an unknown d x d mixing matrix and s is
the d-vector of independent components. The Blind Source Separation
(BSS) problem is very similar to ICA, except that A may be a matrix of
(convolutive) operators. In practice, these solutions have been applied
to different types of signals. In particular, audio (speech) signals
gave rise to what is also known as "the cocktail party
problem". Interesting algorithms have also been obtained for images and
biomedical signals (e.g., EEG, ERP, fMRI). Independently of this, the
Sparse Representation problem tries to decompose a vector x into a
linear combination of (possibly redundant) frame vectors using the
smallest number of coefficients. My talk uses sparse representation
hypotheses in order to solve a convolutive BSS, including estimating
the number of source signals. Time permitting, I would also like to
comment on a standard result in ICA that says that x = A s can be
identified only if at most two independent components of 's' are
Gaussian.
SPEAKER: Professor Guangyu Zhang
Department of Epidemiology and Biostatistics, UMCP
TITLE: The Penalized Spline of Propensity
Prediction Method of Imputation
TIME AND PLACE:
Thurs., October 25, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Missing data problems are very common for
statistical research. Many methods have been proposed to deal with
missing information. One method is to impute missing information based
on the observed data. This approach yields a "complete" data set for
further statistical analysis. In the first part of the presentation I
present a robust imputation model, the Penalized Spline Propensity
Prediction (PSPP) model, originally proposed by Little and An (2004)
and then simplified by Zhang and Little (2005). The propensity score
for a missing variable is estimated and a regression model is fit that
includes the spline of the propensity score. The predicted
unconditional mean of the missing variable has a double robustness
(DR) property under misspecification of the imputation model. The DR
property can also be achieved by modeling the relationship
parametrically. One method is to include the inverse of the propensity
score as a linear term in the imputation model (Firth and Bennett,
1998; Bang and Robins, 2005). Another approach is to calibrate the
predictions from a parametric model by adding means of the weighted
residuals, with weights equal to the inverse of the propensity scores
(Robins, Rotnitzky and Zhao, 1994; Scharfstein, Rotnitzky and Robins,
1999). In the second part of my talk, I compare the PSPP method with
these methods by simulation. In the third part, I present several
extensions of the PSPP method, namely stratified PSPP and bivariate
PSPP for conditional means of a missing variable given a covariate,
and stepwise PSPP for monotone patterns of missing data.
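The calibration approach mentioned in this abstract (parametric predictions plus a mean of inverse-propensity-weighted residuals) can be written as a short formula. What follows is a sketch of the standard doubly robust estimator of a mean, not necessarily the exact form used in the talk. The symbols are notation introduced here: R_i is the response indicator, \hat{m} the fitted outcome model, and \hat{\pi} the estimated propensity score.

```latex
\hat{\mu}_{\mathrm{DR}}
  = \frac{1}{n}\sum_{i=1}^{n} \hat{m}(X_i)
  + \frac{1}{n}\sum_{i=1}^{n} \frac{R_i}{\hat{\pi}(X_i)}\,
    \bigl(Y_i - \hat{m}(X_i)\bigr)
```

The second sum vanishes in expectation when the outcome model is right, and it corrects the first sum when only the propensity model is right, which is the sense in which the estimator is doubly robust.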
SPEAKER: Professor Bill Fagan
Biology Department, UMCP
TITLE: Comparative Evolutionary Ecology of
Mammals: the Role of Statistics in
Understanding Population Growth Rates and Movement
TIME AND PLACE:
Thurs., November 1, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: still to come.
SPEAKER: Professor Hanno Petras
Criminal Justice & Criminology Department, UMCP
TITLE: Specialization in Juvenile Offending -
An Application of Latent Transition Analysis
TIME AND PLACE:
Thurs., November 8, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Offender specialization is one of the
long-standing themes in theoretical and empirical criminology. It
aims at identifying specific groups of individuals who disproportionately
commit specific acts. Researchers have argued that knowledge about the
early offense process will assist in later prediction and might be
utilized in criminal justice decision making by evaluating the impact
of legal sanctions and other interventions. Criminological research
has shown consistent but weak evidence for specialization. However,
studies of offending specialization entail methodological assumptions
about how episodes of offending should be conceptualized and
classified. In this presentation, we will explore the utility of a
latent variable approach (i.e., Latent Class Transition Analysis
(LTA)) to investigate the specialization hypothesis. LTA is a tool to
quantify who will change class membership across discrete stages in
time. In this study, the transitions across four discrete age periods
are investigated (e.g., age 6-12, 13-14, 15-16, 17-18). At each time
point, classes are determined by five distinct crime indicators
(Nonindex, Injury, Theft, Damage, and Combination). The strength of
this approach is the treatment of the event status as latent which
allows for the study of individual variation in transitional
probabilities and the explicit modeling of the age-crime
relationship. However, when age is used to define stages, assuming
measurement invariance across these periods may be unrealistic; e.g.,
the commission of a homicide might be less likely at age 6-12 vs. age
17-18. Thus, the utility of alternative measurement models will be
explored. Data about 3475 youth from the Philadelphia Birth Cohort
study (Wolfgang et al., 1972) will be used. All of them were male and
were initially assessed at age 10 and followed through age 18. Of
those youth, 42% were non-white and 59% originated from a lower
SES. Of the 3475 males, 53.58% recidivated and 46.42% were one-time
offenders.
SPEAKER: Professor Tongtong Wu
Department of Epidemiology and Biostatistics, UMCP
TITLE: An MM Algorithm for Multicategory
Vertex Discriminant Analysis
TIME AND PLACE:
Thurs., November 15, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk introduces a new method of supervised
learning based on linear discrimination among the vertices of a
regular simplex in Euclidean space. Each vertex represents a different
category. Discrimination is phrased as a regression problem
involving ε-insensitive residuals and a
quadratic penalty on the
coefficients of the linear predictors. The objective function can be
minimized by a primal MM (majorization-minimization) algorithm that
(a) relies on quadratic majorization and iteratively reweighted least
squares, (b) is simpler to program than algorithms that pass to the
dual of the original optimization problem, and (c) can be accelerated
by step doubling. Limited comparisons on real and simulated data
suggest that the MM algorithm is competitive in statistical accuracy
and computational speed with the best currently available algorithms
for discriminant analysis.
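The two geometric ingredients of this abstract, the vertices of a regular simplex and the ε-insensitive residual, can be illustrated with a small sketch. It shows one standard way to place k unit vectors in R^(k-1) at equal mutual distances and is not taken from the talk. The function names and the particular constants c and d are assumptions made for illustration.

```python
import math


def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) forming a regular simplex.

    One vertex per category. This construction is an illustrative
    assumption, not necessarily the one used by the speaker.
    """
    if k < 2:
        raise ValueError("need at least two categories")
    p = k - 1
    c = -(1.0 + math.sqrt(k)) / p ** 1.5
    d = math.sqrt(k / p)
    vertices = [[p ** -0.5] * p]          # first vertex: p^(-1/2) * (1, ..., 1)
    for j in range(p):
        v = [c] * p                        # remaining vertices: c * 1 + d * e_j
        v[j] += d
        vertices.append(v)
    return vertices


def eps_insensitive(residual, eps):
    """Euclidean norm of the residual, ignoring anything shorter than eps."""
    norm = math.sqrt(sum(r * r for r in residual))
    return max(norm - eps, 0.0)


if __name__ == "__main__":
    for vertex in simplex_vertices(3):
        print([round(x, 4) for x in vertex])
```

With this placement every pair of vertices is the same distance apart, so no category is geometrically favored over another.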
SPEAKER: Dr. Ruth Pfeiffer
Senior Investigator, Biostatistics Branch, Division of Cancer
Epidemiology & Genetics,
National Cancer Institute
TITLE: Probability of Detecting
Disease-Associated SNPs in Genome-Wide Association Studies
TIME AND PLACE:
Thurs., November 29, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Some case-control genome-wide association studies
(GWASs) select promising single nucleotide polymorphisms (SNPs) by
ranking corresponding p-values, rather than by applying the same
p-value threshold to each SNP. For such a study, we define the
detection probability (DP) for a specific disease-associated SNP as
the probability that the SNP will be "T-selected", namely have one
of the top T largest chi-square values for trend tests of
association. The corresponding proportion positive (PP) is the
fraction of selected SNPs that are true disease-associated SNPs. We
study DP and PP analytically and via simulations, for fixed and random
effects models of genetic risk. DP increases with genetic effect size
and case-control sample size, and decreases with the number of
non-disease SNPs, mainly through the ratio of T to N, the total number
of SNPs. We show that DP increases very slowly with T, and the
increment in DP per unit increase in T declines rapidly with T. DP is
also diminished if the number of true disease SNPs exceeds T. For a
genetic odds ratio per minor allele of 1.2 or less, even GWAS with
1000 cases and 1000 controls require T to be impractically large to
achieve an acceptable DP, leading to PP values so low as to make such
studies futile.
We extend these results to two-stage GWASs; a relatively small
proportion of the samples is allocated to a first stage where a large
number of SNPs is analyzed; the most promising SNPs are followed up in
a second stage in a larger set of samples. Investigators hope to
compensate for the relatively small first stage by selecting a large
number of SNPs for further study at the end of the first stage. We
show that such study designs can have substantially lower DP than a
one-stage design with the same numbers of cases and controls.
To see a complete set of slides including references from the
talk, click here.
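The detection probability (DP) defined in the abstract above can be explored with a small Monte Carlo sketch. It is a deliberately idealized illustration, not the speaker's method. It assumes a single disease SNP, independent SNPs, null statistics distributed as the square of a standard normal, and a disease statistic distributed as the square of a normal with mean `shift`. The function name and all parameter values are hypothetical.

```python
import random


def simulate_detection_probability(n_snps, t_values, shift, n_reps=500, seed=0):
    """Estimate P(disease SNP is among the top T statistics) for each T.

    Assumptions (illustrative only): one disease SNP, independent SNPs,
    null statistics Z^2 with Z ~ N(0, 1), disease statistic (Z + shift)^2.
    """
    rng = random.Random(seed)
    hits = {t: 0 for t in t_values}
    for _ in range(n_reps):
        disease_stat = rng.gauss(shift, 1.0) ** 2
        # Count null SNPs whose statistic beats the disease SNP.
        larger_nulls = sum(
            1 for _null in range(n_snps - 1)
            if rng.gauss(0.0, 1.0) ** 2 > disease_stat
        )
        rank = 1 + larger_nulls
        for t in t_values:
            if rank <= t:
                hits[t] += 1
    return {t: hits[t] / n_reps for t in t_values}


if __name__ == "__main__":
    print(simulate_detection_probability(200, [1, 10, 50], shift=3.0))
```

Because the same simulated ranks are reused for every T, the estimated DP is nondecreasing in T by construction. Varying `shift` and `n_snps` gives a rough feel for the dependence on effect size and on the ratio of T to N described in the abstract.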
SPEAKER: Dr. Leonid Kopylev
National Center for Environmental Assessment, EPA
TITLE: Some New Aspects of Dose-Response
Models with Applications to Multistage Models Having Parameters
on the Boundary
TIME AND PLACE:
Thurs., December 6, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk discusses statistical inference
based primarily on work by Self and Liang (1987) dealing with the
asymptotic theory of maximum likelihood estimates and likelihood ratio
tests when some parameters may lie on their boundaries. The results
are widely applicable to models used in environmental risk analysis
such as the dose response models that US EPA applies to bioassay
data. Applications of the results to dose-response multistage models
serve as illustrations.
This work is joint with Bimal Sinha of UMBC and John Fox of EPA.
The slides can be viewed here.
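The simplest boundary case covered by the Self and Liang theory can be illustrated numerically. When a single parameter of interest sits on the boundary under the null hypothesis and all other parameters are interior, the likelihood ratio statistic is asymptotically a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. The sketch below computes the corresponding p-value; the function names are hypothetical, and more complicated boundary configurations need different mixtures.

```python
import math


def chi2_1_sf(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_pvalue(lrt_stat):
    """P-value under the 0.5*chi2_0 + 0.5*chi2_1 mixture (one boundary parameter)."""
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt_stat)


if __name__ == "__main__":
    stat = 2.705543  # 90th percentile of chi-square(1)
    print("naive chi-square(1) p-value:", round(chi2_1_sf(stat), 4))
    print("boundary mixture p-value:   ", round(boundary_lrt_pvalue(stat), 4))
```

Using the ordinary chi-square reference distribution in this situation doubles the p-value, so the test becomes conservative.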
SPRING 2007 SEMINAR TALKS:
SPEAKER: Professor Leonid Koralov
Mathematics Department, UMCP
TITLE: Averaging of Hamiltonian Flows with an
Ergodic Component
TIME AND PLACE:
Thurs., Feb. 8, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: We consider a process which consists of the fast
motion along the stream lines of an incompressible periodic vector
field perturbed by the white noise. Together with D. Dolgopyat we
showed that for almost all rotation numbers of the unperturbed flow,
the perturbed flow converges to an effective, "averaged" Markov
process.
SPEAKER: Professor Donald Martin
Mathematics Department, Howard University & Census Bureau
Stat. Resch. Div.
TITLE: Distributions of patterns and statistics
in higher-order Markovian sequences
TIME AND PLACE:
Thurs., Feb. 15, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: In this talk we discuss a method for computing
distributions associated with general patterns and statistics in
higher-order Markovian sequences. An auxiliary Markov chain is
associated with the original sequence and probabilities are computed
through the auxiliary chain, simplifying computations that are
intractable using combinatorial or other approaches. Three distinct
examples of computations are given: (1) sooner or later waiting time
distributions for collections of compound patterns that must occur
pattern-specific numbers of times, using either overlapping counting
or two types of non-overlapping counting; (2) the joint distribution
of the total number of successes in success runs of length at least ,
and the distance between the beginning of the first such success run
and the end of the last one; (3) the distribution of patterns in
underlying variables of a hidden Markov model. Applications to
missing and noisy data and to bioinformatics are given to illustrate
the usefulness of the computations.
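The auxiliary Markov chain idea can be sketched for a much simpler case than those treated in the talk: the waiting time until the first run of k successes in a first-order binary Markov sequence. The auxiliary chain tracks the current run length, with the completed run as an absorbing state. The function name and parameters below are hypothetical, and the restriction to first-order chains and to a single run pattern is an assumption made for brevity.

```python
def run_waiting_time_cdf(k, n_max, p_start, p01, p11):
    """Return [P(first success run of length k is completed by time n)], n = 1..n_max.

    p_start: P(first symbol is a success)
    p01:     P(success | previous symbol was a failure)
    p11:     P(success | previous symbol was a success)
    """
    # state[j] = P(current run length is j and no run of length k has occurred)
    state = [0.0] * k
    absorbed = 0.0
    state[0] = 1.0 - p_start
    if k == 1:
        absorbed = p_start
    else:
        state[1] = p_start
    cdf = [absorbed]
    for _ in range(2, n_max + 1):
        new_state = [0.0] * k
        for j, mass in enumerate(state):
            if mass == 0.0:
                continue
            # Run length 0 means the previous symbol was a failure.
            p_success = p01 if j == 0 else p11
            new_state[0] += mass * (1.0 - p_success)
            if j + 1 == k:
                absorbed += mass * p_success
            else:
                new_state[j + 1] += mass * p_success
        state = new_state
        cdf.append(absorbed)
    return cdf


if __name__ == "__main__":
    # Fair, independent coin flips as a special case: p_start = p01 = p11 = 0.5.
    print(run_waiting_time_cdf(k=2, n_max=4, p_start=0.5, p01=0.5, p11=0.5))
```

Each step costs on the order of k operations, whereas direct enumeration of sequences grows exponentially with n.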
SPEAKER:
Professor Alexander S. Cherny
Moscow State University
TITLE: Coherent Risk Measures
TIME AND PLACE:
Tues., Feb. 20, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: The notion of a coherent risk measure was
introduced by Artzner, Delbaen, Eber, and Heath in 1997 and by now
this theory has become a considerable and very rapidly evolving branch
of modern mathematical finance.
The talk will be aimed at describing basic results of this theory,
including the basic representation theorem of Artzner, Delbaen, Eber,
and Heath as well as the characterization of law invariant risk
measures obtained by Kusuoka.
It will also include some recent results obtained by the author,
related to the strict diversification property and to the
characterization of dilatation monotone coherent risks.
SPEAKER: Dr. Siamak Sorooshyari
Lucent Technologies -- Bell Laboratories
TITLE: A Multivariate Statistical Approach
to Performance Analysis of Wireless Communication Systems
TIME AND PLACE:
Thurs., Mar. 1, 2007, 3:30pm
Room 1313, Math Bldg
NOTE: this seminar is
presented jointly with the Norbert Wiener Center.
ABSTRACT: The explosive growth of wireless communication
technologies has placed paramount importance on accurate performance
analysis of the fidelity of a service offered by a system to a
user. Unlike the channels of wireline systems, a wireless medium
subjects a user to time-varying detriments such as multipath fading,
cochannel interference, and thermal receiver noise. As a
countermeasure, structured redundancy in the form of diversity has
been instrumental in ensuring reliable wireless communication
characterized by a low bit error probability (BEP). In the performance
analysis of diversity systems the common assumption of uncorrelated
fading among distinct branches of system diversity tends to exaggerate
diversity gain resulting in an overly optimistic view of
performance. A limited number of works take into account the problem
of statistical dependence. This is primarily due to the mathematical
complication brought on by relaxing the unrealistic assumption of
independent fading among degrees of system diversity.
We present a multivariate statistical approach to the performance
analysis of wireless communication systems employing diversity. We
show how such a framework allows for the statistical modeling of the
correlated fading among the diversity branches of the system
users. Analytical results are derived for the performance of
maximal-ratio combining (MRC) over correlated Gaussian vector
channels. Generality is maintained by assuming arbitrary power users
and no specific form for the covariance matrices of the received faded
signals. The analysis and results are applicable to binary signaling
over a multiuser single-input multiple-output (SIMO) channel. In the
second half of the presentation, attention is given to the performance
analysis of a frequency diversity system known as multicarrier
code-division multiple-access (MC-CDMA). With the promising prospects
of MC-CDMA as a predominant wireless technology, analytical results
are presented for the performance of MC-CDMA in the presence of
correlated Rayleigh fading. In general, the empirical results
presented in our work show the effects of correlated fading to be
non-negligible, and most pronounced for lightly-loaded communication
systems.
SPEAKER: Professor Harry Tamvakis
Mathematics Department, UMCP
TITLE: The Dominance Order
TIME AND PLACE:
Thurs., Mar. 8, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: The dominance or majorization order has its
origins in the theory of inequalities, but actually appears
in many strikingly disparate areas of mathematics. We will
give a selection of results where this partial order appears,
going from inequalities to representations of the symmetric
group, families of vector bundles, orbits of nilpotent matrices,
and finally describe some recent links between them.
NOTE: The topic of this talk is related to the following
problem being studied in the RIT of Prof. Abram Kagan:
Consider a round robin tournament with n players (each pair of players
plays one game; the winner gets one point, the loser zero).
The outcome of the tournament is a set of n integers, a_1 >=
a_2 >= ... >= a_n, where a_1 is the
total score of the tournament winner(s), a_2 the score of
the second-place finisher, etc. Not all such sets are possible outcomes,
but all the possible outcomes can be described.
A number of interesting probability problems arise here. E.g., assume
that the n players are equally strong, i.e., the probability that player i
beats player j is 1/2 for all i, j. The expected score of each player in
the tournament is (n-1)/2. But what is the expected score (or the
distribution of the score) of the winner(s)? At the moment the answer is
unknown even in the asymptotic formulation (i.e., for large n).
SPEAKER: Zhibiao Zhao
Statistics Department, University of Chicago
TITLE: Confidence Bands in Nonparametric
Time Series Regression
TIME AND PLACE:
Tues., March 27, 2007, 3:30pm
NOTE special seminar
time.
Room 1313, Math Bldg
Abstract: Nonparametric model validation under dependence
has been a difficult problem. Fan and Yao (Nonlinear Time Series:
Nonparametric and Parametric Methods, 2003, page 406) pointed out that
there has been virtually no theoretical development on nonparametric
model validations under dependence, despite the importance of
the latter problem since dependence is an intrinsic characteristic in
time series. In this talk, we consider nonparametric estimation and
inference of mean regression and volatility functions in nonlinear
stochastic regression models. Simultaneous confidence bands are
constructed and the coverage probabilities are shown to be
asymptotically correct. The imposed dependence structure allows
applications in many nonlinear autoregressive processes and linear
processes, including both short-range dependent and long-range
dependent processes. The results are applied to the S&P 500 Index
data. Interestingly, the constructed simultaneous confidence bands
suggest that we can accept the two null hypotheses that the regression
function is linear and the volatility function is quadratic.
SPEAKER: Dr. Ram Tiwari
National Cancer Institute, NIH
TITLE: Two-sample problems in ranked set
sampling
TIME AND PLACE:
Thurs., March 29, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: In many practical problems, the variable of
interest is difficult/expensive to measure but the sampling units can
be easily ranked based on another related variable. For example, in
studies of obesity, the variable of interest may be the amount of body
fat, which is measured by Dual Energy X-Ray Absorptiometry --- a
costly procedure. The surrogate variable of body mass index is much
easier to work with. Ranked set sampling is a procedure of improving
the efficiency of an experiment whereby one selects certain sampling
units (based on their surrogate values) that are then measured on the
variable of interest. In this talk, we will first discuss some results
on two-sample problems based on ranked set samples. Several
nonparametric tests will be developed based on the vertical and
horizontal shift functions. It will be shown that the new methods are
more powerful compared to procedures based on simple random samples of
the same size.
When the measurement of the surrogate variable is moderately expensive, in
the presence of a fixed total cost of sampling, one may resort to a
generalized sampling procedure called k-tuple ranked set sampling,
whereby k(>1) measurements are made on each ranked set. In the second
part of this talk, we will show how one can use such data to estimate
the underlying distribution function or the population mean. The
special case of extreme ranked set samples, where the data consist of
multiple copies of maxima and minima, will be discussed in detail due
to its practical importance. Finally, we will briefly discuss the
effect of incorrect ranking and provide an illustration using data on
conifer trees.
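The efficiency gain from ranked set sampling described in this abstract can be seen in a small simulation. The sketch below is an illustration under strong simplifying assumptions: the data are standard normal, ranking within each set is perfect (done on the variable of interest itself rather than on a surrogate), and the design is balanced. The function names and default settings are hypothetical.

```python
import random
import statistics


def srs_mean(draw, n, rng):
    """Mean of a simple random sample of size n."""
    return statistics.fmean(draw(rng) for _ in range(n))


def rss_mean(draw, set_size, cycles, rng):
    """Mean of a balanced ranked set sample with perfect ranking.

    In each cycle, set_size sets are drawn; from the r-th set only the
    unit with rank r is measured.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            ranked_set = sorted(draw(rng) for _ in range(set_size))
            measured.append(ranked_set[rank])
    return statistics.fmean(measured)


def compare_variances(set_size=3, cycles=5, reps=2000, seed=1):
    """Return (variance of SRS mean, variance of RSS mean) for equal measured n."""
    rng = random.Random(seed)

    def draw(r):
        return r.gauss(0.0, 1.0)

    n = set_size * cycles
    srs = [srs_mean(draw, n, rng) for _ in range(reps)]
    rss = [rss_mean(draw, set_size, cycles, rng) for _ in range(reps)]
    return statistics.variance(srs), statistics.variance(rss)


if __name__ == "__main__":
    v_srs, v_rss = compare_variances()
    print("SRS variance:", round(v_srs, 4), "RSS variance:", round(v_rss, 4))
```

Both estimators use the same number of measured units. The ranked set sample inspects more units without measuring them, which is where the gain comes from when ranking is cheap.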
SPEAKER: Guanhua Lu
Statistics Program, UMCP
TITLE: Asymptotic Theory in Multiple-Sample
Semiparametric Density Ratio Models
TIME AND PLACE:
Thurs., April 5, 2007, 3:30pm
Room 1313, Math Bldg
Abstract:
A multiple-sample semiparametric density ratio model can be constructed
by multiplicative exponential distortions of the reference distribution.
Distortion functions are assumed to be nonnegative and of a known
finite-dimensional parametric form, and the reference distribution is left
nonparametric. The combined data from all the samples are used in the
semiparametric large sample problem of estimating each distortion and the
reference distribution. The large sample behavior of both the parameters
and the unknown reference distribution is studied. The estimated
reference distribution has been proved to converge weakly to a zero-mean
Gaussian process.
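For orientation, the model described in this abstract is commonly written in the following form. The notation is introduced here and is an assumption about the parametrization: g is the reference density, g_1, ..., g_q are the densities of the other samples, and h is a known vector-valued function.

```latex
g_j(x) = \exp\bigl(\alpha_j + \beta_j^{\top} h(x)\bigr)\, g(x),
\qquad j = 1, \dots, q
```

Each exponential factor is the nonnegative distortion of the reference distribution, and the finite-dimensional parameters (\alpha_j, \beta_j) are estimated together with the nonparametric g from the pooled data.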
SPEAKER: Dr. Gabor Szekely
NSF and Bowling Green State University
TITLE: Measuring and Testing Dependence by
Correlation of Distances
TIME AND PLACE:
Thurs., April 12, 2007, 3:30pm
Room 1313, Math Bldg
Abstract:
We introduce a simple new measure of dependence between random
vectors. Distance covariance (dCov) and distance correlation (dCor) are
analogous to product-moment covariance and correlation, but unlike the
classical definition of correlation, dCor = 0 characterizes independence
for the general case. The empirical dCov and dCor are based on certain
Euclidean distances between sample elements rather than sample moments,
yet have a compact representation analogous to the classical covariance
and correlation. Definitions can be extended to metric-space-valued
observations where the random vectors could even be in different metric
spaces. Asymptotic properties and applications in testing independence
will also be discussed. A new universally consistent test of
multivariate independence is developed. Distance correlation can also be
applied to prove a CLT for strongly stationary sequences.
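The empirical distance correlation described in this abstract can be computed in a few lines. The sketch below handles only scalar observations to stay short (the definition itself covers random vectors), and it uses the usual double-centered distance matrices. The function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise distance matrix with row, column, and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]   # equals column means by symmetry
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length scalar samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    x = [-2, -1, 0, 1, 2]
    y = [v * v for v in x]   # Pearson correlation of x and y is exactly zero
    print(distance_correlation(x, y))
```

In the example y is a deterministic function of x, yet the product-moment correlation is zero; the distance correlation is clearly positive, which illustrates the contrast drawn in the abstract.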
Distinguished JPSM Lecture
co-Sponsored by Statistics Consortium
SPEAKER: Professor Roderick J. Little
Departments of Biostatistics and Statistics and Institute for
Social Research, University of Michigan
TITLE: Wait! Should We Use the Survey Weights
to Weight?
TIME AND PLACE:
Friday, April 13, 2007, 3:30pm
Room 2205, Lefrak Hall
Two discussants will speak following Professor Little's talk:
John Eltinge of Bureau of Labor Statistics and Richard Valliant from
JPSM.
SPEAKER: Dr. Song Yang
Office of Biostatistics Research, National Heart Lung and Blood
Institute, NIH
TITLE: Some versatile tests of treatment
effect using adaptively weighted log rank statistics
TIME AND PLACE:
Thurs., April 19, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: For testing treatment effect with time to event
data, the log rank test is the most popular choice and is optimal for
proportional hazards alternatives. When a range of possibly
nonproportional alternatives is possible, combinations of several
tests are often used. Currently available methods inevitably
sacrifice power at proportional alternatives and may also be
computationally demanding. We introduce some versatile tests that use
adaptively weighted log rank statistics. Extensive numerical studies
show that these new tests almost uniformly improve on the tests that they
modify, and are optimal or nearly so for proportional alternatives.
In particular, one of the new tests maintains optimality at the
proportional alternatives and also has very good power at a wide range
of nonproportional alternatives, thus is the test we recommend when
flexibility in the treatment effect is desired. The adaptive weights
are based on the model of Yang and Prentice (2005).
Statistics Consortium Lecture
co-Sponsored by JPSM and MPRC
SPEAKER: Professor Bruce Spencer
Statistics Department & Faculty Fellow, Institute for Policy
Research, Northwestern University
TITLE: Statistical Prediction of Demographic
Forecast Accuracy
TIME AND PLACE:
Friday, April 27, 2007, 3:15pm
Room 2205, Lefrak Hall
ABSTRACT: Anticipation of future population change affects
public policy deliberations on (i) investment for health care and pensions,
(ii) effects of immigration policy on the economy, (iii) future
competitiveness of the U.S. economy, to name just three. In this
talk, we review some statistical approaches used to predict the
accuracy of demographic forecasts and functional forecasts underlying
the policy discussions. A functional population forecast is
one that is a function of the population vector as well as other
components, for example a forecast of the future balance of a pension
fund. No background in demography will be assumed, and the necessary
demographic concepts will be introduced from the statistical point of
view. The talk is based on material in Statistical Demography and
Forecasting by J. M. Alho and B. D. Spencer (2005, Springer) and
reflects joint work by the authors.
Following Professor Spencer's talk, there will be a formal
Discussion, by Dr. Peter Johnson of the International Programs Center
of the Census Bureau and Dr. Jeffrey Passel of the Pew Hispanic
Center. Following the formal and floor discussion, there
will be a reception including refreshments.
SPEAKER: Professor Dennis Healy
Mathematics Department, UMCP
TITLE: TBA
TIME AND PLACE: Postponed
FALL 2005 SEMINAR TALKS:
SPEAKER: Prof. Ross Pinsky
Mathematics Department, Technion, Israel
TITLE: Law of Large Numbers for Increasing
Subsequences of Random Permutations
TIME AND PLACE:
Tues., August 23, 2005, 2pm
Room 1313, Math Bldg
ABSTRACT: click here.
SPEAKER: Prof. Paul Smith
Statistics Program, Mathematics Department, UMCP
TITLE: Statistical Analysis of Ultrasound
Images of Tongue Contours
during Speech
TIME AND PLACE:
Thurs., September 15, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: The shape and movement of the tongue are critical
in the formation of human speech. Modern imaging techniques allow
scientists to study tongue shape and movement without interfering with
speech. This presentation describes statistical issues arising from
ultrasound imaging of tongue contour data.
There are many sources of variability in tongue image data,
including speaker to speaker differences, intraspeaker differences, noise
in the images, and other measurement problems. To make matters worse, the
tongue is supported entirely by soft tissue, so no fixed co-ordinate
system is available. Statistical methods to deal with these problems are
presented.
The goal of the research is to associate tongue shapes and sound
production. Principal component analysis is used to reduce the dimension of the contours.
Combinations of two basic shapes accurately represent tongue contours.
The results are physiologically meaningful and correspond well to actual
speech activity. The methods are applied to a sample of 16 subjects, each
producing four vowel sounds. It was found that principal components clearly
distinguish vowels based on tongue contours.
We also investigate whether speakers fall into distinct groups on the
basis of their tongue contours. Cluster analysis is used to identify
possible groupings, but many variants of this technique are possible
and the results are sometimes conflicting. Methods to compare
multiple cluster analyses are suggested and applied to tongue contour data
to assess the meaning of apparent speaker clusters.
SPEAKER: Prof. Benjamin Kedem
Statistics Program, Mathematics Department, UMCP
TITLE: A Semiparametric Approach to Time Series
Prediction
TIME AND PLACE:
Thurs., September 22, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Given m time series regression models, linear or
not, with additive noise components, it is shown how to estimate the
predictive probability distribution of all the time series conditional
on the observed and covariate data at the time of prediction. This is
done by a certain synergy argument, assuming that the distributions of
the noise components associated with the regression models are tilted
versions of a reference distribution. Point predictors are obtained
from the predictive distribution as a byproduct. An application to US
mortality rates prediction will be discussed.
A former student of our Statistics
Program, Dean Foster of the Statistics
Department at the Wharton School, University of Pennsylvania, will
be visiting the Business School on Friday
9/23/05 and giving a seminar entitled
"Learning Nash equilibria via public
calibration" from 3-4:15 pm in Van
Munching Hall Rm 1206.
You can see an abstract of the talk by clicking here.
SPEAKER: Professor Steven Martin
Department of Sociology, University of Maryland College Park
TITLE: Reassessing delayed and forgone marriage in
the United States
TIME AND PLACE:
Wed., September 28, 2005, 3:30pm
Room 1313, Math Bldg
NOTE UNUSUAL TIME!
ABSTRACT: Do recent decreases in marriage rates mean
that more women are forgoing marriage, or that women are simply
marrying at later ages? Recently published demographic projections
from standard nuptiality models suggest that changes in marriage rates
have different implications for women of different social classes,
producing an "education crossover" in which four-year college graduate
women have become more likely to marry than other women in the US,
instead of less likely as has been the case for at least a century.
To test these findings, I develop a new projection technique that
predicts the proportion of women marrying by age 45 under flexible
assumptions about trends in age-specific marriage rates and effects of
unmeasured heterogeneity. Results from the 1996 and 2001 Surveys of
Income and Program Participation suggest that the "crossover" in
marriage by educational attainment is either not happening or is
taking much longer than predicted. Also, recent trends are broadly
consistent with an ongoing slow decline in proportions of women ever
marrying, although that decline is less pronounced in the last decade
than in previous decades.
SPEAKER: Professor Rick Valliant
Joint Program in Survey Methodology, Univ. of Michigan &
UMCP
TITLE: Balanced Sampling with Applications to Accounting
Populations
TIME AND PLACE:
Thurs., October 6, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Weighted balanced sampling is a way of restricting the configuration of
sample units that can be selected from a finite population. This
method can be extremely efficient under certain types of structural
models that are reasonable in some accounting problems. We review
theoretical results that support weighted balancing, compare different
methods of selecting weighted balanced samples, and give some
practical examples. Where appropriate, balancing can meet precision
goals with small samples and can be robust to some types of model
misspecification. The variance that can be achieved is closely
related to the Godambe-Joshi lower bound from design-based theory.
One of the methods of selecting these samples is restricted
randomization in which "off-balance" samples are rejected if selected.
Another is deep stratification in which strata are formed based on a
function of a single auxiliary and one or two units are selected with
equal probability from each stratum. For both methods, inclusion
probabilities can be computed and design-based inference done if
desired.
Simulation results will be presented to compare results from balanced
samples with ones selected in more traditional ways.
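As a rough illustration of the restricted randomization idea described above, the following sketch draws simple random samples and rejects any "off-balance" sample whose auxiliary mean is too far from the population mean. It assumes simple (unweighted) balance on a single auxiliary variable. The function name, the tolerance, and the population values are hypothetical and are not taken from the talk.

```python
import random

def draw_balanced_sample(x, n, tol, rng, max_tries=10_000):
    """Rejection-sample n unit indices whose auxiliary mean is within tol
    of the population mean of the auxiliary variable x."""
    pop_mean = sum(x) / len(x)
    for _ in range(max_tries):
        idx = rng.sample(range(len(x)), n)  # simple random sample without replacement
        sample_mean = sum(x[i] for i in idx) / n
        if abs(sample_mean - pop_mean) <= tol:
            return sorted(idx)  # accept: the sample is (approximately) balanced
        # otherwise the sample is "off-balance" and is rejected
    raise RuntimeError("no balanced sample found; loosen tol or raise max_tries")

# Hypothetical population of 200 account book values (the auxiliary variable).
rng = random.Random(0)
book_values = [rng.lognormvariate(5.0, 1.0) for _ in range(200)]
sample = draw_balanced_sample(book_values, n=20, tol=10.0, rng=rng)
```

Tightening `tol` makes accepted samples closer to balanced but raises the rejection rate, which is the practical trade-off of this selection method.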
SPEAKER: Professor Wolfgang Jank
Department of Decision & Information Technologies
The Robert H. Smith School of Business, UMCP
TITLE: Stochastic Variants of EM:
Monte Carlo, Quasi-Monte Carlo, and More
TIME AND PLACE:
Thurs., October 20, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We review recent advances in stochastic implementations of the EM
algorithm. We review the Ascent-based Monte Carlo EM algorithm, a new
automated version of Monte Carlo EM based on EM's likelihood ascent
property. We discuss more efficient implementations via quasi-Monte
Carlo sampling. We also re-visit a new implementation of the old
stochastic approximation version for EM. We illustrate some of the
methods on a geostatistical model of online purchases.
The slides for Professor Jank's presentation are linked
here.
SPEAKER: Professor Ciprian Crainiceanu
Johns Hopkins Biostatistics Department, School of Public Health
TITLE: Structured Estimation under Adjustment
Uncertainty
TIME AND PLACE:
Thurs., October 27, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Population health research is increasingly focused on identifying
small risks by use of large databases containing millions of
observations and hundreds or thousands of covariates. As a result,
there is an increasing need to develop statistical methods to
estimate these risks and properly account for all their sources of
uncertainty. An example is the estimation of the health effects
associated with short-term exposure to air pollution, where the
goal is to estimate the association between daily changes in
ambient levels of air pollution and daily changes in the number of
deaths or hospital admissions accounting for many confounders,
such as other pollutants, weather, seasonality, and influenza
epidemics.
Regression models are commonly used to estimate the effect of an
exposure on an outcome, while controlling for confounders. The
selection of confounders and of their functional form generally
affects the exposure effect estimate. In practice, there is often
substantial uncertainty about this selection, which we define here
as "adjustment uncertainty".
In this paper, we propose a general statistical framework to
account for adjustment uncertainty in risk estimation called
"Structured Estimation under Adjustment Uncertainty (STEADy)". We
consider the situation in which a rich set of potential
confounders is available and there exists a model such that every
model nesting it provides the correctly adjusted exposure effect
estimate. Our approach is based on a structured search of the
model space that sequentially identifies among all the potential
confounders the ones that are good predictors of the exposure and
of the outcome, respectively.
Through theoretical results and simulation studies, we compare
"adjustment uncertainty" implemented with STEADy versus "model
uncertainty" implemented with Bayesian Model Averaging (BMA) for
exposure effect estimation. We found that BMA, by averaging
parameter estimates adjusted by different sets of confounders,
estimates a quantity that is not the scientific focus of the
investigation and can over- or underestimate statistical
variability. Another potential limitation of BMA in this context
is the strong dependence of posterior model probabilities on prior
distributions. We show that using the BIC approximation of
posterior model probabilities favors models more parsimonious than
the true model, and that BIC is not consistent under assumptions
relevant for moderate size signals.
Finally we apply our methods to time series data on air pollution
and health to estimate health risks accounting for adjustment
uncertainty. We also compare our results with a BMA analysis of
the same data set. The open source R package STEADy
implementing this methodology for Generalized Linear Models
(GLMs) will be available at the R
website.
You can see the paper on which this talk is based here.
No Seminar Thursday 11/3. But NOTE
special seminar at unusual time on Monday 11/7, below.
SPEAKER: Professor Lise Getoor
Department of Computer Science, UMCP
TITLE: Learning Statistical Models from
Relational Data
TIME AND PLACE:
Mon., November 7, 2005, 4-5pm
Room 1313, Math Bldg
NOTE
UNUSUAL TIME !
ABSTRACT:
A large portion of real-world data is stored in commercial relational
database systems. In contrast, most statistical learning methods work
only with "flat" data representations. Thus, to apply these methods, we
are forced to convert the data into a flat form, thereby losing much of
the relational structure present in the data and potentially introducing
statistical skew. These drawbacks severely limit the ability of current
methods to mine relational databases.
In this talk I will review recent work on probabilistic models,
including Bayesian networks (BNs) and Markov Networks (MNs) and their
relational counterparts, Probabilistic Relational Models (PRMs) and
Relational Markov Networks (RMNs). I'll briefly describe the
development of techniques for automatically inducing PRMs directly
from structured data stored in a relational or object-oriented
database. These algorithms provide the necessary tools to discover
patterns in structured data, and provide new techniques for mining
relational data. As we go along, I'll present experimental results in
several domains, including a biological domain describing tuberculosis
epidemiology, a database of scientific paper author and citation
information, and Web data.
PowerPoint slides for an extended tutorial
related to Professor Getoor's talk can be found here.
Additional related research can be found at her home-page.
SPEAKER: Professor Victor de Oliveira
Department of Mathematical Sciences, University of Arkansas
TITLE: Bayesian Analysis of Spatial Data:
Some Theoretical Issues and Applications in the Earth
Sciences
TIME AND PLACE:
Thurs., November 10, 2005, 4:00pm
Room 3206, Math Bldg
NOTE change to unusual 4-5pm
time-slot and unusual location!!
ABSTRACT: Random fields are useful mathematical tools for
modeling spatially varying phenomena. This talk will focus on
Bayesian analysis of geostatistical data based on Gaussian random
fields (or models derived from these), which have been extensively
used for the modeling and analysis of spatial data in most earth
sciences, and are usually the default model (possibly after a
transformation of the data).
The Bayesian approach for the analysis of spatial data has seen
in recent years an upsurge in interest and popularity, mainly
due to the fact that it is particularly well suited for
inferential problems that involve prediction.
Yet, implementation of the Bayesian approach faces several
methodological and computational challenges, most notably:
(1) The likelihood behavior of covariance parameters is not
well understood, with the possibility for ill behaviors.
In addition, there is a lack of automatic or default prior
distributions for the parameters of these models, such as
Jeffreys and reference priors.
(2) There are substantial computational difficulties for the
implementation of Markov chain Monte Carlo methods required
for carrying out Bayesian inference and prediction based on
moderate or large spatial datasets.
This talk presents recent advances in the formulation of
default prior distributions as well as some properties,
Bayesian and frequentist, of inferences based on these priors.
We illustrate some of the issues and problems involved using
simulated data, and apply the methods for the solution of
several inferential problems based on two spatial datasets:
one dealing with pollution by nitrogen in the Chesapeake bay,
and the other dealing with depths of a geologic horizon based
on censored data.
If time permits, a new computational algorithm is described that
can substantially reduce the computational burden mentioned in (2).
Finally, we describe some challenges and open problems whose
solution would make the Bayesian approach more appealing.
NO STATISTICS SEMINAR Thursday, November
17, 2005.
BUT NOTE THAT ON FRIDAY, NOVEMBER 18, 2005, THERE IS A PAIR OF TALKS
in the Distinguished Lecture Series at the University of Maryland
co-sponsored by the Joint Program in Survey Methodology and the
University of Maryland Statistics Consortium.
The first talk is by Alastair Scott, titled "The
Design and Analysis of Retrospective Health Surveys." The second,
titled "The Interplay Between Sample Survey Theory and Practice: An
Appraisal," is by J. N. K. Rao. Click
here for additional details about the speakers and
talks.
Dr. Scott's talk will begin
at 1:00 pm and will be discussed by Barry Graubard from the
National Cancer Institute and Graham Kalton from Westat and
JPSM.
Dr. Rao's talk will begin at 3:00 pm and will be
discussed by Phil Kott from the National
Agricultural Statistics Service and Mike Brick from Westat and
JPSM.
Both talks will be held in 2205 LeFrak Hall.
There will be a reception immediately afterwards at 4:45.
SPEAKER: Professor Michael Cummings
Center for Bioinformatics and Computational Biology, UMCP
TITLE: Analysis of Genotype-Phenotype
Relationships: Machine Learning/Statistical Methods
TIME AND PLACE:
Thurs., December 8, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Understanding the relationship of genotype to
phenotype is a fundamental problem in modern genetics research.
However, significant analytical challenges exist in the study of
genotype-phenotype relationships. These challenges include genotype
data in the form of unordered categorical values (e.g., nucleotides,
amino acids, SNPs), numerous levels of variables, mixture of variable
types (categorical and numerical), and potential for non-additive
interactions between variables (epistasis). These challenges can be
dealt with through use of machine learning/statistical approaches such
as tree-based statistical models and random forests. These methods
recursively partition a data set in two (binary split) based on values
of a single predictor variable to best achieve homogeneous subsets of
a categorical response variable (classification) or to best separate
low and high values of a continuous response variable (regression).
These methods are very well suited for the analysis of
genotype-phenotype relationships and have been shown to provide
outstanding results. Examples to be presented include identifying
amino acids important in spectral tuning in color vision and
nucleotide sequence changes important in some growth characteristics
in maize.
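The binary-split step described above can be sketched for a single unordered categorical predictor. This is a minimal sketch that assumes the Gini impurity as the homogeneity criterion (one common choice, not necessarily the one used in the talk). The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Find the two-way partition of a categorical predictor's levels that
    minimizes the size-weighted Gini impurity of the two child nodes."""
    levels = sorted(set(values))
    n = len(labels)
    best = None
    # Fix the first level on the left side so each partition is tried once.
    first, rest = levels[0], levels[1:]
    for k in range(len(rest)):  # k < len(rest) keeps the right side non-empty
        for extra in combinations(rest, k):
            left_levels = {first, *extra}
            left = [y for v, y in zip(values, labels) if v in left_levels]
            right = [y for v, y in zip(values, labels) if v not in left_levels]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, left_levels)
    return best

# Hypothetical toy data: nucleotide at one site and a binary phenotype.
site = ["A", "A", "C", "G", "G", "T", "T", "C"]
phenotype = ["high", "high", "low", "high", "high", "low", "low", "low"]
score, left_levels = best_binary_split(site, phenotype)
```

A tree-growing procedure would repeat this search over all predictors and then recurse on the two resulting subsets. A random forest would additionally resample the data and restrict each search to a random subset of predictors.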
SPEAKER: Dr. Myron Katzoff
National Center for Health Statistics/ Centers for Disease Control
TITLE: Statistical Methods for Decontamination
Sampling
TIME AND PLACE:
Thurs., December 15, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will be about an adaptive sampling
procedure applicable to microparticle removal and a methodology for
validating a computational fluid dynamics (CFD) model which it is
believed will be useful in refining such a procedure. The adaptive
sampling procedure has many features in common with current field
practices; its importance is that it would enable valid statistical
inferences. The methodology for CFD model validation which is
described employs statistical techniques used in the frequency domain
analysis of spatio-temporal data. Seminar attendees will be encouraged
to contribute their thoughts on alternative proposals for analyses of
experimental data for CFD model validation.
Slides from the talk can be viewed here.
SPRING 2006 SEMINAR TALKS:
SPEAKER: Dr. Mokshay Madiman
Statistics Department, Yale
TITLE: Statistical Data Compression with
Distortion
TIME AND PLACE:
Tues., January 31, 2006, 3:30pm Note unusual day !
Room 1313, Math Bldg
ABSTRACT: Motivated by the powerful and fruitful connection
between information-theoretic ideas and statistical model selection,
we consider the problem of "lossy" data compression ("lossy" meaning
that a certain amount of distortion is allowed in the decompressed
data) as a statistical problem. After recalling the classical
information-theoretic development of Rissanen's celebrated Minimum
Description Length (MDL) principle for model selection, we introduce
and develop a new theoretical framework for _code selection_ in data
compression. First we describe a precise correspondence between
compression algorithms (or codes) and probability distributions, and
use it to interpret arbitrary families of codes as statistical
models. We then introduce "lossy" versions of several familiar
statistical notions (such as maximum likelihood estimation and MDL
model selection criteria), and we propose new principles for building
good codes. In particular, we show that in certain cases, our
"lossy MDL estimator" has the following optimality property: not only
does it converge to the best available code (as the amount of data grows),
but it also identifies the right class of codes in finite time with
probability one.
[Joint work with Ioannis Kontoyiannis and Matthew Harrison.]
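For orientation, the classical lossless version of the correspondence between codes and probability distributions can be sketched as follows. The lossy correspondence developed in the talk is more elaborate and is not reproduced here. The example distribution and the function names are hypothetical.

```python
import math

def shannon_code_lengths(p):
    """Codeword lengths ceil(-log2 p(x)) for each symbol with p(x) > 0."""
    return {x: math.ceil(-math.log2(px)) for x, px in p.items() if px > 0}

def kraft_sum(lengths):
    """Sum of 2^(-length); a binary prefix code with these lengths exists
    if and only if this sum is at most 1 (Kraft inequality)."""
    return sum(2.0 ** -l for l in lengths.values())

# Distribution -> code: lengths derived from a (hypothetical) source distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
lengths = shannon_code_lengths(p)

# Code -> distribution: the lengths define a sub-probability q(x) = 2^(-length).
implied_q = {x: 2.0 ** -l for x, l in lengths.items()}

expected_length = sum(p[x] * lengths[x] for x in p)
entropy = -sum(px * math.log2(px) for px in p.values())
```

The expected code length lies within one bit of the entropy, which is why choosing a code can be treated as choosing a probability model for the data.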
This talk is by Invitation of the
Hiring Committee.
SPEAKER: Lang Withers
MITRE Signal Processing Center
TITLE: The Bernoulli-trials
Distribution and Wavelet
This talk is jointly sponsored with the Harmonic Analysis Seminar
this week.
TIME AND PLACE:
Thurs., February 2, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk is about a probability distribution
function for Bernoulli ("coin-toss") sequences. We use the Haar
wavelet to analyze it, and find that this function just maps binary
numbers in [0,1] into general p-binary numbers in [0,1]. Next we see
that this function obeys a two-scale dilation equation and use it to
construct a family of wavelets. This family contains the Haar wavelet
and the piecewise-linear wavelet as special cases. What is striking
here is how naturally probability and wavelets interact: the Haar
wavelet sheds light on the meaning of a distribution; the distribution
happens to obey a two-scale dilation equation and lets us make it into
a wavelet.
We take up the more general case of the distribution function for
multi-valued Bernoulli trials. A special case of this for three-valued
trials is the Cantor function. Again we find that it just maps ternary
numbers into generalized ternary numbers. I hope to develop the Cantor
wavelet as well in time for the talk.
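The mapping from binary digits to the Bernoulli distribution function, and the two-scale dilation equation it obeys, can be checked numerically. The sketch below assumes independent bits B_k with P(B_k = 1) = p and X = sum of B_k 2^-k; which outcome carries probability p is an assumption of this sketch, and the speaker's convention may differ. The function name is hypothetical.

```python
def bernoulli_cdf(bits, p):
    """F_p(x) = P(X < x) for X = sum_k B_k 2^-k with i.i.d. bits, P(B_k = 1) = p.

    The point x = 0.b1 b2 ... bn (binary) is passed as a list of 0/1 digits.
    """
    total = 0.0
    prefix_prob = 1.0  # probability that the random bits match x's digits so far
    for b in bits:
        if b == 1:
            # X falls below x if it matches so far and then has a 0 where x has a 1.
            total += prefix_prob * (1.0 - p)
            prefix_prob *= p
        else:
            prefix_prob *= 1.0 - p
    return total

p = 0.3
# Two-scale dilation equation, for x in [0, 1/2) and x in [1/2, 1):
#   F(x) = (1 - p) F(2x)              when the first digit is 0
#   F(x) = (1 - p) + p F(2x - 1)      when the first digit is 1
lower_branch = (bernoulli_cdf([0, 1, 1], p), (1 - p) * bernoulli_cdf([1, 1], p))
upper_branch = (bernoulli_cdf([1, 0, 1], p), (1 - p) + p * bernoulli_cdf([0, 1], p))
```

With p = 1/2 the function reduces to the identity on dyadic points, for example `bernoulli_cdf([1, 0, 1], 0.5)` equals 0.625, the value of binary 0.101.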
Audience: advanced undergrad and up; some familiarity with wavelets and
measure theory is helpful.
Click here to
see a current draft of the speaker's paper on the subject of the talk.
SPEAKER: Hyejin Shin
Department of Statistics, Texas A&M University
TITLE: An RKHS Formulation of Discrimination
and Classification for Stochastic Processes
TIME AND PLACE:
Thurs., February 9, 2006, 12:30-1:45pm
Room 3206, Math Bldg
Note unusual time and place for this
seminar !
ABSTRACT: Modern data collection methods are now
frequently returning observations that should be viewed as the result
of digitized recording or sampling from stochastic processes rather
than vectors of finite length. In spite of great demands, only a few
classification methodologies for such data have been suggested and
supporting theory is quite limited. Our focus is on discrimination and
classification in the infinite dimensional setting. The methodology
and theory we develop are based on the abstract canonical correlation
concept in Eubank and Hsing (2005) and motivated by the fact that
Fisher's discriminant analysis method is intimately tied to canonical
correlation analysis. Specifically, we have developed a theoretical
framework for discrimination and classification of sample paths from
stochastic processes through use of the Loève-Parzen isometric
mapping that connects a second order process to the reproducing kernel
Hilbert space generated by its covariance kernel. This approach
provides a seamless transition between finite and infinite dimensional
settings and lends itself well to computation via smoothing and
regularization.
This talk is by Invitation of the
Mathematics Department Hiring Committee.
SPEAKER: Professor Jae-Kwang Kim
Dept. of Applied Statistics, Yonsei University, Korea
TITLE: Regression fractional hot deck imputation
TIME AND PLACE:
Thurs., February 16, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Imputation using a regression model is
a method to preserve the correlation
among variables and to provide imputed point estimators.
We discuss the implementation of regression imputation using fractional
imputation. By a suitable choice of fractional weights, the fractional
regression imputation can take the form of hot deck fractional imputation,
thus no artificial values are constructed after the imputation. A variance
estimator, which extends the method of Kim and Fuller (2004, Biometrika),
is also proposed. By a suitable choice of imputation cells, the proposed
estimators can be made robust against the failure of the assumed regression
imputation model. Comparisons based on simulations are presented.
Professor Kim has made the slides for his talk available here.
SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department
TITLE: Model selection and inference in regression
when the number of explanatory variables is of the same order as
sample size.
TIME AND PLACE:
Thurs., February 23, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Some of the most challenging problems in
modern econometrics and statistics feature a large number of possibly
important factors or variables, and a comparatively small sample
size. Examples include portfolio selection, detection of fraudulent
customers of credit card or telephone companies, micro-array analysis,
or proteomics.
I consider one problem of that kind: Regression with random design,
where the number of explanatory variables is of the same order as
sample size. The focus is on selecting a model with small predictive
risk.
Traditional model selection procedures, including AIC, BIC, FPE or MDL,
perform poorly in this setting. The models selected by these procedures
can be anything from mildly suboptimal to completely unreasonable,
depending on unknown parameters. In addition, inference procedures
based on the selected model, like tests or confidence sets, are invalid,
irrespective of whether a good model has been chosen or not.
I propose a new approach to the model selection problem in this setting
that explicitly acknowledges the fact that the number of explanatory
variables is of the same order as sample size. This approach has
several attractive features:
1) It will select the best predictive model asymptotically, irrespective of
unknown parameters (under minimal conditions).
2) It allows for inference procedures like tests or confidence sets
based on the selected model that are asymptotically valid.
3) Simulations suggest that the asymptotics in 1 and 2 above "kick in"
pretty soon, e.g., in a problem with 1000 parameters and 1600 observations.
These results are currently work in progress.
Professor Leeb will also give a second,
more general talk for the campus statistical community which is
jointly sponsored by the Stat Program in the Math Department along
with the campus Statistics Consortium. Details for the second talk are
as follows:
SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department
TITLE: Model Selection and Inference: Facts
and Fiction
TIME AND PLACE:
Fri., February 24, 2006, 3:00pm
Lefrak Building Room 2205
ABSTRACT: Model selection has an important impact on
subsequent inference. Ignoring the model selection step leads to
invalid inference. We discuss some intricate aspects of data-driven
model selection that do not seem to have been widely appreciated in
the literature. We debunk some myths about model selection, in
particular the myth that consistent model selection has no effect on
subsequent inference asymptotically. We also discuss an
"impossibility" result regarding the estimation of the finite-sample
distribution of post-model-selection estimators.
A paper of Professor Leeb covering most of the issues in the second
talk can be found here.
This talk is jointly sponsored by the
Statistics Consortium and the Statistics Program in the Mathematics
Department. The talk will be followed by refreshments at 4:30pm.
SPEAKER: Guoxing (Greg) Soon, Ph.D.
Office of Biostatistics, CDER, Food & Drug Administration
TITLE: Statistical Applications in FDA
TIME AND PLACE:
Thurs., March 2, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will be divided into three
parts. In the beginning I will briefly describe the kind of work that
FDA statisticians do. Then I will discuss two topics: one on "From
Intermediate endpoint to final endpoint: a conditional power approach
for accelerated approval and interim analysis", and one on "Computer
Intensive and Re-randomization Tests in Clinical Trials".
1. Statistical Issues in FDA
Statistics plays an important role in the FDA's decision making
process. Statistical inputs are critical for design, conduct,
analysis and interpretation of clinical trials. The statistical issues
we deal with include, but are not limited to, the following:
appropriateness of randomization procedure, determination of analysis
population, blinding, potential design flaws that may lead to biases,
quality of endpoint assessment, interim analysis, information
handling, missing values, discontinuations, decision rule, analysis
methods, and interpretation. In this talk I will describe the type of
work we do with a few examples.
2. From Intermediate endpoint to final endpoint: a conditional power
approach for accelerated approval and interim analysis
For chronic and life threatening diseases, the clinical trials
required for final FDA approval may take a long time. It is therefore
sometimes necessary to approve the drug temporarily (accelerated
approval) based on early surrogate endpoints. Traditionally such
approvals were based on similar requirements on the surrogate
endpoints as if they were final endpoints, regardless of the quality of the
surrogacy. However, in this case the longer term information on some
patients is ignored, and the risk for the eventual failure on the
final approval is not being considered.
In contrast, in typical group sequential trials, only information on
the final endpoint for a fraction of patients is used, and short-term
endpoints on other patients are ignored. This reduces the
efficiency of inferences and also fails to account for a potential
shift of population over the course of the trial.
In this talk I will propose an approach that utilizes both short-term
surrogate and long-term final endpoint at interim or intermediate
analyses, and the decision for terminating trial early, or granting
temporary approval, will be based on the likelihood of seeing a
successful trial were the trial to be completed. Issues on Type I
error control as well as efficiency of the procedure will be
discussed.
3. Computer Intensive and Re-randomization Tests in Clinical
Trials
Quite often clinicians are concerned about balancing important
covariates at baseline. Allocation methods designed to achieve
deliberate balance on baseline covariates, commonly called dynamic
allocation or minimization, are used for this purpose. This
non-standard allocation poses a challenge for the common statistical
analysis. In this talk I will examine robustness of level and power of
common tests with deliberately balanced assignments when assumed
distribution of responses is not correct.
There are two methods of testing with such allocations: computer
intensive and model based. I will review some of the common mistaken
attitudes about the goals of randomization. And I will discuss some
simulations that attempt to explore the operating characteristics of
re-randomization and model based analyses when model assumptions are
violated.
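The computer-intensive approach can be sketched as a Monte Carlo re-randomization test. This minimal sketch assumes complete randomization with fixed group sizes, so assignments are re-drawn by shuffling. Under dynamic allocation one would instead re-run the allocation procedure itself, which is not shown here. The function name and the data are hypothetical.

```python
import random

def rerandomization_p_value(responses, assignment, n_draws, rng):
    """Monte Carlo re-randomization test for a difference in group means.

    assignment[i] is 1 for treatment and 0 for control. Assignments are
    re-drawn by shuffling, i.e. complete randomization with fixed group
    sizes is assumed (an assumption of this sketch, not dynamic allocation).
    """
    def mean_diff(assign):
        treated = [y for y, a in zip(responses, assign) if a == 1]
        control = [y for y, a in zip(responses, assign) if a == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = abs(mean_diff(assignment))
    shuffled = list(assignment)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        # Small tolerance so that exact ties are not lost to rounding error.
        if abs(mean_diff(shuffled)) >= observed - 1e-12:
            hits += 1
    # Count the observed assignment itself so the p-value is never zero.
    return (hits + 1) / (n_draws + 1)

# Hypothetical responses for 6 treated and 6 control patients.
responses = [5.1, 6.3, 5.8, 6.9, 6.0, 6.4, 4.2, 4.9, 5.0, 4.4, 5.3, 4.7]
assignment = [1] * 6 + [0] * 6
p_value = rerandomization_p_value(responses, assignment, n_draws=2000,
                                  rng=random.Random(1))
```

Because the reference distribution is generated from the allocation mechanism rather than from a response model, the test keeps its level even when the assumed response distribution is wrong, provided the re-drawn assignments follow the mechanism actually used.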
Click here
to see the slides for Dr. Soon's talk.
SPEAKER: Professor Lee K. Jones
Department of Mathematical Sciences, University of
Massachusetts Lowell
TITLE: On local minimax estimation with some
consequences for ridge regression,
tree learning and reproducing kernel methods
This talk is jointly sponsored with the Harmonic Analysis Seminar
this week.
TIME AND PLACE:
Thurs., March 9, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Local learning is the process of determining the value of an unknown
function at only one fixed query point based on information about the
values of the function at other points. We propose an optimal
methodology (local minimax estimation) for local learning of
functions with band-limited ranges which differs from (and is
demonstrated in many interesting cases to be superior to) several
popular local and global learning methods. In this theory the
objective is to minimize the (maximum) prediction error at the query
point only - rather than minimize some average performance over the
entire domain of the function. Since different compute-intensive
procedures are required for each different query, local learning
algorithms have only recently become feasible due to the advances in
computer availability, capability and parallelizability of the last
two decades.
In this talk we first apply local minimax estimation to linear
functions. A rotationally invariant approach yields ridge regression,
the ridge parameter and optimal finite sample error bounds. A scale
invariant approach similarly yields best error bounds but is
fundamentally different from either ridge or lasso regression. The
error bounds are given in a general form which is valid for
approximately linear target functions.
Using these bounds an optimal local aggregate estimator is derived
from the trees in a Breiman (random) forest or a deterministic
forest. Finding the estimator requires the solution to a challenging
large dimensional non-differentiable convex optimization problem.
Some approximate solutions to the forest optimization are given for
classification using micro-array data.
Finally the theory is applied to reproducing kernel Hilbert space
and an improved Tikhonov estimator for probability of correct
classification is presented along with a proposal for local
determination of optimal kernel shape without cross validation.
To see a copy of the paper on which the talk is based, click
here.
SPEAKER: Professor Reza Modarres
George Washington University, Department of Statistics
TITLE: Upper Level Set Scan Statistic for
Detection of Disease and Crime Hotspots
TIME AND PLACE:
Thurs., March 16, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The upper level set (ULS) scan statistic, its theory, implementation,
and extension to bivariate data are discussed. The ULS-Hotspot
algorithm that obtains the response rates, maintains a list of
connected components at each level of the rate function and yields
the ULS tree is described. The tree is grown in the immediate
successor list, which provides a computationally efficient method for
likelihood evaluation, visualization and storage. An example shows
how the zones are formed and the likelihood function is developed for
each candidate zone. Bivariate hotspot detection is discussed,
including the bivariate binomial model, the multivariate exceedance
approach, and the bivariate Poisson distribution. The Intersection
method is recommended as it is simple to implement, using univariate
hotspot detection methods. Applications to mapping of crime hotspots
and disease clusters are presented.
Joint work with G.P. Patil.
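The idea of tracking connected components at each level of the rate function can be sketched on a small adjacency graph. This minimal sketch only lists the candidate zones at each level. It does not build the ULS tree or evaluate likelihoods, and the cell names, rates, and function name are hypothetical.

```python
def upper_level_set_components(rates, neighbors, level):
    """Connected components of the cells whose rate is at least `level`."""
    in_set = {c for c, r in rates.items() if r >= level}
    seen, components = set(), []
    for start in sorted(in_set):
        if start in seen:
            continue
        # Depth-first search restricted to cells in the upper level set.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            cell = stack.pop()
            component.add(cell)
            for nb in neighbors[cell]:
                if nb in in_set and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components

# Hypothetical map: five cells in a row, A-B-C-D-E, with a response rate per cell.
rates = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.8, "E": 0.4}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}

# Candidate zones at each distinct rate level, from highest to lowest.
zones_by_level = {g: upper_level_set_components(rates, neighbors, g)
                  for g in sorted(set(rates.values()), reverse=True)}
```

As the level drops, zones grow and merge (here A and D start as separate zones and join once C enters), and that merging history is what a tree over the level sets records.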
SPEAKER: Professor Robert Mislevy
Department of Educational Measurement & Statistics (EDMS),
UMCP
TITLE: A Bayesian perspective on structured
mixtures of IRT models: Interplay among psychology, evidentiary
arguments, probability-based reasoning
TIME AND PLACE:
Thurs., March 30, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: (Joint paper with Roy Levy, Marc Kroopnick,
and Daisy Wise, all of EDMS.)
Structured mixtures of item response theory (IRT) models are used in
educational assessment for so-called cognitive diagnosis, that is,
supporting inferences about the knowledge, procedures, and
strategies students use to solve problems. These models arise from
developments in cognitive psychology, task design, and psychometric
models. We trace their evolution from the perspective of Bayesian
inference, highlighting the interplay among scientific modeling,
evidentiary argument, and probability-based reasoning about
uncertainty.
This work draws in part on the first author's contributions to the
National Research Council's (2002) monograph, available online:
Knowing what students know, J. Pellegrino, N. Chudowsky, &
R. Glaser (Eds.), Washington, D.C.: National Academy Press.
On Friday, April 7, 2006,
JPSM is sponsoring a Distinguished Lecture:
SPEAKER: Nora Cate Schaeffer
TITLE: Conversational Practices with a Purpose:
Interaction within the Standardized Interview
TIME AND PLACE:
Friday, April 7, 2006, 3:30pm
Room 2205 Lefrak Hall
There will be a reception immediately afterwards.
ABSTRACT: The lecture will discuss interactions in survey
interviews and standardization as it is actually practiced. An early
view of the survey interview characterized it as a "conversation with
a purpose," and this view was later echoed in the description of
survey interviews as "conversations at random." In contrast to these
informal characterizations of the survey interview stand the formal
rules and constraints of standardization as they have developed over
several decades. Someplace in between a "conversation with a purpose"
and a perfectly implemented standardized interview are the actual
practices of interviewers and respondents as they go about their
tasks. Most examinations of interaction in the survey interview have
used standardization as a starting point and focused on how
successfully standardization has been implemented, for example by
examining whether interviewers read questions as worded. However, as
researchers have looked more closely at what interviewers and
respondents do, they have described how the participants import into
the survey interview conversational practices learned in other
contexts. As such observations have accumulated, they provide a
vehicle for considering how conversational practices might support or
undermine the goals of measurement within the survey interview. Our
examination of recorded interviews from the Wisconsin Longitudinal
Study provides a set of observations to use in discussing the
relationship among interactional practices, standardization, and
measurement.
SPEAKER: Prof. Jiuzhou Song
Department of Animal Sciences, UMCP
TITLE: The Systematic Analysis for Temporal
Gene Expression Analysis
TIME AND PLACE:
Thurs., April 13, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In temporal gene expression analysis, we propose a strategy to explore
the use of gene and treatment effect information and to build a synthetic
genetic network. Assuming that variations in gene expression are
caused by different conditions, we classified all experimental
conditions into several subgroups via clustering analysis, which groups
conditions based on the similarity of their temporal gene expression
profiles. This procedure is useful because it allows us to combine
more diverse gene expression data sets as they become available.
Setting a reference gene, as we describe, places the genetic regulatory
networks on a concrete biological foundation. We also visualized
the gene activation process via its starting and ending points, and
combined all of the information to describe genetic regulatory
relationships and obtain a consensus gene activation order. The
estimation of activation points and the building of a synthetic genetic
network may yield important new insights in the ongoing endeavor to
understand the complex network of gene regulation.
On Thursday, April 20, 2006,
4:15-6:45pm, there will be a Statistics Consortium
Sponsored Statistics Day event, involving a Distinguished Lecture and
a Discussion at Physics Building Room 1410.
DISTINGUISHED SPEAKER: Professor Peter Bickel
Statistics Department, University of California, Berkeley
TITLE: Using Comparative Genomics to
Assess the Function of Noncoding Sequences
TIME AND PLACE:
Thursday, April 20, 2006, 4:15-6:00 pm
Room 1410, Physics Building
ABSTRACT: We have studied 2094 noncoding sequences (NCS) of length
150-200bp from Edward Rubin's
laboratory. These sequences are conserved at high homology between
human, mouse, and fugu. Given the degree of homology with fugu, it
seems plausible that all or part of most of these sequences is
functional and, in fact, there is already some experimental validation
of this conjecture. Our goal is to construct predictors of regulation
(or potential irrelevance) by the NCS of nearby genes and, further, to use
binding sites and the transcription factors that bind to them to deduce
some pathway information. One approach is to collect covariates such as
features of nearest genes, physical clustering indices, etc., and use
statistical methods to identify covariates, select among these for
importance, relate these to each other, and use them to create stochastic
descriptions of the NCS which can be used for NCS clustering and for NCS and
gene function prediction, singly and jointly. Of particular importance so
far have been GO term annotation and tissue expression of downstream
genes, as well as the presence of blocks of binding sites known from
the TRANSFAC database in some of the NCS. Our results so far are
consistent with those of recent papers engaged in related explorations,
such as Woolfe et al. (2004), Bejerano et al. (2005), and others, but also
suggest new conclusions of biological interest.
DISCUSSANT: Dr. Steven Salzberg
Director, Center for Bioinformatics and Computational Biology, and
Professor, Department of Computer Science, University of Maryland
The Lecture and Discussion will
be followed by a reception (6:00-6:45pm)
in the Rotunda of the Mathematics Building.
SPEAKER: Dr. Neal Jeffries
National Institute of Neurological Disorders and Stroke
TITLE: Multiple Comparisons Distortions of
Parameter Estimates
TIME AND PLACE:
Thurs., April 27, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In experiments involving many variables, investigators typically use
multiple comparisons procedures to determine differences that are
unlikely to be the result of chance. However, investigators rarely
consider how the magnitude of the greatest observed effect sizes may
have been subject to bias resulting from multiple testing. These
questions of bias become important to the extent investigators focus
on the magnitude of the observed effects. As an example, such bias
can lead to problems in attempting to validate results if a biased
effect size is used to power a follow-up study. Further, such factors
may give rise to conflicting findings in comparing two independent
samples -- e.g. the variables with strongest effects in one study may
predictably appear much less so in a second study. An associated
important consequence is that confidence intervals constructed using
standard distributions may be badly biased. A bootstrap approach is
used to estimate and correct the bias in the effect sizes of those
variables showing strongest differences. This bias is not always
present; some principles showing what factors may lead to greater
bias are given and a proof of the convergence of the bootstrap
distribution is provided.
Key words: Effect size, bootstrap, multiple comparisons
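The selection bias described in this abstract can be illustrated with a small simulation. The sketch below is only an assumption about how such a bootstrap correction might look, not the speaker's actual procedure: it draws 20 hypothetical variables with true effect zero, records the largest observed mean, and estimates the bias of that maximum by repeating the "pick the largest" step on bootstrap resamples. The function names, sample sizes, and seeds are all illustrative.

```python
import random

def column_means(rows):
    """Mean of each column (variable) across rows (subjects)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def bootstrap_bias_of_max(rows, n_boot=200, seed=0):
    """Estimate how much the largest observed mean overstates its effect.

    In each resample, select the variable with the largest bootstrap mean
    and compare that mean with the same variable's original-sample mean.
    The average difference is the bias estimate.
    """
    rng = random.Random(seed)
    n = len(rows)
    original = column_means(rows)
    diffs = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        boot = column_means(sample)
        j = max(range(len(boot)), key=boot.__getitem__)
        diffs.append(boot[j] - original[j])
    return sum(diffs) / len(diffs)

# Hypothetical data: 30 subjects, 20 variables, every true effect is zero.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(30)]

observed_max = max(column_means(rows))   # biased upward by selection
bias = bootstrap_bias_of_max(rows)       # bootstrap estimate of that bias
corrected = observed_max - bias          # bias-corrected effect estimate

print("largest observed mean:", round(observed_max, 3))
print("estimated selection bias:", round(bias, 3))
print("corrected estimate:", round(corrected, 3))
```

Because every true effect here is zero, the uncorrected maximum is pure selection bias, which is the situation in which a follow-up study powered on that effect size would be badly underpowered.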
SPEAKER: Professor Bing Li
Department of Statistics, Penn State University
TITLE: A Method for Sufficient
Dimension Reduction in Large-p-Small-n Regressions
TIME AND PLACE:
Thurs., May 4, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Large-p-small-n data, in which the number of recorded
variables (p) exceeds the number of independent observational units
(n), are becoming the norm in a variety of scientific fields. Sufficient
dimension reduction provides a meaningful and theoretically motivated
way to handle large-p-small-n regressions, by restricting
attention to d < n linear combinations of the original
p predictors. However, standard sufficient dimension reduction
techniques are themselves designed to work for n > p, because
they rely on the inversion of the predictor sample covariance
matrix. In this article we propose an iterative method that
eliminates the need for such inversion, using instead powers
of the covariance matrix. We illustrate our method with a genomics
application: the discrimination of human regulatory elements
from a background of "non-functional" DNA, based on their alignment
patterns with the genomes of other mammalian species. We also
investigate the performance of the iterative method by simulation,
obtaining excellent results when n < p or n ≈ p. We
speculate that powers of the covariance matrix may allow us to
effectively exploit available information on the predictor
structure in identifying directions relevant to the regression.
SPEAKER: Professor Biao Zhang
Mathematics Department, University of Toledo
TITLE: Semiparametric ROC Curve Analysis
under Density Ratio Models
TIME AND PLACE:
Thurs., May 11, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Receiver operating characteristic (ROC) curves are commonly used to
measure the accuracy of diagnostic tests in discriminating disease and
nondisease. In this talk, we discuss semiparametric statistical
inferences for ROC curves under a density ratio model for disease and
nondisease densities. This model has a natural connection to the
logistic regression model. We explore semiparametric inference
procedures for the area under the ROC curve (AUC), semiparametric
kernel estimation of the ROC curve and its AUC, and comparison of the
accuracy of two diagnostic tests. We demonstrate that statistical
inferences based on a semiparametric density ratio model are more
robust than a fully parametric approach and are more efficient than a
fully nonparametric approach.
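For readers unfamiliar with the AUC, the fully nonparametric estimator mentioned as a comparator can be sketched in a few lines. This is not the semiparametric density ratio estimator of the talk; it is the standard empirical (Mann-Whitney) estimate of the probability that a diseased subject's test value exceeds a nondiseased subject's, with ties counted as one half. The function name and the sample values are hypothetical.

```python
def empirical_auc(diseased, nondiseased):
    """Empirical AUC: fraction of (diseased, nondiseased) pairs in which
    the diseased value is larger, counting ties as one half."""
    wins = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))

# Hypothetical test values for the two groups.
diseased_scores = [2.1, 3.4, 1.9, 4.0]
nondiseased_scores = [1.2, 2.0, 0.7, 2.5]
print("empirical AUC:", empirical_auc(diseased_scores, nondiseased_scores))
```

An AUC of 0.5 corresponds to a test that cannot discriminate between the groups, and 1.0 to perfect separation.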
FALL 2006 SEMINAR TALKS:
SPEAKER: Prof. Eric Slud
Mathematics Department, UMCP
TITLE: "General position" results on uniqueness of
optimal nonrandomized Group-sequential decision
procedures in Clinical Trials
TIME AND PLACE:
Thurs., Oct. 26, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will first give some background
on group- or batch-sequential hypothesis tests for
treatment effectiveness in two-group clinical trials.
Such tests are based on a test statistic like the
logrank, repeatedly calculated at a finite number of
"interim looks" at the developing clinical trial
survival data, where the timing of each look can in
principle depend on all previously available data. The
focus of this talk will be on a decision-theoretic
formulation of the problem of designing such trials,
when, as is true in large trials, the data can be
viewed as observations of a Brownian motion with
drift, and the drift parameter quantifies the
difference in survival distributions between the
treatment and control groups. The new results
presented in the talk concern existence and
uniqueness of nonrandomized optimal designs, subject
to constraints on type I and II error probability,
under fairly general loss functions when the cost
functions are slightly perturbed, randomly, as
functions of time. The proof techniques are related
to old results on level-crossings for continuous
time random processes.
This work is joint with Eric Leifer, a UMCP PhD of
several years ago, now at the Heart, Lung and Blood
Institute at NIH.
To see a copy of the slides for the talk, click here.
Last updated April 2, 2008