For sample problems you can use to
practice for the in-class tests or the Final Exam,
including last year's Final and an answer key for the Sample Problems for Test 2, click here.
An answer key for the Sample Final Exam Problems is now also included.
Instructor: Eric Slud, Math. Dept. Rm. 2314, X 5-5469, email@example.com
Office hours: Monday 4, Th 2, or by appointment.
Prerequisite: Math 140-141 & Stat 400.
Text: Probability & Statistics for Engineering
and the Sciences with Minitab 14, 7th ed. (2008),
by J. L. Devore, Duxbury Press.
Coverage: In the first 2 weeks, we will review Stat 400
ideas and techniques. Afterwards, we will cover
Chapters 7-10, 12, 14 and parts of 11 and 13 of the Devore text, plus some extra handouts
on statistical computing and simulation. For more detailed topics, see the Chapter tables of contents
and the official course syllabus, together with the handouts below.
Grading: The grade in the course will be based 20% on
homeworks (about 8, graded) from the book
and including some supplementary problems of mine, 10% on data-project homeworks, 40% on 2 in-class
tests, and 30% on a comprehensive final.
Computing: You will need to learn to work with some
statistical computing platform to do simple statistical
calculations on moderate to large datasets in the course, and to do data simulations. A calculator or spreadsheet
will not be enough. You may use Minitab or Matlab or R or a standard statistical package like SAS or Stata.
However, I will be providing information and help (and web-posted scripts) only with R. To find
information about which computer labs on campus have which of these kinds of software loaded, click here.
You can find information on getting started with R on the
CD that comes with the book, or by visiting the
R web-site from which you can freely download R software (very similar to Splus) including miscellaneous
packages and datasets. For an introductory tutorial in R, click here. For a quick start, see the Rbasics handout,
and then consider reading more about syntax in a book, such as the early chapters of
W. Venables and Brian Ripley, "Modern Applied Statistics With S" (Springer, currently 4th ed.).
As indicated in the "R_Manual" section
of the Devore text's accompanying CD, you can get a special
R package containing all of the book's datasets, from a network of web sites called CRAN that
contain R add-on packages. You do this by the command
To load the files within an R session you type:
HW1, due Fri., Feb. 4.
Reading: Read and review Chapters 5 and 6 in Devore. Also read the Handout (1) below on Simulation.
#1 Suppose that the independent random variables Xi,
for i=1,...,100, have density f(x) = 2x
for 0 < x < 1.
(a) Find the approximate probabilities P( 45+10j < S < 55+10j) for j=1,2,3,4, where S = X1+...+X100.
(b) Find the expectation and variance of the number of indices i for which Xi > 0.6.
#2 Read the Simulation of Random Variables Handout and do Problem Sim.3 on page 4 of that handout.
#3 Suppose that U1,...,U40 are independent
Uniform[0,θ] random variables, observed as data.
(a). Show that the scaled average S1 = (U1+...+U40)/20 is an unbiased estimator of θ.
(b). Show that for some constant c, c*S2 is an unbiased estimator of θ, where S2 = max(U1,...,U40).
Hint: Check that P(S2 < x) = (x/θ)^40 for 0 < x < θ.
#4 Find the standard errors of the two estimators
S1 and c*S2 appearing in
parts (a) and (b) of Problem #3.
#5 Suppose that Y1,...,Y1000
are independent identically distributed observations with density
f(y) = 1/3
for 0 < y < 1 and f(y) = 2/3 for 1 < y < 2, and for k=1,2,3,4 let Nk =
(# of indices i in 1..1000 with (k-1)/2 < Yi < k/2). Find the means and variances of each of the relative
frequencies Nk/1000, for k=1,2,3,4.
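
A hand calculation like the one in Problem #1 above can be sanity-checked by simulation. The sketch below is written in Python purely for illustration (the course scripts are in R) and is not taken from the Simulation handout. It assumes the inverse-CDF method: since F(x) = x^2 on (0,1), X = sqrt(U) has density 2x when U is Uniform[0,1]. The function names, seed and number of replications are hypothetical choices.

```python
import math
import random
import statistics

def draw_x(rng):
    """One draw from the density f(x) = 2x on (0, 1), using X = sqrt(U)."""
    return math.sqrt(rng.random())

def simulate_sums(n_terms=100, n_reps=2000, seed=1):
    """Simulate n_reps independent copies of S = X1 + ... + X_{n_terms}."""
    rng = random.Random(seed)
    return [sum(draw_x(rng) for _ in range(n_terms)) for _ in range(n_reps)]

sums = simulate_sums()
# Compare these with 100*E(X1) and sqrt(100*Var(X1)) computed by hand.
print("simulated mean of S:", statistics.mean(sums))
print("simulated sd of S:  ", statistics.stdev(sums))
```

The fraction of simulated sums falling in a given interval can be compared in the same way with a normal-approximation probability.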
HW2, due Wed., Feb. 18.
Complete your review of Chapter 6 (ML Estimators), and read Sections 7.1 through 7.3 in Ch.7 of Devore.
Then solve and hand in the following problems:
#1, 2 Problems 20 and 28 in Sec. 6.2, p.251.
#3 (Do #11 on p.263 for practice and look at its solution in
the Solutions manual. Then do and hand in the following problem.)
Suppose that you learn a new method of generating 90% two-sided confidence intervals (L(X), U(X)) for the unknown mean
μ for samples X1, ..., Xn of data in which the individual values Xi are approximately normally distributed, where the sample
size n is between 35 and 50. Suppose also that you have a method of simulating independent samples X(r) = X1,r, ..., X42,r
for r=1,...,2000, on each of which you can calculate the confidence interval Ir = (L(X(r)), U(X(r))) .
(For these simulated intervals, you will know the mean parameter μ0 .)
(a) What is the approximate number of these confidence intervals I1, ..., I2000 that you expect to contain the true mean μ0 ?
(b) What kind of random variable is the number N of samples r=1,..,2000 for which μ0 falls outside Ir ?
(c) What is the approximate probability that N in (b) is between 185 and 220 , inclusive of endpoints ?
(d) Approximately how likely is it that all of the first 20 of these intervals Ir , r=1,..,20, contain μ0 ?
#4, 5 Problems 8 and 10 in Sec. 7.1, p.262.
#6, 7 Problems 18 and 20 in Sec.7.2, p.269.
#8 You can find by clicking here a
dataset consisting of the logarithms of the average annual rainfall in
inches from 70 US and
Puerto Rico cities (data from the 1975 Statistical Abstracts of the United States).
(a). Compute a few scaled relative frequency
histograms of these data (with different numbers L of class intervals), and hand in the one that you think best shows the shape of
the underlying density. Overlay on the same histogram plot (by hand if necessary) a graph of the normal density curve with the
same mean and variance as the sample mean and variance of your data. Use this plot and histogram to comment briefly on whether
you think the assumption of normal distribution for these data is tenable.
(b). Give a 95% two-sided confidence interval for the mean of these data, using an assumed-known value 0.25 for the
variance and an assumption of normality for the individual data points.
(c). Give a 95% two-sided confidence interval for the mean of these data, assuming normality, if the variance is unknown.
(d). Re-do parts (b) and (c), giving approximate large-sample confidence intervals dispensing with the assumption of
normality for the data.
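
As a rough illustration of the known-variance interval asked for in part (b) of Problem #8, here is a minimal Python sketch (Python is used only for illustration; the course scripts are in R). The function name `z_interval` and the toy data are hypothetical, and the numbers are made up: they are not the rainfall dataset.

```python
import math
from statistics import NormalDist, mean

def z_interval(data, variance, level=0.95):
    """Two-sided CI for the mean when the variance is treated as known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    half_width = z * math.sqrt(variance / len(data))
    center = mean(data)
    return center - half_width, center + half_width

# Made-up values for illustration only (NOT the rainfall data).
toy = [3.1, 3.6, 2.9, 3.8, 3.3, 3.5, 3.0, 3.7]
lo, hi = z_interval(toy, variance=0.25)
print(round(lo, 3), round(hi, 3))
```

The unknown-variance interval replaces the normal quantile by a t quantile and the assumed variance by the sample variance. Python's standard library has no t quantile function, which is one reason a statistical package is the more convenient tool.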
HW3, due Wednesday, March 2. Read the rest of Chapter 7,
and the first 2 sections of Ch.8 of Devore.
Solve and hand in the following eight problems:
#1 Use R or other statistical software to simulate
100 samples of size 40 of Gamma(1.3,2.6) data-values
(i.e., random variables with density f(x) = (2.6)^1.3 x^0.3 e^(-2.6x) / Γ(1.3) for x > 0, for which the mean is μ = 1.3/2.6 = 0.5).
(a). For each sample (i.e. each row of a rectangular 100 x 40 array), calculate Xbar, and use it to
define a large-sample 95% CI for μ .
(b). Plot in some form (or print out) the confidence intervals calculated on your 100 samples of size 40,
indicating whether each CI contains the true value 0.5. (Each CI is a function only of Xbar for that sample.)
(c). How many times did your CI fail to capture the true value μ0 ? What is the expected number of
times (out of 100) for this to happen ? Should you have been surprised if this happened as few as 1 time ?
If it happened 9 or more times ?
#2 Do #27, p.270 for practice and then hand in the following:
Give some numerical computations (in R or
another computing platform) showing what the 95% confidence intervals would be (for some specific
examples of values X1+...+Xn = k) and what their actual coverage probabilities would be
according to exact Binomial(n,p) probability distributions for the values
(n=78, p=.57), (n=47,p=.53), (n=46,p=.16) according to confidence intervals (7.11), (7.10), and the
one given in problem 27. See the Rscript/Coverage.RLog script for the necessary R coding.
ALSO: for each of these (n,p) parameter combinations, give at least one nearby value of n (for same p) for
which the ordering of "best performance" among the three intervals is altered.
#3--#4 Do #22, #26, p.269.
#5 Do #38, p.278.
#6 Do #44, p.280.
#7 Do #52, p.281.
#8 Do #12, p.294.
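
The "actual coverage probability" idea in Problem #2 above can be sketched as follows. This is an illustrative Python version, not the Rscript/Coverage.RLog script, and it uses only the familiar interval phat ± z*sqrt(phat(1-phat)/n), which is assumed here to be one of the intervals the problem refers to. The values n = 50, p = 0.3 are hypothetical and deliberately differ from those assigned.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of the value k."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def wald_interval(k, n, level=0.95):
    """Large-sample interval phat +/- z*sqrt(phat*(1-phat)/n), given k successes."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    phat = k / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

def exact_coverage(n, p, interval=wald_interval):
    """Sum the Binomial(n, p) probabilities of all k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = interval(k, n)
        if lo <= p <= hi:
            total += binom_pmf(k, n, p)
    return total

# Hypothetical parameter values, chosen only to illustrate the computation.
print(exact_coverage(50, 0.3))
```

Any other interval formula can be compared by passing a different function of (k, n) as the `interval` argument.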
HW4, due Wednesday, March 16. Read the rest of Chapter 8,
and sections 9.1, 9.2 and 9.4
of Ch.9 of Devore. Solve and hand in the following eight problems:
Do #10, pp.293-4, #20, p.304, #32, p.306, #42, p.311, #52 and #54, p.317-8.
Do #2, p.334, three ways: using a large-sample Z-approximation
as covered in Sec.9.1; using a pooled
t-test as in Sec. 9.2; and with the Satterthwaite-Welch approximation as on pp.336-337.
Do #28, p.342.
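
For the Satterthwaite-Welch approach mentioned above, the approximate degrees of freedom are usually computed as (s1^2/m + s2^2/n)^2 divided by [ (s1^2/m)^2/(m-1) + (s2^2/n)^2/(n-1) ]. The Python sketch below assumes this standard form matches the text's; the function name and the numbers are hypothetical.

```python
def welch_df(s1, m, s2, n):
    """Approximate degrees of freedom for the two-sample t statistic
    with unequal variances (sample SDs s1, s2; sample sizes m, n)."""
    a = s1**2 / m   # estimated variance of the first sample mean
    b = s2**2 / n   # estimated variance of the second sample mean
    return (a + b)**2 / (a**2 / (m - 1) + b**2 / (n - 1))

# Hypothetical summary statistics, for illustration only.
print(welch_df(s1=2.0, m=10, s2=2.0, n=10))
print(welch_df(s1=1.0, m=10, s2=5.0, n=10))
```

The result always lies between min(m-1, n-1) and m+n-2, the pooled-test degrees of freedom, which gives a quick check on a hand computation.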
HW5, due Monday, April 11. Read Section 9.5. Read
Chapter 14 through Section 14.2. Then read
Section 4.6 plus the handout on Empirical Distribution Functions.
#1: Problem on power and p-value: suppose that you
see data values X1,...,X31 which can be assumed
to be iid normally distributed, with Xbar = 24.0 and S = 8.0. Suppose that these data were collected
to test the hypothesis H0: μ ≤ 22.7 versus HA: μ > 22.7.
(a). Give the p-value for the test in which you treat this as a large-sample test (or equivalently,
where you take σ0=8.0 as known).
(b). Find the power of the size .05 test versus μ1 = 25, again treating the test as a large-sample test.
(c). Re-do part (a), this time treating the test as a small-n one-sample t test. This part
of the problem requires you to use a calculator or PC to calculate the p-value using a
program for the t distribution function in place of a table.
#2-3: Ch.9, #62, 64, pp.363-364.
#4: Ch.9, #68. Do a preliminary test for equality of
variances before you decide which two-sample
t interval to use for the mean difference.
#5-#6: Ch.14, #6, 9, p.575.
#7: Ch. 9, #72, p.365.
#8: Ch.4, #94, p.179. Use R to create
probability plots using qqnorm or
R scripts will be provided.
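
The large-sample p-value and power calculations in Problem #1 of this assignment follow a standard pattern for an upper-tailed test of H0: μ ≤ μ0. A minimal Python sketch is given here for illustration (the course scripts are in R); the function names are hypothetical and the numbers are made up, not those of the problem.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def upper_p_value(xbar, mu0, sigma, n):
    """p-value of the upper-tailed z test of H0: mu <= mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1 - Z.cdf(z)

def upper_power(mu1, mu0, sigma, n, alpha=0.05):
    """Power of the size-alpha upper-tailed z test at the alternative mu1."""
    z_alpha = Z.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return 1 - Z.cdf(z_alpha - shift)

# Illustrative numbers only (hypothetical, not those of Problem #1).
print(upper_p_value(xbar=10.5, mu0=10.0, sigma=2.0, n=64))
print(upper_power(mu1=11.0, mu0=10.0, sigma=2.0, n=64))
```

At μ1 = μ0 the power reduces to the size of the test, which is a useful check on the formula.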
HW6, due Wednesday, April 20. In Devore, finish
reading Sec. 14.2, and read Ch. 10,
Sections 1 and 2. Solve and hand in the following six problems:
Ch.14, #8, p.575.
Ch. 4, #92(a), pp.178-179.
Ch.10, #2, 8, pp.378-379; and #12, 16, pp.384-385.
HW7, due Wednesday, May 4. Read Sections 12.1 to 12.3,
and solve and hand in
the following problems:
Ch.10, p. 385: #18, 20.
Ch.12: #6, p.453; #12, p.465; #20, p.466; #34, 36, p.476.
GENERAL GUIDELINES ON HOMEWORK.
1. Academic Dishonesty. You may ask questions of each other
and of me to get hints on how to solve the
various assigned homework problems. However, you may not share computations and written work: you
must each do that work and write it up individually. Homework papers which have identically copied
segments will be regarded as a violation of the campus honor code.
2. Late Homework and Test Make-Ups. The course policy on
late homeworks is that they will be accepted
but graded down, by 10 percent if past due by no more than one class session and by 25 percent if later than
that. These penalties will be waived only for medical excuses or valid University-recognized holidays.
Regarding test make-ups, we will adhere to campus policy.
Sample Problems for Tests and Exams
(1) To practice for Test 1, a series of 10 relevant
applied/computational problems drawn (selected and
edited) from the Devore "Testbank" (on the CD-ROM coming with the book's 7th edition) can be found here.
(You may have to zoom in with your browser or MS Word reader to read some of the technical formula
elements in this document.)
(2) A sheet of additional problems relevant to Test 1 can be
found here. These are modified from similar
problems that I have given in the past, which would call for a little more theoretical interpretation than
the mechanical `applied' problems coming from the TestBank problem-sheet in (1).
(3) Practice problems for In-Class Test 2 (Wed., Dec. 1) can be found
here, along with an answer key.
For a list of topics and problem types, click here.
(4). To see last Fall's Stat 401 Final Exam, from another
instructor, click here. In this
Exam, the MLE in
Problem 1 is not a topic that we spent much time on, but we did spend some time on it and you should be
able to do it. The other problems are well in the mainstream of what we covered this semester.
An answer key is included here.
(5). Try the sample-exam from
1995 which I have adapted to conform more closely to what we studied
this term. An answer key is included here. (Scroll down a bit in the document to find these Sample Exam answers.)
Handouts (some from Stat 400, and some from John Millson):
(1) 10/20/03 There
are two handouts here, respectively on
Variables and on Random Number Generation and Simulation . These topics are very
important for the rest of the course, as they allow us to generate and interpret `artificial data'
to illustrate the meaning of our Probability Limit Theorems (Law of Large Numbers, Central
Limit Theorem) and later statistical results (Consistent Statistical estimators, Confidence
Intervals). In addition, Simulation gives us an `experimental' avenue to calculate via artificial
data probabilities which may be too difficult to figure analytically.
(2) As of 8/23/10 See John Millson's Stat 401 page for handouts on diverse topics related to the course.
The handout on Normal
Approximation to Binomial Distribution contains a
worked word-problem example, as well as some numerical examples of the quality of the
normal approximation to the Binomial. This example is continued below, in a statistical
setting (confidence interval for estimate of a population proportion in a political opinion poll)
in handout (7) below, dated 11/19/03.
A graph comparing the distribution function values of Binom(100,.3) with its
approximating normal distribution N(30,21) can be found here.
9/29/03 This handout concerns numerical calculations for the Binomial and
Hypergeometric random variables, and the Poisson approximation to the Binomial. In addition,
some simulated-data results are given to show that the expectations and probability mass functions
behave as they should according to the relative-frequency interpretation of probabilities.
(5) 10/27/03 Example of Simulation for Calculating Probability and Expectation.
Picture showing the behavior
of sample averages Sn/n as a function of n =
1,...,2000 on each of four sets of simulated data, from different types of random variables.
Within each picture, the sample averages Sn/n are based on progressively larger segments of the
same 2000 data-values, and the point is to see that these averages settle down to the place where
the Law of Large Numbers guarantees they should for large enough n, namely the theoretical
expectation of the individual r.v.'s.
Pictures showing behavior of
scaled relative frequency histograms compared with densities.
The document shows plots of histograms in large simulated samples overlaid with the theoretical densities
they are supposed to represent, and of empirical distribution functions overlaid with the theoretical cdf's
the data in large simulated datasets are supposed to represent. The latter are available in two settings:
(i) The overlaid empirical and theoretical cdf's for 1000 simulated values of Z1+Z2 (sum of two
independent standard normal deviates) can be found here .
(ii) The overlaid empirical and theoretical cdf's for 1000 simulated values of U1+...+U100
(sum of 100 independent Uniform[0,1] deviates) can be found here.
(8) 11/19/03 The
word-problem on political opinion polling begun in handout (3) above,
dated 10/22/03, is continued here from the vantage point of statistics, particularly
confidence intervals for estimates of a population proportion in a political opinion poll.
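
The behavior of the sample averages Sn/n described in the Law of Large Numbers picture above can be reproduced with a few lines of code. The sketch below is in Python for illustration only (the course scripts are in R). The two distributions, the seed and the function name are hypothetical choices, not necessarily those used for the handout's picture.

```python
import random

def running_averages(draw, n=2000, seed=0):
    """Return the list S_1/1, S_2/2, ..., S_n/n for one simulated data set."""
    rng = random.Random(seed)
    total = 0.0
    averages = []
    for i in range(1, n + 1):
        total += draw(rng)          # add the i-th simulated value to S_i
        averages.append(total / i)
    return averages

# Two illustrative distributions: Uniform[0,1] (mean 0.5) and Exponential(1) (mean 1).
uniform_avgs = running_averages(lambda rng: rng.random())
expo_avgs = running_averages(lambda rng: rng.expovariate(1.0))

for n in (10, 100, 1000, 2000):
    print(n, round(uniform_avgs[n - 1], 3), round(expo_avgs[n - 1], 3))
```

Plotting each list against n = 1,...,2000 gives a picture of the same kind, with the averages settling near the theoretical expectation as n grows.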