Statistics 430 Introduction to Statistical Computing & SAS                    Spring 2009
Section 0101, MWF 11, Rm CHE 2118 (Chemical Engineering Bldg., not Chemistry)

Quick access to Directories:    Logs used in Lectures  |  Illustrative Scripts  |  Printed pdf Handouts

Click here for a generic Stat 430 Course Syllabus, and here for the current course outline.

Pointer to a listing of current readings and web-page materials to work through, by week.

Pointer to new Homework Assignments and old HW solutions

Instructor: Eric Slud, Statistics Program, Math. Dept.

Office:    Mth 2314, x5-5469, email evs@math.umd.edu

Office hours:    M 2-3, Th 11-12

This course is an introduction to statistical and graphical techniques of data analysis and their implementation in the SAS programming language/platform. The emphasis is on data analysis skills, but since one such important skill is justification of assumptions and understanding of the rationale behind analyses, the course develops ideas and explains concepts from statistical theory.
 

Prerequisite: Stat 400. The material needed is mostly definitions and concepts, and some basic algebraic manipulations involving probabilities. But later in the course, some understanding of Stat 400 material on distributions of functions of random variables will help you make sense of statistical simulation methods. (That material will be reviewed as needed.)
 

Text: Ronald Cody and Jeffrey Smith, Applied Statistics and the SAS Programming Language,
5th ed., Prentice-Hall, 2006.

Course requirements and Grading:    Most of the work for the course will consist of 8-10 graded Problem Sets. These will involve writing and running small SAS programs and interpreting the sequence of data-analysis operations and outputs.
While you will be permitted to share hints and information concerning SAS programming, the reasoning behind analyses, summaries of them, interpretation of results, and the edited copies you hand in must be exclusively your own work.

In addition, there will be an in-class test toward the end of March, on basics of the SAS language and concepts underlying data-display and statistics in categorical data, two-sample comparisons, and simple linear regression. Finally, there will be a slightly more ambitious data-analysis term project in place of a Final Exam (due Friday, May 15, 5pm).
The course grade will be a weighted average of your homework, test, and project grades: 50% for Homework scores (none dropped), 25% for the Test, and 25% for the Term Project.
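
As a worked illustration of this weighting, here is a minimal sketch in Python (chosen only for illustration; it is not SAS and not part of the course materials). The function name `course_grade` and the sample scores are hypothetical, and the sketch assumes that every component is already a percentage out of 100 and that the homework component is the plain mean of all problem-set scores.

```python
# Weights stated in the syllabus: 50% Homework, 25% Test, 25% Term Project.
WEIGHTS = {"homework": 0.50, "test": 0.25, "project": 0.25}


def course_grade(homework_scores, test, project):
    """Weighted course average; all inputs are percentages (0-100).

    Assumption: the homework component is the plain mean of all
    problem-set scores, with none dropped.
    """
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    homework = sum(homework_scores) / len(homework_scores)
    return (WEIGHTS["homework"] * homework
            + WEIGHTS["test"] * test
            + WEIGHTS["project"] * project)


# Hypothetical student: homework mean 85, test 80, project 90.
example = course_grade([90, 80, 85], test=80, project=90)
print(example)  # 0.5*85 + 0.25*80 + 0.25*90 = 85.0
```

Because homework carries half the weight, a 10-point change in the homework mean moves the course average by 5 points, twice the effect of the same change on the Test or the Project.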


MIDTERM TEST

The in-class Midterm test will be given on Monday, March 30, 2009.   It will cover
material from Chapters 1-3 (omitting sections 3.M, 3.P and 3.R), 5 through 5.F, 6.A-B,
13.A-D,H-K, and 14.A-D, plus several handouts (those on Plotting, Histograms, QQplots,
Empirical Distribution Functions, and Partial Correlations). Only material covered in
class (through Wed., Mar. 25) and in handouts and scripts will be within scope
for the test.
NOTE: you can bring one 2-sided notebook sheet to the test as a memory aid.
Except for your notebook sheet, the test is closed-book. You can use a calculator,
but I will not ask for much, if any, arithmetic.


Click here for Data Analysis Term Project Guidelines.     (Due Date: Friday May 15, 5pm)


The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This Code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please click here.


If you need help...

My office hours are (tentatively) Monday 2-3 and Thursday 11-12.  I will often
be available at other times too, except on Tuesdays, but please send an e-mail
or arrange with me in class for an office appointment.


Homework Assignments and Solutions.

The current problem assignment can be found here. Here is an additional link to the current reading,
which you should do in order to follow the lectures and practice the tools for the Homework.
Solutions to that assignment, and to selected other problems (apart from those included
in example Scripts), will be posted to the HWSoln Directory as the term progresses.


           HOMEWORK GUIDELINE

Please remember, for all Homework and Project papers to be handed in for this course,
the consistent guideline: hand in only as much SAS code as will show that you did
the computations correctly using SAS, and only as much output, edited into a coherent
narrative where narrative and explanations are requested, as is needed to answer the
questions asked and to justify the sequence of steps and conclusions you have made.

You will be graded down for handing in lots of extraneous material!




      HANDOUTS   (with many more to come)

(1).    Click SASintro for a step-by-step discussion of how to get started in SAS on
University or other machines.
        By clicking here, you can download free "X-windows" software that will allow you
to create the X-windows needed to use SAS in your WAM account from a home PC.
NOTE: the xlivecd software works only for Windows versions up to and including XP,
not Vista.
Another approach, which works very well, is to install the free software Xming:
do this by following the instructions here VERY closely.

(2).    Click Plotting for some information about how to generate high-quality plots in SAS.

(3).    Click here for a handout containing a useful list of available SAS functions (of which
the Sample Statistics, Quantile Functions, and Probability & Density functions will be
the most useful in this course).

(4).    Click here to find a copy of the course outline and the current problem assignment.

(5).     For handouts related to material covered in class on February 6 and 9, 2009, click
Empirical Distribution Functions or Scaled Relative Frequency Histograms.

(6).    For a handout discussing the relative interpretability of relative risks and odds
ratios in analyzing two-way frequency-table datasets, click here.

(7).    Click here for a sample test indicating coverage by topics along with some
sample questions. For a recent Sample Test, click here. For an outline of the topics
and types of questions particularly relevant for this semester (Fall 2008), click here.
A new set of Fall 2008 sample test problems can be found here.

(8).    A handout giving the theoretical formulas for confidence and prediction
intervals in simple linear (normal-errors) regression can be found here. It contains
justifications and formulas for the calculations SAS does of CLM and CLI confidence
and prediction limits.

(9).    A handout and Worksheet on Partial Correlation, including definitions.
The Worksheet contains three Problems, which are to be handed in as part of
Homework Set 5.

(10). For a freely downloadable textbook on "Residuals and Influence in Regression",
by Cook and Weisberg, visit this website.
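
To make the distinction discussed in handout (6) concrete, the following minimal Python sketch computes both measures from a 2x2 frequency table. It is an illustration only: it is not taken from the handout and is not SAS code. The function name `risk_measures`, the table layout, and the counts are all hypothetical.

```python
def risk_measures(a, b, c, d):
    """Relative risk and odds ratio for a 2x2 table of counts.

                 outcome   no outcome
    exposed         a          b
    unexposed       c          d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    risk_exposed = a / (a + b)        # proportion with outcome, exposed row
    risk_unexposed = c / (c + d)      # proportion with outcome, unexposed row
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)    # cross-product ratio
    return relative_risk, odds_ratio


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
rr, odds = risk_measures(20, 80, 10, 90)
print(rr, odds)
```

With these hypothetical counts the relative risk is 2 while the odds ratio is 2.25. The two measures agree closely only when the outcome is rare in both rows.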



      SCRIPTS
             I have provided a series of illustrative scripts, including handouts
from class and expanded examples of working SAS programs discussed in class.
Click Scripts to find the directory of text Logs and Scripts of SAS example sessions.


DATA DIRECTORY:  Click here to find a directory of available Datasets.

Throughout the term, additional links will be posted here to various online
data sources and repositories:

  •   UCI Machine Learning Repository containing many datasets with
              challenging structure.
  •   StatLib has subdirectories "data", "jasadata", "disease", and "DASL"
              containing datasets on specialized topics, from methodological
              journal articles, etc.
  •   Examples and datasets from a web-page in a data-analysis course Biostat 510
              taught by Kathy Welch at U of Michigan can be found here (scroll down to
              BioStat 510) or in the corresponding file directory.
  •   Many sources of data and links to organizations that provide such sources can be
              found in the Resources section of the Stat Consortium web-page. One particularly
             promising page, obtained at the "StatSci.org" link, can be found here.

  •   The short Nature article by B. Rolett and J. Diamond about deforestation and
              other geographic aspects of Polynesian islands, explaining the interest of the dataset
              nature02801-s2.dat within the course Data directory, can be found here.

  • Important Dates

    © Eric V Slud, May 20, 2009.