FINAL Stat 798C PROJECT DUE May 19, 2003 by 12 noon.
====================================================

The Final Project, which will count as much as 2 Homework Sets, is to
be a 5- to 10-page paper on a topic or data analysis of your choice.
At least 5 of the pages should consist of narrative, mathematical
exposition and notation, etc., as opposed to computer output. The
computer results and pictures you hand in should be carefully chosen
and edited to support the points you are making in the main text.

Unlike the small data analyses and tasks you have done throughout the
course, this Project asks you to put together a coherent and reasoned
train of thought, which may involve data analysis and graphical
presentation, illustration and interpretation of a computational
procedure, simulation, etc.

After choosing your topic, you should check with me to confirm its
suitability. Here are some examples of suitable project topics.

EXAMPLES OF PROJECT TOPICS

(I) You could do a Case Study or extended data analysis on a dataset
downloaded from StatLib or another source. Many journal articles over
the past 15+ years have datasets which have been posted to on-line
archives such as those on StatLib. The Project would consist of a
package of alternative analyses, including graphical display,
residuals analysis if appropriate, some kind of goodness of fit
confirmation, etc. Your chosen dataset should be of sufficient size
and/or complexity to support that much effort.

(II) There are various analyses, timing studies, simulations, etc.
which you could present to compare Splus versus SAS or one version of
Splus versus another (e.g. Splus 3.4 versus R). One example of this
sort, discussed in class and begun in my latest installment of Lecture
Notes (on the web-page under /s798c/Handouts/Lec03Pt5D.pdf), is an
extended timing comparison of different ways to code simulation-loops
in Splus involving large datasets and statistical analyses within each
iteration.
Comparisons of Splus versus R, and of different ways to break up large
loops into smaller loops and functions, would be suitable and
interesting.

(III) You could also choose some Venables and Ripley topic not covered
in class, write up a short summary of the statistical rationale for
using it on certain kinds of data, code some illustrations, and show
how to interpret the output list-components (and confirm --- perhaps
via simulation --- that you have the interpretations right!). Three
examples of such topics:

    --- gam (Generalized Additive Models)
    --- lme, nlme (Linear and nonlinear regression with random effects)
    --- time series, various possible methods

MANY OTHER POSSIBILITIES EXIST! TALK TO ME ABOUT THEM IF YOU WANT
POINTERS.