LOG FOR IN-CLASS DISCUSSION/DEMONSTRATION of GRAPHICS AND REJECTION-SAMPLING 2/21/03 ===================================================== GRAPHICS: First consider the case of a data-frame which we called "classfram" giving the names, HWgrade, Final (exam grade), and course letter-grade for 25 members of an undergraduate class. The data-frame, generated as follows: > HWgrade <- rbeta(25,3,1)*100 > Final <- pmax(0, pmin(rnorm(25, HWgrade + 10, 10),100)) > Glab <- c("A","B","C","D","F") ### NOTE: pmin and pmax are the versions of min and max ### that respectively apply componentwise to pairs of vectors > classfram <- data.frame(names=paste("Student",1:25,sep=""), HWgrade = round(HWgrade,1), Final = round(Final,1), Grade= Glab[pmin(floor((110-0.5*(HWgrade+Final))/10),5)]) ### We discussed the steps needed to plot a graph containing ### a scatterplot of Final vs HW grades, with plotting ### character equal to letter-grade. Here they are ### for reference: try them yourself ! > motif() plot(classfram$HWgrade, classfram$Final, xlab="HW avg", ylab="Final Exam grade", type="n", main= "Final Exam vs HW score, with Letter-grade Plotted") > for(i in 1:5) { inds <- classfram$Grade == Glab[i] points(classfram$HWgrade[inds], classfram$Final[inds], pch=Glab[i]) } legend(locator(), legend=c(">=90", "80:89", "70:79", "60:69", "<60"), pch=paste(Glab, collapse="")) =========================================================== ACCEPTANCE-REJECTION SAMPLING ### Illustration of sampling of Normal rv generation ### from Cauchy (standard, taken from Robert & Casella): > xv <- -3 + (0:600)/6 > cc <- max(dnorm(xv)/dcauchy(xv)) [1] 1.520347 > (2/exp(0.5))*sqrt(pi/2) ### max attained at +1, -1 [1] 1.520347 ### So far, we have checked that the max of the #### ratio of densities is bounded by cc ### Now generate Cauchy variates via simply-implemented ### probability integral transform: > ccvec <- tan(pi*(runif(10000)-0.5)); uvec <- runif(10000) > rvec <- dnorm(ccvec)/(cc*dcauchy(ccvec)) > summary(rvec) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 0.2762 0.8419 0.6536 0.9343 1 ### This is the vector of values we use in the accept-reject ### algorithm. The number of vaariates needed to obtain one ### newly simulated `normal' deviate is the number of ### successive (rvec[i],uvec[i]) pairs until the first time ### rvec[i] >= uvec[i]: the expected waiting time is in theory ### 1/P(rvec[1] >= uvec[1]) = 1/(average value of rvec) = ### 1/.66 = 1.52, and we check this below > AccRej <- function(xv, rv, uv, ctr) { ### ctr is 1 less than index of first random number ### to use assumed less than length-10 flag <- T while(flag) { ctr <- ctr+1 flag <- (uv[ctr] > rv[ctr]) } c(rn = xv[ctr], ctr = ctr) } > AccRej(ccvec, rvec, uvec, 0) -0.2727303 1 > AccRej(ccvec, rvec, uvec, 1) -1.467927 2 > AccRej(ccvec, rvec, uvec, 2) 0.7383699 3 > AccRej(ccvec, rvec, uvec, 3) 2.510688 6 > AccRej(ccvec, rvec, uvec, 7) -0.2772469 10 > AccRej(ccvec, rvec, uvec, 11) -0.0742707 12 ### Now let's do a lot of these at once in a for-loop > cnew <- numeric(1000) ind <- 0 for(i in 1:1000) { aux <- AccRej(ccvec, rvec, uvec, ind) cnew[i] <- aux[1] ind <- aux[2] } > ind/1000 [1] 1.538 > 1/.66 [1] 1.515152 > summary(cnew) Min. 1st Qu. Median Mean 3rd Qu. Max. -2.277 -0.6977 -0.2047 -0.07194 0.6893 2.511 ### One way to check normality is by means of a histogram > hist(cnew) ### Not bad, but not conclusive ### A more detailed way to create a `probability plot' ## is to plot the cumulative empirical d.f. of the (ordered) ## observations against the cumulative normal cdf values. > plot(pnorm(sort(cnew)), (1:1000)/1000, xlab="Normal cdf values", ylab="Empirical cdf values", type="b", main= "Probability Plot for Simulated Normal Values") abline(0,1) ## printgraph(file="ProbPlot.ps")