LOG TO SUMMARIZE SOME SIMULATION AND GOODNESS OF FIT STEPS
==========================================================

## START WITH SIMULATION & CHECKING
## 1000 columns, each a sample of 200 Exponential(1) variates

> system.time({tmpdat = array(rexp(2e5), c(200,1000))})
   user  system elapsed 
   0.08    0.00    0.12 

## Cell boundaries and the Exponential(1) cell probabilities

> breakpt = c(0,1,2,4,50)
> pvec = diff(pexp(breakpt))
> round(pvec,5)
[1] 0.63212 0.23254 0.11702 0.01832

## Observed cell counts and chi-square statistic for the first column

> nobs = hist(tmpdat[,1], breaks=breakpt, plot=F)$counts
> nobs
[1] 118  56  24   2
> sum((nobs-200*pvec)^2/(200*pvec))
[1] 3.268488
> chisq.test(nobs, p=pvec)

        Chi-squared test for given probabilities

data:  nobs
X-squared = 3.2685, df = 3, p-value = 0.3521

Warning message:
In chisq.test(nobs, p = pvec) :
  Chi-squared approximation may be incorrect

### The warning arises because the expected count in the last
### cell, 200*0.01832 = 3.66, is below 5.

## Now the same statistic for each of the 1000 columns

> ChiTsts = apply(tmpdat,2, function(dcol)
      chisq.test(hist(dcol, breaks=breakpt, plot=F)$counts,
                 p=pvec)$statistic)

## Same warning message is repeated many times!

> length(ChiTsts)
[1] 1000
> summary(ChiTsts)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  0.05315  1.15500  2.33500  2.92600  4.02500 16.17000 

#### Compare to the chi-square distribution with 3 df, which has mean 3.

### For a graphical picture (scaled relative frequency
### histogram with overlaid chisq_3 density):

> hist(ChiTsts, prob=T, nclass=40,
       main="Hist of Goodness-of-Fit Chisq vs Chisq_3")
> curve(dchisq(x,3), add=T, lty=3, col="red")

=======================================================

### How many times would we expect Chisq > 15 ?

> 1000*(1-pchisq(15,3))
[1] 1.816649

### That is, each simulated statistic exceeds 15 with
### probability .00182 under the chisq_3 approximation.

> sum(ChiTsts > 15)
[1] 2

### Distribution of the count just produced ?
### Binomial(1000, .00182), which is approximately
### Poisson(1.816649), to the extent that the chisq_3
### approximation holds.

========================================================
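
### A minimal sketch, not part of the original session, of two
### follow-up checks on the steps above: (1) recomputing the 1000
### statistics directly from the cell counts, which avoids the
### repeated chisq.test() warnings, and (2) comparing the
### Binomial(1000, p) and Poisson(1000*p) probabilities for the
### number of statistics above 15. The seed, the helper name
### chi_stat, and the use of findInterval()/tabulate() in place of
### hist() are assumptions made for this illustration.

```r
set.seed(1)                          # assumed seed, for reproducibility only
nsim <- 1000; n <- 200
breakpt <- c(0, 1, 2, 4, 50)
pvec <- diff(pexp(breakpt))          # Exponential(1) cell probabilities
tmpdat <- array(rexp(n * nsim), c(n, nsim))

## Pearson statistic for one column, computed from cell counts.
## findInterval() assigns each value to its cell; tabulate() counts
## cells 1..4 (a value >= 50 has probability exp(-50) and is ignored).
chi_stat <- function(dcol) {
  nobs <- tabulate(findInterval(dcol, breakpt), nbins = length(pvec))
  sum((nobs - n * pvec)^2 / (n * pvec))
}
ChiTsts <- apply(tmpdat, 2, chi_stat)

## Tail probability and expected number of exceedances of 15
p15 <- pchisq(15, 3, lower.tail = FALSE)
lam <- nsim * p15

## Binomial versus Poisson probabilities for 0..6 exceedances,
## taking the chisq_3 approximation at face value
k <- 0:6
cmp <- rbind(binom = dbinom(k, nsim, p15), pois = dpois(k, lam))
colnames(cmp) <- k
print(round(cmp, 4))

## Observed count, and the Poisson chance of 2 or more exceedances
print(sum(ChiTsts > 15))
print(ppois(1, lam, lower.tail = FALSE))
```

### The two rows of cmp should agree closely, since the
### Binomial-to-Poisson error is small when the success
### probability is small (here about .00182).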