Class Log on Accept-Reject for a Specific Example ================================================= 10/2/17 Example (posed by a student) f(x) = C*x^2/(1+x^2)^3, for x>0 I suggested at the end of class last time that upper-bounding such a function by a multiple of a density function easy to simulate from would often become much more tractable by making the bounding approximations on smaller intervals. For example, on the set {x: x>=4} the function x^2/(1+x^2) increases, from 16/17 = 0.9412 to 1. And that means that the density is accurately upper-bounded by a constant times 1/(1+x^2)^2, which can easily be checked to be the probability density of 1/sqrt(3) times a t_3 r.v. Also: the density function is monotone increasing from 0 to sqrt(1/2) which is the max point > f1 = function(x) x^2/(1+x^2)^3 > curve(f1(x), 0, 5) > abline(v=sqrt(1/2), col="red") > abline(v=4, col="red") > f1(sqrt(1/2)) [1] 0.1481481 > curve(f1(x),0,sqrt(1/2)) [you can solve for this analytically], and on that interval (1+x^2)^(-3) can be bounded by exp(-2.4*x^2). [I chose functions like exp(-c*x^2) because the first couple of Taylor's series terms of (1+x^2)^(-3) and exp(-3*x^2) are the same.] We can check that this bound works by a graphical check > curve(exp(2.4*x^2)/(1+x^2)^3,0,sqrt(1/2)) or a maximization: > optimize(function(x) exp(2.4*x^2)/(1+x^2)^(-3), c(0,sqrt(1/2))) $minimum [1] 6.186375e-05 $objective [1] 1 On the interval from sqrt(1/2) to 4, find a curve that decays almost exponentially > curve(log(f1(x)), sqrt(1/2),4) with slope roughly > (log(f1(4))-log(f1(1)))/3 [1] -1.21587 so we try the line passing through (1,log(f1(1))+.01) and (4,log(f1(4))+.3)which is a choice made by eye from > curve(log(f1(x))-(log(f1(1))+.01)-(x-1)*(log(f1(4))+.3- (log(f1(1))+.01))/3,sqrt(1/2),4) > optimize(function(x) log(f1(x))-log(f1(1))-.01-(x-1)*(log(f1(4))+0.3- log(f1(1))-.01)/3,c(sqrt(1/2),4),maximum=T) $maximum [1] 1.066937 $objective [1] -0.006168408 #### Now we put the three pieces together ### First find the integration constant Iconst = 1/integrate(f1,0,Inf)$val ## Now we upper-bound f1(x) ## by x^2*exp(-2.4*x^2) on (0,sqrt(1/2) ## ## by exp((log(f1(1))+.01)-(x-1)*(log(f1(4))+.3- ## (log(f1(1))+.01))/3) on (sqrt(1/2),4) ## ## by (17/16)/(1+x^2)^2 on (4,Inf) ## This says that the density Iconst*f1(x) is <= g(x) = ## Iconst*x^2*exp(-2.4*x^2) on (0,sqrt(1/2) ## Iconst*exp( (log(f1(1))+.01)-(x-1)*(log(f1(4))+.29- ## log(f1(1)))/3 ) on (sqrt(1/2),4) ## Iconst*( (17/16)/(1+x^2)^2 ) on (4,Inf) > curve(Iconst*f1(x), 0 , 5) gcrv = function(x) Iconst*ifelse(x <= sqrt(1/2), x^2*exp(-2.4*x^2), ifelse(x>=4, (17/16)/(1+x^2)^2, exp((log(f1(1))+.01)+(x-1)*(log(f1(4))+ .29-log(f1(1)))/3) ) ) curve(gcrv,0,5, add=T, col="blue") #### This shows how the g-curve WHICH IS NOT A DENSITY #### upper-bounds Iconst*f1(x) WHICH IS A DENSITY ### To find out the probability 1/c for the ACCEPT-REJECT to accept each generated x > cfac = integrate(gcrv,0,Inf) > cfac 1.110972 with absolute error < 9.1e-05 ### NEXT: how to generate r.v.'s with density g = gcrv/cfac : ### first find areas under each of the three pieces > garea1 = integrate(gcrv,0,sqrt(1/2), rel.tol=1.e-8)$val garea2 = integrate(gcrv,sqrt(1/2),4, rel.tol=1.e-8)$val garea3 = integrate(gcrv,4,Inf, rel.tol=1.e-8)$val > cfac = garea1+garea2+garea3 [1] 1.110953 > q1 = garea1/cfac; q2=garea2/cfac; q3=garea/cfac gprobs = cumsum(c(q1, q2, q3)) [1] 0.2766538 0.9764141 1.0000000 > fprobs = cumsum(Iconst*c(integrate(f1,0,sqrt(1/2), rel.tol=1.e-9)$val, integrate(f1,sqrt(1/2),4, rel.tol=1.e-9)$val, integrate(f1,4,Inf, rel.tol=1.e-9)$val)) [1] 0.2917914 0.9762119 1.0000000 ### These are the cdf cutoffs for the three parts ### Now we have to create a method for simulating from a CDF FCDF and ### inverse-cdf ICDF to achieve a r.v. conditioned to fall in [a,b], ### where ulow=FCDF(a) and uhi=FCDF(b) ### View g = gcrv/cfac as being the sum of densities g_j on (a_{j-1},a_j) ## To generate r.v. X from g_j conditional on being in (a_{j-1},a_j) ## using u_k = G(a_k) for all k, and ## G_j^(-1)(G_j(a_{j-1})+(U-u_{j-1})(G_j(a_j)-G_j(a_{j-1}))/(u_j-u_{j-1})) ## ## ie solve: (G_j(X)-G_j(a_{j-1}))/(G_j(a_j)-G_j(a_{j-1}) = (u-u_{j-1})/(u_j-u_{j-1}) when u_{j-1} gsim = function(N) { uinp = runif(N) ind1 = (uinp <= gprobs[1]) ind3 = (uinp >= gprobs[2]) ind2 = !ind1 & !ind3 Xout = numeric(N) ulo = 0; uhi = pgamma(2.4*sqrt(1/2)^2,1.5,1) Xout[ind1] = sqrt(qgamma(uhi*uinp[ind1]/gprobs[1],1.5,1)/2.4) ulo = pexp(sqrt(1/2),lam); uhi = pexp(4,lam) Xout[ind2] = qexp(ulo+(uhi-ulo)*(uinp[ind2]-gprobs[1])/ (gprobs[2]-gprobs[1]),lam) ulo = pt(4*sqrt(3),3); uhi = 1 if(sum(ind3)) Xout[ind3] = qt(ulo+(1-ulo)* (uinp[ind3]-gprobs[2])/(1-gprobs[2]),3)/sqrt(3) Xout } > set.seed(1777) tmpX = gsim(1e6) ### a few seconds > mean(tmpX < sqrt(1/2)) [1] 0.276737 > mean(tmpX > 4) [1] 0.023433 ### OK, these are as expected under g > hist(tmpX[tmpX<= sqrt(1/2)], nclass=30, prob=T) caux = 1/integrate(function(x) x^2*exp(-2.4*x^2),0,sqrt(1/2))$val curve(caux*x^2*exp(-2.4*x^2),0,sqrt(1/2), add=T, col="blue") #### OK > hist(tmpX[tmpX> sqrt(1/2) & tmpX <4], nclass=30, prob=T) curve(dexp(x,lam)/(pexp(4,lam)-pexp(sqrt(1/2),lam)), sqrt(1/2), 4, add=T, col="orange") ### OK > hist(tmpX[tmpX>=4], nclass=60, prob=T, xlim=c(0,20)) curve(dt(x*sqrt(3),3)/(sqrt(1/3)*(1-pt(4*sqrt(3),3))), 4, 20, add=T, col="orange") ### OK