TEST SOLUTIONS, STAT 440                                    11/7/05
------------------------

#1.  Ybar = .4*5 + .4*9 + .2*16 = 8.8

(a)  SSB/N = .4*(5-8.8)^2 + .4*(9-8.8)^2 + .2*(16-8.8)^2 = 16.16
     SSW/N = .4*3 + .4*6 + .2*12 = 6,
     so SST/N = Sy2 = 22.16

(b)  MSE = (1/100)*22.16*(1-.5^2) = 0.1662 for the regression estimator.
     Then CV = sqrt(MSE)/Ybar = sqrt(.1662)/8.8 = .0463

#2.
> lg2 <- (agsrs$FARMS87 >= 500)
> sum(agsrs$ACRES87[lg2])
[1] 52873455
> sum(agsrs$ACRES87[lg2]^2)
[1] 2.735729e+13

Here 175 of the 300 sampled counties fall in the domain (N = 3078 counties,
and the domain size 1800 is treated as known).

NB  A brief argument (general for small-domain estimation of ratios) shows that
    here the regression estimator is identical to the ratio estimator.
    (This will be done in class.)

(a)-(b)  Ratio estimator = Regression estimator = 52873455/175 = 302134
     SE = (3078/1800)*sqrt((1-300/3078)*(2.735729e+13 - 52873455^2/175)/(299*300))
        = 18299.91
     So CI = 302134 + c(-1,1)*1.96*18299.9 = (266266.2, 338001.8)

NB  The corresponding estimator based on plain SRS estimation of the total is
       52873455*(3078/300)/1800 = 301378.7
    and its (larger!) SE is
       (3078/1800)*sqrt((1-300/3078)*(2.735729e+13 - 52873455^2/300)/(299*300))
       = 23037.39

#3.  With two units sampled per cluster, s_i^2 = (1/2)*d_i^2, where d_i is the
     difference between the two sampled values. So the two-stage cluster sampling
     MSE formula (for the total, then divided by N^2) is:
       (1/150)*( 3.6e9*(1-150/10000) + (1/10000)*(16/2)*(1-2/4)*0.5*5.4e11 )
> (1/150)*(3.6e9*(1-150/10000))
[1] 23640000
> (1/150)*(1/10000)*(16/2)*(1-2/4)*0.5*5.4e11
[1] 720000
## Thus the two terms are 2.364e7 and 0.072e7, which sum to 2.436e7.
So the CI is:
> 3.8e6*2/150 + 1.96*c(-1,1)*sqrt(2.436e7)
[1] 40992.9 60340.4

#4.
(a)  The optimal n_h's, for total budget C = 2000, per-unit costs c_h = 40, 20, 10
     and S_h^2 = p_h*(1-p_h), are n_h = C*(N_h*S_h/sqrt(c_h))/sum_l(N_l*S_l*sqrt(c_l)):
> nh <- 2000*c(1000,3000,5000)*sqrt(c(.01,.05,.12)*(1-c(.01,.05,.12)))/
+   (sqrt(c(40,20,10))*sum(c(1000,3000,5000)*sqrt(c(.01,.05,.12)*
+   (1-c(.01,.05,.12))*c(40,20,10))))
> nh
[1]   3.620154  33.642827 118.233730
## So we sample respectively 4, 34, 118 in the three strata.
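The allocation step above can be wrapped in a small R function for reuse. This is a minimal sketch, not the author's code: the helper name `opt_alloc` and its argument names are hypothetical, and S_h is approximated by sqrt(p_h*(1-p_h)) exactly as in the transcript. It also checks the budget: the unrounded n_h spend exactly 2000, while the rounded sizes 4, 34, 118 cost 4*40 + 34*20 + 118*10 = 2020.

```r
# Minimal sketch (hypothetical helper, not part of the original solutions):
# cost-optimal stratified allocation for a fixed total budget C,
#   n_h = C * (N_h * S_h / sqrt(c_h)) / sum_l(N_l * S_l * sqrt(c_l)),
# with S_h approximated by sqrt(p_h * (1 - p_h)) as in the transcript.
opt_alloc <- function(C, N, S, cost) {
  C * (N * S / sqrt(cost)) / sum(N * S * sqrt(cost))
}

N    <- c(1000, 3000, 5000)   # stratum sizes
ph   <- c(.01, .05, .12)      # stratum proportions
cost <- c(40, 20, 10)         # per-unit cost in each stratum

nh_opt <- opt_alloc(2000, N, sqrt(ph * (1 - ph)), cost)
print(nh_opt)                     # same values as nh in the transcript

# The unrounded allocation spends the budget exactly; rounding to
# whole units shifts the total cost a little.
print(sum(cost * nh_opt))         # 2000
print(sum(cost * round(nh_opt)))  # 2020 for the sample sizes 4, 34, 118
```

Rounding to the nearest integer here overshoots the budget by 20; rounding down in one stratum would be needed to stay at or under 2000 exactly.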
## Precision for ybar. The value shown is what this formula gives at the
## unrounded optimal nh, so it is evaluated before rounding:
> SEybar <- sqrt(sum(c(1/9,3/9,5/9)^2*(1-nh/c(1000,3000,5000))*
+   c(.01,.05,.12)*(1-c(.01,.05,.12))/nh))
> SEybar
[1] 0.02139862
> nh <- round(nh)
## With the rounded sizes 4, 34, 118 actually sampled, the same formula gives
## about 0.0213, essentially the same precision.

(b)  The overall proportion is:
> p <- (.01*1 + .05*3 + .12*5)/9    ## = 0.08444
Now we set sqrt((1-n/9000)*p*(1-p)/n) = 0.02139862 and solve for n:
> n <- 1/(0.02139862^2/(p*(1-p)) + 1/9000)    ### = 165.7, or 166
## Roughly, an SRS of 166 falls in the three strata in proportions 1/9, 3/9, 5/9,
## giving an overall cost of:
> sum(c(1,3,5)*(166/9)*c(40,20,10))
[1] 2766.667
### Thus the cost becomes considerably larger with SRS (about 2767 vs. 2000)!
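The precision and sample-size steps of #4 can be packaged as a short R sketch. It is an illustrative restatement, not the author's code: the helper names `se_strat` and `srs_n` are hypothetical, S_h^2 is approximated by p_h*(1-p_h) exactly as above, and the SRS cost assumes the sample falls in the strata in proportion to their sizes. It evaluates the SE at both the unrounded allocation and the rounded one (4, 34, 118), then solves (1-n/9000)*p*(1-p)/n = SE^2 for the matching SRS size.

```r
# Minimal sketch (hypothetical helpers, not part of the original solutions).
N    <- c(1000, 3000, 5000)   # stratum sizes
ph   <- c(.01, .05, .12)      # stratum proportions
cost <- c(40, 20, 10)         # per-unit cost in each stratum
W    <- N / sum(N)            # stratum weights 1/9, 3/9, 5/9

# SE of the stratified estimate of the overall proportion,
# with S_h^2 approximated by p_h * (1 - p_h) as in the transcript.
se_strat <- function(nh) {
  sqrt(sum(W^2 * (1 - nh / N) * ph * (1 - ph) / nh))
}

# SRS size n solving sqrt((1 - n/Ntot) * p * (1 - p) / n) == target.
srs_n <- function(target, p, Ntot) {
  1 / (target^2 / (p * (1 - p)) + 1 / Ntot)
}

# Cost-optimal allocation for a budget of 2000 (same formula as in (a)).
S      <- sqrt(ph * (1 - ph))
nh_opt <- 2000 * (N * S / sqrt(cost)) / sum(N * S * sqrt(cost))

se_opt <- se_strat(nh_opt)         # unrounded allocation, about 0.0214
se_rnd <- se_strat(round(nh_opt))  # rounded 4, 34, 118, about 0.0213

p_all <- sum(W * ph)                   # overall proportion, about 0.0844
n_srs <- srs_n(se_opt, p_all, sum(N))  # about 165.7
# An SRS falls in the strata roughly in proportion to W,
# so its expected cost per unit is sum(W * cost) = 150/9.
srs_cost <- ceiling(n_srs) * sum(W * cost)  # about 2767
print(round(c(se_opt = se_opt, se_rnd = se_rnd,
              n_srs = n_srs, srs_cost = srs_cost), 4))
```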