Solutions to Sample Final Problems 1--9.
========================================
(See below also for solutions to Fall 2007
Supplement of 4 sample problems.)
-----------------------------------------

(1) Ignoring fpc, and using M=4, K=4N,
      design effect = (N^2/n) S_t^2/((4N)^2 S_y^2/(4n))
    which, after equating fpc's and terms like (N/(N-1)) to 1,
    is equal to 4 * SSB/SST = 2.8

(2) We find the average of the 10 groups' total monthly rentals
    = $96000, and the standard deviation
      = sqrt((9.4464e10 - 10*(9.6e4)^2)/9) = 1.6e4.
    Then the overall average-estimator \bar{\tau}/80 has observed
    value 1200, mean \bar{Y}, and standard error estimated by
      sqrt(1/10)*(1/80)*1.6e4 = 63.25.
    So the CI for \bar{Y} is
      1200 + 1.96*c(-1,1)*63.25 = (1076.03, 1323.97).

(3) In this problem, do not ignore fpc's but ignore the factor
    M/(M-1) = 1000/999. In both parts:
      \hat{t}_i = 1000 * block mean given in Table,
    and the sample variance estimator is
      s_t^2 = 1000^2 * var(c(15.2, 13.5, 11.7, 14.4)) = 2.26e6

    Part (a): estimator
      \bar{y}_{unb} = (15.2+13.5+11.7+14.4)/4 = 13.7
    while
      SE = 1e-4*sqrt((10^2/4)*(1-4/10)*2.26e6 +
             (10/4)*(1000^2/20)*(1-20/1000)*(10.2+8.6+13.3+11.1))
         = 0.626              ## which is a reasonable value

    Part (b): The unbiased estimator for SSB in this problem is:
      (9/1000)*(s_t^2 -
         (1/4)*(1000^2/20)*(1-20/1000)*(10.2+8.6+13.3+11.1))
         = 15577.2
    The unbiased estimator for SSW is
      (999/4)*10*(10.2+8.6+13.3+11.1) = 107892
    So the solution is to estimate
      SST/(1e4-1) = (15577+107892)/1e4 = 12.347
    showing that only a little more than 1/8 of the overall
    variance in Y is due to between-stratum variance.

(4) Here all units within each stratum have the same inclusion
    probability, resp.
      c((35/2500)*2/4, (50/4500)*1/4, 20/3000)
         = c(0.007, 0.00278, 0.00667).
    With the probabilities rounded as shown, the HT estimate is
      101500/(.007) + 118750/(.00278) + 248000/(.00667) = 94397237;
    with the unrounded probabilities (7/1000, 1/360, 1/150) it is
      101500*(1000/7) + 118750*360 + 248000*150 = 94450000.
    [Compare (40000/200)*(101500+118750+248000) = 93650000
     which is an SRS-looking estimator, just to check that the
     order of magnitude is right.]

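
The arithmetic in (3) and (4) can be rechecked with a few lines of R.
This is only a sketch: the variable names are made up for illustration,
and it assumes that 10.2, 8.6, 13.3, 11.1 are the within-block sample
variances and that N=10, n=4, M=1000, m=20, as the formulas in (3)
suggest. In (4) the probabilities are left unrounded, so the total comes
out slightly above the figure obtained from the three rounded values.

```r
## (3): two-stage cluster sample (sizes read off the formulas in (3))
N <- 10; n <- 4; M <- 1000; m <- 20
ybar_i <- c(15.2, 13.5, 11.7, 14.4)  # block sample means
s2_i   <- c(10.2, 8.6, 13.3, 11.1)   # assumed: within-block sample variances

s2_t   <- M^2 * var(ybar_i)                    # variance of estimated block totals
within <- (M^2 / m) * (1 - m / M) * sum(s2_i)  # summed second-stage terms

ybar_unb <- mean(ybar_i)
se_unb   <- sqrt((N^2 / n) * (1 - n / N) * s2_t + (N / n) * within) / (N * M)

ssb_hat <- ((N - 1) / M) * (s2_t - within / n) # estimator of SSB, as in Part (b)
ssw_hat <- ((M - 1) / n) * N * sum(s2_i)       # estimator of SSW, as in Part (b)

## (4): Horvitz-Thompson total with unrounded inclusion probabilities
pi_h <- c((35 / 2500) * 2 / 4, (50 / 4500) * 1 / 4, 20 / 3000)
t_h  <- c(101500, 118750, 248000)              # stratum sample totals
t_ht <- sum(t_h / pi_h)

print(c(ybar_unb = ybar_unb, se_unb = se_unb))
print(c(ssb_hat = ssb_hat, ssw_hat = ssw_hat,
        var_est = (ssb_hat + ssw_hat) / (N * M)))
print(c(t_ht = t_ht))
```

Dividing by N*M rather than N*M-1 in the last variance line follows the
approximation used in Part (b).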
(5) > nh <- c(.23, .24, .25, .28)*c(4200, 3000, 1900, 2400)
    > round(nh*1000/sum(nh))
    [1] 341 254 168 237
    > sum(c(.23, .24, .25, .28)^2*c(4200, 3000, 1900, 2400)^2 *
    +     (1-nh/(80000*c(.23,.24,.25,.28)))/nh)
    [1] 2723.844            ### sqrt of this is the SE

(6) Ignoring fpc, want 2.576*sqrt(.55*.45/n) <= .05, or n=657.

(7) > sum(c(5,4,2,3,2,1)*(0:5))                          ## 30
    > (sum(c(5,4,2,3,2,1)*(0:5)^2) - 17*(30/17)^2)/16    ## 2.691

    (a) Estimator is (175/17)*30 = 308.82, with estimated
          SE = sqrt((175^2/17)*(1-17/175)*2.691) = 66.16

    (b) Estimator is the regression estimator based on
          x_i = I[at least 1 cav],  y_i = #cav.
        Then the estimated regression coefficients are
          \hat{B}_1 = (30 - 17*(30/17)*(12/17))/(12 - 17*(12/17)^2) = 2.5
          \hat{B}_0 = 30/17 - 2.5*(12/17) = 0
        and
          \hat{t}_{y,reg} = \hat{B}_1 * 120 = 2.5*120 = 300
        with estimated
          SE = sqrt((175^2/17)*(1-17/175)*
                (sum((0:5 - c(0, rep(1,5))*2.5)^2*c(5,4,2,3,2,1))/16))
             = 46.203 .

(8) NOTE that this is a sampling design with random sample size !
    We know that the inclusion prob's pi_i are resp. .05, .15, .10
    for individuals in the respective precincts, so the estimator is
    > (35/.05 + 60/.15 + 30/.10)/3000      ## = 1400/3000 = 46.7%
    ## So the CV is the rMSE/mean of the estimator, est'd by:
    > sqrt(35*.95/.05^2+60*.85/.15^2+30*.9/.10^2)/(3000*1400/3000)
    ## = .0965

(9) (a)
    > (3/2.088e-4 + 12/4.175e-4 + 25/5.219e-4 + 77/6.263e-4)/60
    [1] 3565.942
    ## var not available without data on sum
    ## of (HH #5yr+ persons)^2

    Since we know that the sums of squares of numbers of persons
    in sampled HH in town for 5+ yrs were respectively
    3, 18, 45, 251, we can calculate that the with-replacement SE
    is estimated by:
      sqrt((1/(60*59))*(3/(2.088e-4)^2+18/(4.175e-4)^2+45/(5.219e-4)^2+
           251/(6.263e-4)^2 - 60*3565.942^2)) = 246.0

=======================================================
Solutions to 4 Fall 2007 Supplementary Sample Problems
======================================================

#1.
    Estimator = (N_1*y1bar + N_2*y2bar)/(N_1*x1bar + N_2*x2bar)
              = (12+2*20)/(1+2*3) = 52/7 = Bhat

    Estimated Variance
      = (1/xbar_U)^2*((N_1/N)^2*Shat^2_{y-Bx,1}*(1/n_1-1/N_1) +
                      (N_2/N)^2*Shat^2_{y-Bx,2}*(1/n_2-1/N_2))
        with Bhat substituted for B
      = (3/7)^2*((1/9)*(.98/200)*(2+(52/7)^2*.36-52/7) +
                 (4/9)*(.985/300)*(4+(52/7)^2*.64-2*52/7))
      = .00800

#2. As indicated in class, log(y1bar) is the estimator, with
      log(y1bar) - log(Ybar_{U1}) approx = (1/Ybar_{U1})*(y1bar-Ybar_{U1})
    which means that the estimated variance is given by
      (1/y1bar)^2*(1/n_1-1/N_1)*S^2_{y,1} = (1/12^2)*(.98/200)*2
                                          = 6.8056e-5

#3. (a) Horvitz-Thompson estimator of total
          = sum_{i in s} y_i w_i = 1000*(14*1+6*4+13*8) = 142000

    (b) Vhat = sum_{i,j in s} (pi_{ij}-pi_i*pi_j)*y_i*y_j/(pi_i*pi_j*pi_{ij})
             = sum_{i in s} (1-pi_i)*y_i^2/pi_i^2
             = sum_{i in s} w_i*(w_i-1)*y_i^2     (since pi_i = 1/w_i)
             = 1000*(999*(25+4+9+16) + 4*3999*(4+9+1) + 8*7999*(64+25))
             = 5973178000 = 77286^2

        The Sen-Yates-Grundy formula with these pi_{ij} would give 0 !
        But it does not apply, because with these pi_{ij}
          sum_{j: j ne i} pi_{ij} = pi_i*(sum_k pi_k - pi_i) ne pi_i*(n-1)

#4. Estimated response rates for the 4 strata are respectively
    4/5, 5/6, 3/4, 1/2. The adjusted weights are the original
    weights*r_i times the reciprocals of these class-estimated
    response rates, so the adjusted estimator in (a) becomes
      (5/4)*130 + (6/5)*170 + (4/3)*120 + 2*150 = 826.5.
    The same estimator could be used in (b), but under the
    adjustment class model, with responders and nonresponders
    assumed to have the same average y-attribute values, we could
    also estimate in (b) by
      (130/1200)*1400 + (170/2500)*2400 + (120/1500)*1900 +
      (150/2000)*2500 = 654.4
    which we can guess might be a little more accurate.
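
The two estimators in #4 can be recomputed in R as a check. This is a
sketch only: the vector names are hypothetical, and the reading of the
denominators 1200, 2500, 1500, 2000 as respondent weight totals and of
1400, 2400, 1900, 2500 as class sizes is an assumption, since neither is
labeled in the solution above.

```r
rate   <- c(4/5, 5/6, 3/4, 1/2)        # estimated response rates by class
t_resp <- c(130, 170, 120, 150)        # weighted respondent totals by class

## (a) inflate each class total by the reciprocal of its response rate
est_a <- sum(t_resp / rate)

## (b) respondent mean in each class times the class size
w_resp  <- c(1200, 2500, 1500, 2000)   # assumed: respondent weight totals
n_class <- c(1400, 2400, 1900, 2500)   # assumed: class sizes
est_b <- sum(t_resp / w_resp * n_class)

print(round(c(est_a = est_a, est_b = est_b), 1))
```

Estimator (a) rescales by the observed response rate in each class, while
(b) rescales to a known class size, so the two differ whenever the
respondent weights inflated by the response rate do not reproduce the
class sizes.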