FILE to Summarize Background Info and Survey Data for
Extra Stat 440 "Complex Survey" Problem, Fall 2010
======================================================

NOTES at the end of the file show how the contents were
generated or simulated in R.

BACKGROUND

There are 50 counties indexed 1..50, with adult populations
given as follows (in multiples of 10,000):

> CtyPop
 [1]  6.398 17.053 20.363  9.335  4.996  7.654  8.107 12.112  8.552  5.798
[11]  6.427 13.259  8.134  7.322  5.547 13.812 16.581 24.801  7.097  7.093
[21]  4.533 10.266  4.795  7.052  7.045  9.689  6.469 12.097  7.000  9.756
[31] 17.143 25.699  9.969  4.412  4.398  6.854 11.205  9.587 12.748  6.335
[41]  6.707  9.184  4.717  9.441  6.013 11.395  6.027 14.159  8.091  5.024

In any selected county indexed 1..25, a stratified sample of
50 men and 50 women is drawn, so the true numbers of men and
women in those counties are needed. Since the total (men + women)
adult populations are already given above, we supply the proportion
of men among adults in each of the first 25 indexed counties:

> MaleFrac[1:25]
 [1] 0.4812 0.4779 0.4915 0.5016 0.5170 0.5183 0.4715 0.4614 0.4975 0.4692
[11] 0.5182 0.4855 0.4923 0.4875 0.4935 0.4652 0.5188 0.5096 0.4618 0.5114
[21] 0.4894 0.4731 0.5107 0.5047 0.4943

SURVEY DATA

Selected set of 8 county indices:

> CtySamp
[1] 32  2 25 31 47  8 38 13

Within each selected county a further sample of 100 adults is drawn:
in counties 1..25 stratified by sex (50 men and 50 women), in the
other counties without regard to sex.
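
As a small illustration of how the stratum sizes follow from the listings
above, the sketch below multiplies CtyPop by MaleFrac for the four selected
counties with index at most 25 (counties 2, 8, 13 and 25). The variable names
(cty, pop, malefrac, strata) are introduced here for illustration only; the
numbers are copied from the CtyPop and MaleFrac listings.

```r
# Adult populations (multiples of 10,000) and male fractions for the
# selected counties with index <= 25, copied from CtyPop and MaleFrac above.
cty      <- c(2, 8, 13, 25)
pop      <- c(17.053, 12.112, 8.134, 7.045)
malefrac <- c(0.4779, 0.4614, 0.4923, 0.4943)

men <- pop * malefrac        # size of the men stratum
wom <- pop * (1 - malefrac)  # size of the women stratum

# Illustrative summary, rounded to 3 decimals like the rest of the file.
strata <- data.frame(Cty = cty, CtyPop = pop,
                     CtyMen = round(men, 3), CtyWom = round(wom, 3))
print(strata)
```

The rounded results should agree with the CtyMen and CtyWom columns reported
for these four counties in the survey summary.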
The sample means and variances in all the sampled strata
are summarized as follows:

> Samp.Dat
Cty CtyPop CtyMen CtyWom MenAvInc WomAvInc AvInc SVar.M SVar.W   SVar
  2 17.053  8.150  8.903    7.061    4.565 5.813 14.002  6.343 11.643
  8 12.112  5.588  6.524    9.888    6.228 8.058 25.747 17.089 24.584
 13  8.134  4.004  4.130    7.825    4.244 6.034 18.205  5.877 15.158
 25  7.045  3.482  3.563    9.589    5.841 7.715 25.450 13.199 22.677
 31 17.143  8.100  9.043    8.161    3.158 5.409 12.775  5.981 15.197
 32 25.699 12.806 12.893    8.823    5.142 6.983 21.210 11.005 19.367
 38  9.587  4.517  5.070    6.858    4.430 5.401  5.859  8.050  8.534
 47  6.027  2.994  3.033    8.086    5.951 7.125 17.894 13.697 16.987

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

R NOTES

> CtyPop = round(4+36*rbeta(50,1,4),3)
> CtySamp = sample(1:50,8,replace=TRUE,prob=CtyPop/sum(CtyPop))
> MaleFrac = round(runif(50,.46, .52),4)

> CtyM.alph = rnorm(8,4,.4)
> CtyW.alph = rnorm(8,2.5,.3)
> Cty.beta = rnorm(8,1,.1)

> Samp.Dat = array(0, c(8,9), dimnames=list(c(2,8,13,25,31,32,38,47),
      c("CtyPop", "CtyMen", "CtyWom", "MenAvInc", "WomAvInc", "AvInc",
        "SVar.M", "SVar.W","SVar")))
> sind = as.numeric(dimnames(Samp.Dat)[[1]])
  ## columns 1:3 are county total, men and women populations
  Samp.Dat[,1:3] = CtyPop[sind]*cbind(rep(1,8), MaleFrac[sind],
        1-MaleFrac[sind])
  ## rows 1:4 are the sampled counties with index <= 25 (2,8,13,25)
  for (i in 1:4) {    ## draw 50 men Incomes, 50 women
      samp = c(2*rgamma(50,CtyM.alph[i],Cty.beta[i]),
               2*rgamma(50,CtyW.alph[i],Cty.beta[i]))
      Samp.Dat[i,4:9] = c(mean(samp[1:50]), mean(samp[51:100]),
           mean(samp), var(samp[1:50]), var(samp[51:100]),var(samp)) }
  ## rows 5:8 are the sampled counties with index > 25 (31,32,38,47)
  for(i in 5:8) {    ## draw 100 Incomes arranged as first men then women
      n.men = rbinom(1,100,MaleFrac[sind[i]])
      samp = c(2*rgamma(n.men,CtyM.alph[i],Cty.beta[i]),
               2*rgamma(100-n.men,CtyW.alph[i],Cty.beta[i]))
      Samp.Dat[i,4:9] = c(mean(samp[1:n.men]), mean(samp[(n.men+1):100]),
           mean(samp), var(samp[1:n.men]), var(samp[(n.men+1):100]),
           var(samp)) }
> Samp.Dat = round(Samp.Dat,3)
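
The generating code above implies that, in every sampled county, AvInc is the
average of MenAvInc and WomAvInc weighted by the numbers of men and women in
the sample of 100. The sketch below uses that relation to back out the number
of sampled men per county from the rounded Samp.Dat entries. It is only a
consistency check under that assumption, not part of the original notes, and
the names cty, men, wom, av and rebuilt are introduced here for illustration.

```r
# Summary values copied from the Samp.Dat table above.
cty <- c(2, 8, 13, 25, 31, 32, 38, 47)
men <- c(7.061, 9.888, 7.825, 9.589, 8.161, 8.823, 6.858, 8.086)  # MenAvInc
wom <- c(4.565, 6.228, 4.244, 5.841, 3.158, 5.142, 4.430, 5.951)  # WomAvInc
av  <- c(5.813, 8.058, 6.034, 7.715, 5.409, 6.983, 5.401, 7.125)  # AvInc

# AvInc = (n.men*MenAvInc + (100 - n.men)*WomAvInc)/100, solved for n.men
# and rounded to the nearest whole number of sampled men.
n.men <- round(100 * (av - wom) / (men - wom))
names(n.men) <- cty
print(n.men)

# Largest gap between the tabled AvInc and the value rebuilt from n.men;
# any gap comes from the 3-decimal rounding of the table entries.
rebuilt <- (n.men * men + (100 - n.men) * wom) / 100
print(max(abs(rebuilt - av)))
```

For counties 2, 8, 13 and 25 the recovered count should be 50, matching the
stratified design; for counties 31, 32, 38 and 47 it estimates the random
n.men drawn in the second loop.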