Setting Up the Hazard and Survival Curve Estimation in the `Hypothetical' Survival Data Example of Problem 4.7 =================================================== originally done 10/13/03 UPDATED in 2011 and 2021 > Fram4.7 = cbind.data.frame(Entr.age = c(58,58,59,60,60,61,61,62,62,62,63,63,64,66,66,67,67,67,68,69, 69,69,70,70,70,71,72,72,73,73), Exit.age = c(60,63,69,62,65,72,69,73,66,65,68,74,71,68,69,70,77,69,72,79, 72,70,76,71,78,79,76,73,80,74), Dth=c(1,1,0,1,1,0,0,0,1,1,1, 0,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,1,0,1)) ### This had to be entered by hand: it is not on the book webpage. ## Now we need to create a data-frame consisting of unique death-times, deaths, and ## numbers-at-risk for the LEFT-TRUNCATED dataset where each individual is ## left-truncated at "Entry Age" and dies or is censored at "Exit age". ## We are treating discrete-time survival here, since we have no information about # exposure-times within years of age. So we act as though entry and exit, including \ # death, can occur only on an individual's birthday in a given year of age. > ages4.7 = sort(unique(Fram4.7[Fram4.7\$Dth==1,2])) dths4.7 = tapply(rep(1,sum(Fram4.7\$Dth==1)), Fram4.7[Fram4.7\$Dth==1,2],sum) atrsk4.7T = numeric(length(ages4.7)) > for(i in 1:length(ages4.7)) atrsk4.7T[i] = sum(Fram4.7[,2] >= ages4.7[i] & Fram4.7[,1] < ages4.7[i]) ## Now, for example, an analogue of the Nelson-Aalen estimator is obtained by cumulatively summing up dths/atrsk, and the standard error formula is essentially as before. > Fram4.7T = cbind.data.frame(ages = ages4.7, dths = dths4.7, atrsk = atrsk4.7T, CumHaz = round(cumsum(dths4.7/atrsk4.7T),3), Haz.SE = round(sqrt(cumsum(dths4.7/atrsk4.7T^2)),3)) ### You can verify this by hand !! > Fram4.7T ages dths atrsk CumHaz Haz.SE 60 60 1 3 0.333 0.333 62 62 1 6 0.500 0.373 63 63 1 8 0.625 0.393 65 65 2 10 0.825 0.418 66 66 1 8 0.950 0.436 68 68 2 12 1.117 0.452 69 69 2 11 1.298 0.470 70 70 2 10 1.498 0.490 71 71 2 11 1.680 0.507 72 72 2 10 1.880 0.526 73 73 1 9 1.991 0.538 74 74 1 9 2.103 0.549 76 76 1 7 2.245 0.568 77 77 1 5 2.445 0.602 ### Now can estimate survival function directly for the increasing ages as given in Fram4.7T. > Scurv = cumprod(1- Fram4.7T\$dths/Fram4.7T\$atrsk) > round(Scurv,4) [1] 0.6667 0.5556 0.4861 0.3889 0.3403 0.2836 0.2320 0.1856 0.1519 0.1215 0.1080 0.0960 [13] 0.0823 0.0658 ##------------------ Material added in Fall 2021, 11/10/21 ## The "survival package" CAN handle left-truncated right-censored survival data directly, # through functions "Surv" and "survdiff", by using entry and exit times and the # type="counting" argument in Surv, as follows. > library(survival) Surv(Fram4.7\$Entr.age, Fram4.7\$Exit.age, Fram4.7\$Dth, type="counting") [1] (58,60] (58,63] (59,69+] (60,62] (60,65] (61,72+] (61,69+] (62,73+] (62,66] [10] (62,65] (63,68] (63,74+] (64,71] (66,68] (66,69] (67,70] (67,77] (67,69] [19] (68,72] (69,79+] (69,72] (69,70] (70,76+] (70,71] (70,78+] (71,79+] (72,76] [28] (72,73] (73,80+] (73,74] > tmpsurv = survfit(Surv(Entr.age, Exit.age, Dth, type="counting")~1, data=Fram4.7) round(tmpsurv\$surv,4) [1] 0.6667 0.5556 0.4861 0.3889 0.3403 0.2836 0.2320 0.1856 0.1519 0.1215 0.1080 0.0960 [13] 0.0823 0.0658 0.0658 0.0658 0.0658 ### This agrees precisely with "SCurv" computed above, up to last death-time # We can also check that the left-truncated life-table constructed above agrees precisely # with the one calculated by survfit: cbind(time=tmpsurv\$time, atrsk=tmpsurv\$n.risk, dth=tmpsurv\$n.ev, sprob=round(tmpsurv\$surv,4)) time atrsk dth sprob [1,] 60 3 1 0.6667 [2,] 62 6 1 0.5556 [3,] 63 8 1 0.4861 [4,] 65 10 2 0.3889 [5,] 66 8 1 0.3403 [6,] 68 12 2 0.2836 [7,] 69 11 2 0.2320 [8,] 70 10 2 0.1856 [9,] 71 11 2 0.1519 [10,] 72 10 2 0.1215 [11,] 73 9 1 0.1080 [12,] 74 9 1 0.0960 [13,] 76 7 1 0.0823 [14,] 77 5 1 0.0658 [15,] 78 4 0 0.0658 [16,] 79 3 0 0.0658 [17,] 80 1 0 0.0658