Setting Up the Hazard and Survival Curve Estimation in the
`Hypothetical' Survival Data Example of Problem 4.7
=================================================== originally done 10/13/03
UPDATED in 2011 and 2021
> Fram4.7 = cbind.data.frame(Entr.age =
c(58,58,59,60,60,61,61,62,62,62,63,63,64,66,66,67,67,67,68,69,
69,69,70,70,70,71,72,72,73,73), Exit.age =
c(60,63,69,62,65,72,69,73,66,65,68,74,71,68,69,70,77,69,72,79,
72,70,76,71,78,79,76,73,80,74), Dth=c(1,1,0,1,1,0,0,0,1,1,1,
0,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,1,0,1))
### This had to be entered by hand: it is not on the book webpage.
## Now we need to create a data-frame consisting of unique death-times, deaths, and
## numbers-at-risk for the LEFT-TRUNCATED dataset where each individual is
## left-truncated at "Entry Age" and dies or is censored at "Exit age".
## We are treating discrete-time survival here, since we have no information about
# exposure-times within years of age. So we act as though entry and exit, including \
# death, can occur only on an individual's birthday in a given year of age.
> ages4.7 = sort(unique(Fram4.7[Fram4.7$Dth==1,2]))
dths4.7 = tapply(rep(1,sum(Fram4.7$Dth==1)),
Fram4.7[Fram4.7$Dth==1,2],sum)
atrsk4.7T = numeric(length(ages4.7))
> for(i in 1:length(ages4.7)) atrsk4.7T[i] = sum(Fram4.7[,2] >=
ages4.7[i] & Fram4.7[,1] < ages4.7[i])
## Now, for example, an analogue of the Nelson-Aalen estimator is obtained by cumulatively
summing up dths/atrsk, and the standard error formula is essentially as before.
> Fram4.7T = cbind.data.frame(ages = ages4.7, dths = dths4.7,
atrsk = atrsk4.7T, CumHaz = round(cumsum(dths4.7/atrsk4.7T),3),
Haz.SE = round(sqrt(cumsum(dths4.7/atrsk4.7T^2)),3))
### You can verify this by hand !!
> Fram4.7T
ages dths atrsk CumHaz Haz.SE
60 60 1 3 0.333 0.333
62 62 1 6 0.500 0.373
63 63 1 8 0.625 0.393
65 65 2 10 0.825 0.418
66 66 1 8 0.950 0.436
68 68 2 12 1.117 0.452
69 69 2 11 1.298 0.470
70 70 2 10 1.498 0.490
71 71 2 11 1.680 0.507
72 72 2 10 1.880 0.526
73 73 1 9 1.991 0.538
74 74 1 9 2.103 0.549
76 76 1 7 2.245 0.568
77 77 1 5 2.445 0.602
### Now can estimate survival function directly for the increasing ages as given in Fram4.7T.
> Scurv = cumprod(1- Fram4.7T$dths/Fram4.7T$atrsk)
> round(Scurv,4)
[1] 0.6667 0.5556 0.4861 0.3889 0.3403 0.2836 0.2320 0.1856 0.1519 0.1215 0.1080 0.0960
[13] 0.0823 0.0658
##------------------ Material added in Fall 2021, 11/10/21
## The "survival package" CAN handle left-truncated right-censored survival data directly,
# through functions "Surv" and "survdiff", by using entry and exit times and the
# type="counting" argument in Surv, as follows.
> library(survival)
Surv(Fram4.7$Entr.age, Fram4.7$Exit.age, Fram4.7$Dth, type="counting")
[1] (58,60] (58,63] (59,69+] (60,62] (60,65] (61,72+] (61,69+] (62,73+] (62,66]
[10] (62,65] (63,68] (63,74+] (64,71] (66,68] (66,69] (67,70] (67,77] (67,69]
[19] (68,72] (69,79+] (69,72] (69,70] (70,76+] (70,71] (70,78+] (71,79+] (72,76]
[28] (72,73] (73,80+] (73,74]
> tmpsurv = survfit(Surv(Entr.age, Exit.age, Dth, type="counting")~1, data=Fram4.7)
round(tmpsurv$surv,4)
[1] 0.6667 0.5556 0.4861 0.3889 0.3403 0.2836 0.2320 0.1856 0.1519 0.1215 0.1080 0.0960
[13] 0.0823 0.0658 0.0658 0.0658 0.0658
### This agrees precisely with "SCurv" computed above, up to last death-time
# We can also check that the left-truncated life-table constructed above agrees precisely
# with the one calculated by survfit:
cbind(time=tmpsurv$time, atrsk=tmpsurv$n.risk, dth=tmpsurv$n.ev, sprob=round(tmpsurv$surv,4))
time atrsk dth sprob
[1,] 60 3 1 0.6667
[2,] 62 6 1 0.5556
[3,] 63 8 1 0.4861
[4,] 65 10 2 0.3889
[5,] 66 8 1 0.3403
[6,] 68 12 2 0.2836
[7,] 69 11 2 0.2320
[8,] 70 10 2 0.1856
[9,] 71 11 2 0.1519
[10,] 72 10 2 0.1215
[11,] 73 9 1 0.1080
[12,] 74 9 1 0.0960
[13,] 76 7 1 0.0823
[14,] 77 5 1 0.0658
[15,] 78 4 0 0.0658
[16,] 79 3 0 0.0658
[17,] 80 1 0 0.0658