Example of Nelson-Aalen and Kaplan-Meier Calculations in R in Leukemia Data as in Tables 4.1A,B. ================================================ NB. R has a powerful functions "survfit" to calculate Kaplan-Meier estimates, but here we do the calculations "by hand", ie from scratch in R, and compare them with the survfit results to make sure we know what has been computed ... Data downloaded to Leukem.dat as text by going to the text's web-site associated with Table 1.1 in Section 1.2. 1st column is ID of patient pair, (one randomized to 6MP, one to placebo 2nd column is initial Remission Type (1=partial, 2=complete) 3rd and 4th columns time to relapse resp. for Placebo and 6MP patients 5th column is status = indicator of relapse for 6MP patient (if no relapse, then end-of-study censoring; all placebo patients relapsed) > library(KMsurv) > data(drug6mp) > Leuk = drug6mp[,-1] names(Leuk) = c("RemTyp","Pcb.tim","6MP.tim","6MP.Rlps") >Leuk RemTyp Pcb.tim 6MP.tim 6MP.Rlps 1 1 1 10 1 2 2 22 7 1 3 2 3 32 0 4 2 12 23 1 5 2 8 22 1 6 1 17 6 1 7 2 2 16 1 8 2 11 34 0 9 2 8 32 0 10 2 12 25 0 11 2 2 11 0 12 1 5 20 0 13 2 4 19 0 14 2 15 6 1 15 2 8 17 0 16 1 23 35 0 17 1 5 6 1 18 2 11 13 1 19 2 4 9 0 20 2 1 6 0 21 2 8 10 0 ## Computational steps for new data-frame which will contain ordered distinct event-times, ## numbers of "dths" and number at risk, plus Nelson-Aalen and Kaplan-Meier and (Tsiatis- ## instead of Greenwood-) form of Variance > perm = order(Leuk[,3]) tmpfram = Leuk[perm,c(1,3:4)] #### So far, have the Status, 6MP.tim, 6MP.Rlps columns reordered in increating 6MP.tim > nodup = !duplicated(tmpfram[,2]) ### Boolean vector indicating first in each set of tied death-times > ntim = sum(nodup) ### 16 distinct event-times, only 7 of which turn out to be "death-times" > atrsk <- 21:1 dths <- rev(cumsum(rev(tmpfram[,3]))) ### take cumulative sum of observed deaths from bottom to top of column > newfram <- cbind.data.frame(times=tmpfram[nodup,2], dths=rev(diff(rev(c(dths[nodup],0)))), atrsk=atrsk[nodup], CumHaz=rep(0,ntim), Var.Haz=rep(0,ntim), SKM=rep(1,ntim), Var.SKM=rep(0,ntim)) ### now the dths column has the number of observed deaths at each of the distinct event-times. ## NB the censored times are viewed as occurring just after deaths. ## NOW COME THE NELSON-AALEN AND KAPLAN-MEIER CALCULATIONS newfram\$CumHaz <- cumsum(newfram\$dths/newfram\$atrsk) newfram\$SKM <- cumprod(1-newfram\$dths/newfram\$atrsk) newfram\$Var.Haz <- cumsum(newfram\$dths/newfram\$atrsk^2) newfram\$Var.SKM <- cumsum(newfram\$dths/(newfram\$atrsk* (newfram\$atrsk+1.e-5-newfram\$dths)))*newfram\$SKM^2 > round(data.matrix(newfram[newfram\$dths>0,]),4) times dths atrsk CumHaz Var.Haz SKM Var.SKM 1 6 3 21 0.1429 0.0068 0.8571 0.0058 2 7 1 17 0.2017 0.0103 0.8067 0.0076 4 10 1 15 0.2683 0.0147 0.7529 0.0093 6 13 1 12 0.3517 0.0217 0.6902 0.0114 7 16 1 11 0.4426 0.0299 0.6275 0.0130 11 22 1 7 0.5854 0.0503 0.5378 0.0164 12 23 1 6 0.7521 0.0781 0.4482 0.0181 ### The four final columns are constant between successive death-times. (Think of ## all four as being right-continuous with jumps only at death-times.) ### The powerful R functions for doing KM directly are illustrated as follows: > library(survival) > Surv(Leuk[,3],Leuk[,4]) [1] 10 7 32+ 23 22 6 16 34+ 32+ 25+ 11+ 20+ 19+ 6 17+ 35+ 6 13 9+ 6+ 10+ ### Note that + means censored > tf = survfit(Surv(Leuk[,3],Leuk[,4])~1) round(tf\$time,3) [1] 6 7 9 10 11 13 16 17 19 20 22 23 25 32 34 35 round(tf\$surv,4) [1] 0.8571 0.8067 0.8067 0.7529 0.7529 0.6902 0.6275 0.6275 [9] 0.6275 0.6275 0.5378 0.4482 0.4482 0.4482 0.4482 0.4482 ### Now we check that the std.err component agrees exactly ### with the Greenwood formula variance for the cumulative-Hazard estimator: > cbind( std.err = round(tf\$std.err[tf\$n.ev>0],6), Greenwd.SE = round(sqrt(cumsum(newfram\$dths/(newfram\$atrsk* (newfram\$atrsk-newfram\$dths))))[newfram\$dths>0],6), Tsiatis= round(sqrt(newfram\$Var.Haz)[newfram\$dths>0],6)) std.err Greenwd.SE Tsiatis [1,] 0.089087 0.089087 0.082479 [2,] 0.107764 0.107764 0.101306 [3,] 0.127964 0.127964 0.121274 [4,] 0.154760 0.154760 0.147146 [5,] 0.181773 0.181773 0.172963 [6,] 0.238435 0.238435 0.224331 [7,] 0.300307 0.300307 0.279468