Self-Consistency for Kaplan-Meier
=================================
4/30/08, UPDATED 11/12/2021

Suppose that right-censored survival data are given in a table consisting
of columns time, Dth, Cens:

   time  (column of ordered distinct event-times t_j)
   Dth   (number d_j of observed failures at time t_j)
   Cens  (number c_j of observed right-censored at time t_j)

In addition (as long as time is already ordered increasing), the at-risk
count column Risk is defined to have value at t_j equal to

   sum_k I(t_k >= t_j) * (d_k + c_k)

which can easily be coded in R from the other columns as

   Risk = rev(cumsum(rev(Dth + Cens)))

Now suppose that Shat0 is a survival function estimator (given as a column,
evaluated at the times t_j) which is piecewise constant, with jumps only at
the event times t_j.  The self-consistency idea is to update Shat0 to a new
survival function Shat1 by using Shat0 to attribute to each individual
censored at t_k the conditional survival probability Shat0(t)/Shat0(t_k) at
all times t > t_k.  Thus each death at t_k <= t_j counts as a full failure
by time t_j, and each censoring at t_k <= t_j counts as a partial failure
with mass 1 - Shat0(t_j)/Shat0(t_k), so the definition of Shat1 from Shat0
is:

   Shat1 = 1 - (1/Risk[1])*(cumsum(Dth + Cens) - Shat0*cumsum(Cens/Shat0))

### We code this into a function:

SlfConsUp = function(Shat0, LTable) {
    ### LTable should have columns time, Dth, Cens
    npop = sum(LTable[,2] + LTable[,3])
    Shat1 = 1 - (1/npop)*(cumsum(LTable[,2]+LTable[,3]) -
                          Shat0*cumsum(LTable[,3]/Shat0))
    Shat1 }

### Now consider the life table from the former "gehan" dataset, now called
### "drug6mp" in KMsurv:

library(survival)
library(KMsurv)
data(drug6mp)
tmp = drug6mp[,c("t2","relapse")]

### Note that there are exactly two times, t=6 and t=10, at which
### censoring and failures occur simultaneously.
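The same update can be sketched outside R.  Below is a minimal pure-Python
version (function and variable names are ours, not from the note) of the
self-consistency formula above, iterated to a fixed point on a toy
three-row life table; at the fixed point it should reproduce the
Kaplan-Meier values.

```python
def self_consist_update(shat0, dth, cens):
    """One self-consistency update: a death at t_k <= t_j is a full failure
    by t_j; a censoring at t_k <= t_j contributes failure mass
    1 - shat0[j]/shat0[k]."""
    npop = sum(dth) + sum(cens)
    shat1, cum_events, cum_ratio = [], 0.0, 0.0
    for d, c, s0 in zip(dth, cens, shat0):
        cum_events += d + c
        cum_ratio  += c / s0          # running sum of Cens/Shat0
        shat1.append(1 - (cum_events - s0 * cum_ratio) / npop)
    return shat1

# toy life table: 2 deaths at t1, 1 censoring at t2, 1 death at t3 (n = 4)
dth, cens = [2, 0, 1], [0, 1, 0]
shat = [1 - 2/4, 1 - 2/4, 1 - 3/4]  # start: empirical survival ignoring censoring
for _ in range(50):
    shat = self_consist_update(shat, dth, cens)
# fixed point = Kaplan-Meier for these data: 0.5, 0.5, 0
```

Here the fixed point agrees with the product-limit values (1-2/4) = 0.5 at
t1, unchanged at the censoring time t2, and 0.5*(1 - 1/1) = 0 at t3.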
### So make the censoring times 0.5 later:

tmp[21,1] = 10.5
tmp[20,1] = 6.5
Gehan6MP = cbind(time = as.numeric(levels(factor(tmp$t2))),
                 Dth  = aggregate(tmp$relapse,   by=list(tmp$t2), sum)$x,
                 Cens = aggregate(1-tmp$relapse, by=list(tmp$t2), sum)$x)

> Gehan6MP       ### 21 at risk initially in 6-MP group
      time Dth Cens
 [1,]  6.0   3    0
 [2,]  6.5   0    1
 [3,]  7.0   1    0
 [4,]  9.0   0    1
 [5,] 10.0   1    0
 [6,] 10.5   0    1
 [7,] 11.0   0    1
 [8,] 13.0   1    0
 [9,] 16.0   1    0
[10,] 17.0   0    1
[11,] 19.0   0    1
[12,] 20.0   0    1
[13,] 22.0   1    0
[14,] 23.0   1    0
[15,] 25.0   0    1
[16,] 32.0   0    2
[17,] 34.0   0    1
[18,] 35.0   0    1

### Initial estimator: empirical survival ignoring censoring
> Shat = 1 - cumsum(Gehan6MP[,2])/21
> round(Shat,4)
 [1] 0.8571 0.8571 0.8095 0.8095 0.7619 0.7619 0.7619 0.7143 0.6667 0.6667
[11] 0.6667 0.6667 0.6190 0.5714 0.5714 0.5714 0.5714 0.5714

> for (i in 1:10) {
     Shat = SlfConsUp(Shat, Gehan6MP)
     cat(round(Shat,4),"\n\n") }
0.8571 0.8571 0.8067 0.8067 0.7529 0.7529 0.7529 0.6902 0.6275 0.6275
0.6275 0.6275 0.5378 0.4482 0.4482 0.4482 0.4482 0.4482

...

0.8571 0.8571 0.8067 0.8067 0.7529 0.7529 0.7529 0.6902 0.6275 0.6275
0.6275 0.6275 0.5378 0.4482 0.4482 0.4482 0.4482 0.4482

### Converged to 4 decimal places after 1 iteration !!!

> round(survfit(Surv(tmp$t2, tmp$relapse) ~ 1)$surv, 4)
 [1] 0.8571 0.8571 0.8067 0.8067 0.7529 0.7529 0.7529 0.6902 0.6275 0.6275
[11] 0.6275 0.6275 0.5378 0.4482 0.4482 0.4482 0.4482 0.4482

### Need 20 iterations to get agreement to 7 decimal places.  Could check
### this with other datasets too !!
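The survfit values being matched above are just the product-limit formula
S(t_j) = prod_{k <= j} (1 - d_k/n_k).  As a cross-check, here is a short
Python sketch (our own naming) of that formula applied to the Gehan6MP life
table transcribed from the table above; it reproduces the converged
self-consistency values.

```python
# Gehan6MP life table transcribed from above: (time, Dth, Cens)
table = [(6.0,3,0),(6.5,0,1),(7.0,1,0),(9.0,0,1),(10.0,1,0),(10.5,0,1),
         (11.0,0,1),(13.0,1,0),(16.0,1,0),(17.0,0,1),(19.0,0,1),(20.0,0,1),
         (22.0,1,0),(23.0,1,0),(25.0,0,1),(32.0,0,2),(34.0,0,1),(35.0,0,1)]

def kaplan_meier(table):
    """Product-limit estimator: S(t_j) = prod_{k<=j} (1 - d_k/n_k)."""
    n = sum(d + c for _, d, c in table)   # 21 at risk initially
    surv, s = [], 1.0
    for _, d, c in table:
        s *= 1 - d / n                    # KM factor at this event time
        surv.append(s)
        n -= d + c                        # drop deaths and censorings from risk set
    return surv

km = kaplan_meier(table)
# round(km, 4) -> 0.8571 0.8571 0.8067 ... 0.4482, as in the output above
```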
### We do it next with a randomly generated dataset:

set.seed(2112)
Xvec = rexp(25, .5)
Uvec = runif(25, 1, 5)
Tvec = pmin(Xvec, Uvec)
Dth  = as.numeric(Xvec <= Uvec)
Tabl = cbind(time=Tvec, Dth=Dth, Cens=1-Dth)[order(Tvec),]

> Shat = 1 - .04*cumsum(Tabl[,"Dth"])
> for(i in 1:20) Shat = SlfConsUp(Shat, Tabl)
> round(Shat,4)
 [1] 0.9600 0.9200 0.8800 0.8400 0.8000 0.7600 0.7200 0.6800 0.6400 0.6000
[11] 0.5600 0.5600 0.5169 0.4738 0.4738 0.4738 0.4212 0.4212 0.4212 0.4212
[21] 0.3370 0.2527 0.1685 0.1685 0.1685

> round(survfit(Surv(Tabl[,1], Tabl[,2]) ~ 1)$surv, 4)
 [1] 0.9600 0.9200 0.8800 0.8400 0.8000 0.7600 0.7200 0.6800 0.6400 0.6000
[11] 0.5600 0.5600 0.5169 0.4738 0.4738 0.4738 0.4212 0.4212 0.4212 0.4212
[21] 0.3370 0.2527 0.1685 0.1685 0.1685

### NOTE: the number of iterations needed is of the order of the number of
### death-times!

=================================================================
### Coding for the Redistribute-to-the-Right Algorithm for Kaplan-Meier

### Recall that the idea is to work in a single pass, from left to right,
### through all of the censoring times, successively dividing the mass
### (initially 1/npop at each observation) at each such point equally
### among all observations with greater event times.

Redistr = function(LTable) {
    nevt  = nrow(LTable)
    Dth   = LTable[,2]
    Cens  = LTable[,3]
    nRisk = rev(cumsum(rev(Dth+Cens)))
    npop  = nRisk[1]
    ### censoring times with at least one death at or after them
    inds  = (1:nevt)[Cens > 0 & rev(cumsum(rev(Dth))) > 0]
    for(i in inds)
        Dth[(i+1):nevt] = Dth[(i+1):nevt]*(1 + Cens[i]/nRisk[i+1])
    1 - cumsum(Dth)/npop }

> sum(abs(Redistr(Gehan6MP) - survfit(Surv(tmp$t2, tmp$relapse) ~ 1)$surv))
[1] 1.831868e-15     ### so the Redistribute algorithm gives KM!

> sum(abs(Redistr(Tabl) - survfit(Surv(Tabl[,"time"],Tabl[,"Dth"]) ~ 1)$surv))
[1] 1.665335e-15     ### OK.
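The R function above carries out the redistribution multiplicatively on the
death-mass column.  The algorithm can also be coded literally as
mass-passing, which may make the idea clearer: sweep left to right and hand
each censored observation's accumulated mass to the observations to its
right.  A minimal Python sketch (our own naming, assuming distinct
observation times as in the simulated Tabl):

```python
def redistribute_to_right(times, death):
    """Redistribute-to-the-right: each observation starts with mass 1/n;
    sweeping left to right, the mass at each censored observation is split
    equally over all observations with strictly larger times."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    mass = [1.0 / n] * n
    for pos, i in enumerate(order):
        later = order[pos+1:]
        if not death[i] and later:        # censored, with observations to its right
            share = mass[i] / len(later)
            for j in later:
                mass[j] += share
            mass[i] = 0.0
    surv, s = [], 1.0                     # survival drops by the death masses
    for i in order:
        if death[i]:
            s -= mass[i]
        surv.append(s)
    return surv

s = redistribute_to_right([1, 2, 3, 4], [1, 0, 1, 1])  # one censoring at t=2
# s == [0.75, 0.75, 0.375, 0.0], the Kaplan-Meier curve for these data
```

Because the sweep is left to right, mass handed to a later censored
observation is itself redistributed when the sweep reaches it, exactly as
in the note's description.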