
Biostatistics (2014), 0, 0, pp. 1–22 doi:10.1093/biostatistics/combined_logrank

Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up

KWUN CHUEN GARY CHAN∗

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA

[email protected]

JING QIN

Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda,

MD 20892, USA

[email protected]

Summary

Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, which are frequently collected in health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared to conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer’s disease with prospective follow-up.

∗To whom correspondence should be addressed.

© The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]

Key words: Accelerated failure time model; Backward recurrence time; Length biased sampling.

1. Introduction

Cross-sectional sampling of survival data recruits individuals who have experienced a certain

initial event but not a failure event at the sampling time. It is considered a focused and economical

design for studying the natural history of disease (Wang, 1991) but is subject to selection bias.

In particular, subjects with a longer survival time are more likely to be sampled. Cross-sectional

data can be collected without prospective follow-up; in this case only a backward recurrence

time is collected, as discussed in Cox (1962), Allison (1985) and Yamaguchi (2003), among others.

The estimation of the backward time distribution is one of the key elements in evaluating the

statistical accuracy of estimates of current HIV incidence rates from cross-sectional surveys.

Backward recurrence time is also frequently collected in health surveys.

In cross-sectional data with prospective follow-up in addition to the backward time, the data

structure allows one to apply statistical methods designed for left truncated and right censored

data. When disease incidence is stationary over time, cross-sectional survival data are length-biased (Wang, 1991). For length-biased survival data, it has been shown that conventional methods for analyzing left truncated right censored data are inefficient, and more efficient estimators were proposed by Vardi (1989), Shen et al. (2009), Ning et al. (2011) and Chan et al. (2012), among others.

In conventional survival analysis with right censored data, Prentice (1978) introduced a general class of linear rank tests for a semiparametric accelerated failure time model. Unlike the fully

parametric approach which depends on the correctness of the underlying distributional assump-

tion, the rank test is robust to distributional misspecification. If the parametric assumption is

correct, then the rank test is fully efficient. On the other hand, the rank test can produce a valid

test even when the distributional assumption is misspecified. Ying (1990) extended linear rank

statistics to left truncated and right censored data which can be applied to cross-sectional survival

data. When the censoring proportion is high, the existing tests typically have low power. Recently,

Ning et al. (2010) developed a modified log-rank test for k-sample testing based on cross-sectional

survival data with prospective follow-up. However, the tests developed by Ying (1990) and Ning

et al. (2010) cannot be applied to cross-sectional survival data without prospective follow-up.

In this paper, we extend the work of Prentice (1978) and Ying (1990) to length-biased cross-

sectional data without follow-up based on backward recurrence time only, which can be optimally

combined with Ying’s rank test when prospective follow-up is present. As in Prentice (1978) and

Ying (1990), our primary focus is to test the null hypothesis H0 : β = 0 against two-sided

alternatives under a population accelerated failure time model

$$\log Y = X^{T}\beta + \varepsilon \tag{1.1}$$

where Y is the survival time of interest, X is a p-vector covariate, β is a p-vector coefficient and

ε follows an unspecified distribution with a density function fε and a survival function Fε. It is

assumed that X and ε are independent. The proposed statistics can be applied to cross-sectional

survival data without follow-up, whereas the log-rank statistic of Ning et al. (2010) cannot be

applied to that case. Moreover, the sampling distribution of their statistic is found by resampling

methods when censoring is present. In contrast, the asymptotic variance of the proposed test

statistic has a closed form consistent estimator and can be computed from existing software.

This will facilitate practical implementations of the proposed methods.

The rest of the paper is organized as follows. The proposed linear rank statistics for cross-


sectional data without follow-up are given in Section 2. When follow-up data are also available in

a cross-sectional sample, we can combine the linear rank statistics proposed in Section 2 and the

linear rank statistics based on follow-up data to improve power. Four ways to combine the two

statistics will be discussed in Section 3. Results from simulation studies are presented in Section

4; Section 5 contains an analysis of the National Comorbidity Survey Replication; Section 6

contains an analysis of the Canadian Study of Health and Aging, and concluding remarks are

given in Section 7.

2. Linear rank statistics for cross-sectional data without follow-up

In a cross-sectional sample with complete follow-up, we could observe Y = A+V where A is the

time from an initial event to recruitment, also known as the backward recurrence time; and V

is the time from recruitment to a failure event, also known as the forward recurrence time. In a

cross-sectional sample without follow-up, only A is observed but not V or Y . In other words, every

individual is right censored. Under length-biased sampling, the backward time has a conditional

density (Cox, 1962)

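$$f_A(a \mid x) = \frac{F_Y(a \mid x)}{\mu_Y(x)}, \quad a > 0, \qquad \mu_Y(x) = \int_0^{\infty} F_Y(a \mid x)\,da = \mu \exp(x^{T}\beta),$$

where $F_Y(y \mid x)$ is the conditional survival function of $Y$ given $X = x$, $\mu_Y(x) = E(Y \mid X = x)$ and

$$\mu = \int_{-\infty}^{\infty} F_\varepsilon(a)\exp(a)\,da = \int_0^{\infty} f_\varepsilon(\log a)\,da. \tag{2.2}$$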

The last equality in (2.2) is shown in Section A of the supplementary material. The density of A given above is a consequence of length-biased sampling (Cox, 1962). In the context of cross-sectional survival data, length-biasedness of survival time requires a stationary disease incidence assumption, under which the disease incidence in the population remains constant over time and is independent of (X, Y) (Wang, 1991). This assumption is imposed for the estimation of β for model (1.1) in Shen et al. (2009) and Ning et al. (2011), among others, and for the test of Ning et al. (2010). We derive the test statistic under this working assumption, since the derivation is tractable. However, we show in Section D of the supplementary material that the test remains valid even when disease incidence is non-stationary over time. Simulations reported in Section 4 indicate that the test can be valid under misspecification of the stationary incidence assumption.

When the distribution of ε is known, parametric tests can be readily constructed for testing H0 : β = 0, but their validity depends on correct model specification. As an alternative to parametric tests, the linear rank test was proposed to offer greater robustness to model misspecification. A comprehensive review of linear rank statistics for right censored data can be found in Chapter 7 of Kalbfleisch and Prentice (2002). In the following we derive a linear rank test based on backward recurrence time A. Let $a_{(1)} < \dots < a_{(n)}$ be the ordered outcomes with corresponding covariate vectors $x_{(1)}, \dots, x_{(n)}$, respectively. The rank vector is given by the corresponding labels $r = [(1), (2), \dots, (n)]$ and the rank likelihood (Kalbfleisch and Prentice, 1973) based on $(a_i, x_i)$, $i = 1, \dots, n$, is

$$L_A = \mathrm{pr}(a_{(1)} < \dots < a_{(n)} \mid x_1, \dots, x_n) = \int_{a_{(1)} < \dots < a_{(n)}} \prod_{i=1}^{n} \frac{F_Y\{a_{(i)} \mid x_{(i)}\}}{\mu_Y\{x_{(i)}\}}\, da_{(i)},$$

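where $a_{(1)} < \dots < a_{(n)}$ are the ordered values of $a_i$ and $x_{(i)}$ are the corresponding covariate values. Under model (1.1), one can write out the logarithm of the rank likelihood as

$$\log L_A = \log\left[\int_{a_{(1)} < \dots < a_{(n)}} \prod_{i=1}^{n} F_\varepsilon\{\log a_{(i)} - x_{(i)}^{T}\beta\}\, da_{(1)} \dots da_{(n)}\right] - \sum_{i=1}^{n} x_i^{T}\beta - n\log\mu.$$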

A score test statistic can be constructed as

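$$\left.\frac{\partial \log L_A}{\partial \beta}\right|_{\beta=0} = \sum_{i=1}^{n} x_{(i)}\, E\left[\frac{f_\varepsilon\{\log a_{(i)}\}}{F_\varepsilon\{\log a_{(i)}\}}\right] - \sum_{i=1}^{n} x_i. \tag{2.3}$$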

The expectation is taken with respect to the i-th order statistic generated from a distribution with density $F_\varepsilon(\log a)/\mu$. The derivation of (2.3) is shown in Section B of the supplementary material. Furthermore, since

$$\sum_{i=1}^{n} E\left[\frac{f_\varepsilon(\log a_{(i)})}{F_\varepsilon(\log a_{(i)})}\right] = \sum_{i=1}^{n} E\left[\frac{f_\varepsilon(\log a_i)}{F_\varepsilon(\log a_i)}\right] = n\int f_\varepsilon(\log a)\,da/\mu = n,$$

the linear rank statistic can be written as $T_1 = \sum_{i=1}^{n} x_{(i)} c_i$, where $c_i = E\{\phi(u_{(i)})\}$, $\phi(u) = f_\varepsilon\{\log S^{-1}(1-u)\}[F_\varepsilon\{\log S^{-1}(1-u)\}]^{-1} - 1$, $S(a) = \int_a^{\infty} F_\varepsilon(\log u)\,du/\mu$, and $u_{(1)} < \dots < u_{(n)}$ are the order statistics of $n$ independent uniformly distributed random variables. Note that this $\phi(u)$ is quite different from the score function given in Chapter 7 of Kalbfleisch and Prentice (2002) for unbiased samples.

We consider two important special cases where ci and the associated rank tests have closed

form expressions, which lead to easier computations and well known test statistics. First, when

the error term ε is extreme-value distributed, it can be shown that S is the survival function of

an exponential distribution. Therefore, it follows from the same calculation in Prentice (1978)

that $c_i = n^{-1} + (n-1)^{-1} + \dots + (n-i+1)^{-1} - 1$. Thus, it follows that $T_{LR,1} = \sum_{i=1}^{n} x_{(i)} c_i$, which is exactly the log-rank statistic treating backward recurrence times $(a_1, \dots, a_n)$ as if they are completely observed survival times in an unbiased sample.
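The log-rank scores above are simple to compute. The following is a minimal Python sketch for a scalar covariate; the function names and the toy data are hypothetical and are not taken from the paper or from any existing package.

```python
def logrank_scores(n):
    """Scores c_i = 1/n + 1/(n-1) + ... + 1/(n-i+1) - 1 for i = 1, ..., n."""
    scores, running = [], 0.0
    for i in range(1, n + 1):
        running += 1.0 / (n - i + 1)
        scores.append(running - 1.0)
    return scores


def linear_rank_statistic(a, x, scores):
    """T = sum_i x_(i) c_i, where x_(i) belongs to the i-th smallest a."""
    order = sorted(range(len(a)), key=lambda k: a[k])
    return sum(x[k] * c for k, c in zip(order, scores))


# Hypothetical toy data: backward recurrence times and a binary group label.
a = [2.1, 0.4, 5.3, 1.2, 3.8, 0.9]
x = [1, 0, 1, 0, 1, 0]
c = logrank_scores(len(a))
t_lr1 = linear_rank_statistic(a, x, c)
print(c, t_lr1)
```

The scores sum to zero, so the statistic is unchanged when a constant is added to every covariate value.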

Another important special case is the construction of a Wilcoxon statistic based on backward recurrence times. Suppose the error distribution follows the G-rho family with ρ = 1/2 (Harrington and Fleming, 1982), which has a survival function

$$F_\varepsilon(\varepsilon) = \frac{1}{(1 + \tfrac{1}{2}e^{\varepsilon})^{2}}. \tag{2.4}$$

It follows that $f_\varepsilon(\varepsilon) = e^{\varepsilon}/(1 + \tfrac{1}{2}e^{\varepsilon})^{3}$ and $S(a) = (1 + a/2)^{-1}$. Therefore, $\phi(u) = 2u - 1$ and $c_i = 2i(n+1)^{-1} - 1$. The corresponding statistic is $T_{Wil,1} = \sum_{i=1}^{n} x_{(i)}\{2i/(n+1) - 1\}$, which is exactly the Wilcoxon statistic treating backward recurrence times $(a_1, \dots, a_n)$ as if they are completely observed survival times in an unbiased sample. In contrast to the construction of a Wilcoxon statistic for an unbiased sample, which corresponds to a logistic error distribution, the construction here requires the error distribution to have a lighter tail. This is because the distribution of the backward recurrence time, called the stationary distribution in renewal theory (Cox, 1962), generally has a heavier tail than the survival distribution. One exception is an exponential survival distribution, which has an exponential stationary distribution. This corresponds to an extreme-value error distribution, for which the log-rank test can be constructed from either unbiased survival data or backward recurrence times. Note that for a logistic distributed error, the stationary distribution is undefined because µ is infinite.
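As a numerical sanity check of the Wilcoxon derivation, the sketch below evaluates φ(u) directly from the working survival function (2.4), its density, and the closed form of S, and compares the result with 2u − 1. All function names are hypothetical.

```python
import math


def surv_eps(e):
    """Working survival function (2.4): 1 / (1 + exp(e)/2)^2."""
    return 1.0 / (1.0 + math.exp(e) / 2.0) ** 2


def dens_eps(e):
    """Corresponding density: exp(e) / (1 + exp(e)/2)^3."""
    return math.exp(e) / (1.0 + math.exp(e) / 2.0) ** 3


def s_inverse(p):
    """Inverse of S(a) = (1 + a/2)^(-1): the value a with S(a) = p."""
    return 2.0 * (1.0 / p - 1.0)


def phi(u):
    """phi(u) = f{log S^-1(1-u)} / F{log S^-1(1-u)} - 1, for 0 < u < 1."""
    log_a = math.log(s_inverse(1.0 - u))
    return dens_eps(log_a) / surv_eps(log_a) - 1.0


def wilcoxon_scores(n):
    """Scores c_i = 2i/(n+1) - 1 for i = 1, ..., n."""
    return [2.0 * i / (n + 1) - 1.0 for i in range(1, n + 1)]


for u in (0.1, 0.25, 0.5, 0.9):
    print(u, phi(u), 2.0 * u - 1.0)
```

The two printed columns should agree up to floating-point error, and the Wilcoxon scores sum to zero.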

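The test statistics are asymptotically normal following from the standard theory of linear rank estimation, provided that $\int_0^1 \phi^2(u)\,du < \infty$ (Hajek et al., 1967; Prentice, 1978). Following their arguments, a consistent variance estimate of $T_{LR,1}$ is

$$V_{LR,1} = \sum_{i=1}^{n} \Big\{X_{(i)} - (n-i+1)^{-1}\sum_{j=i}^{n} X_{(j)}\Big\}^{\otimes 2}$$

and a consistent variance estimate of $T_{Wil,1}$ is

$$V_{Wil,1} = \sum_{i=1}^{n} \{1 - i(n+1)^{-1}\}^{2} \Big\{X_{(i)} - (n-i+1)^{-1}\sum_{j=i}^{n} X_{(j)}\Big\}^{\otimes 2},$$

where $a^{\otimes 2} = aa^{T}$ for a column vector $a$. The weight $1 - i(n+1)^{-1} = (n-i+1)/(n+1)$ is the one for which $T_{Wil,1} = -\sum_{i=1}^{n} (n-i+1)(n+1)^{-1}\{X_{(i)} - (n-i+1)^{-1}\sum_{j=i}^{n} X_{(j)}\}$. Therefore, Chi-square test statistics can be constructed as $\chi^2_{LR,1} = T_{LR,1}^{T} V_{LR,1}^{-1} T_{LR,1}$ and $\chi^2_{Wil,1} = T_{Wil,1}^{T} V_{Wil,1}^{-1} T_{Wil,1}$, which are asymptotically $\chi^2(p)$ distributed under the null hypothesis.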
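For a scalar covariate (p = 1), the log-rank test based on backward recurrence times reduces to a few lines. The sketch below is illustrative only: the function name and toy data are hypothetical, ties are assumed absent, and the χ²(1) tail probability is computed with the identity P(χ²(1) > c) = erfc{(c/2)^{1/2}}.

```python
import math


def backward_logrank_test(a, x):
    """Log-rank test from backward times a and a scalar covariate x.

    Returns (T, V, chi-square statistic, p-value), using
    T = sum_i x_(i) c_i and V = sum_i {x_(i) - mean of x_(j), j >= i}^2.
    """
    n = len(a)
    order = sorted(range(n), key=lambda k: a[k])
    xs = [float(x[k]) for k in order]  # covariates ordered by backward time
    t_stat, v_stat, harmonic = 0.0, 0.0, 0.0
    tail_sum = sum(xs)  # sum of x_(j) over j >= current index
    for i in range(n):  # zero-based index, so n - i subjects remain
        harmonic += 1.0 / (n - i)
        t_stat += xs[i] * (harmonic - 1.0)
        resid = xs[i] - tail_sum / (n - i)
        v_stat += resid * resid
        tail_sum -= xs[i]
    chi2 = t_stat * t_stat / v_stat  # undefined if all covariates are equal
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return t_stat, v_stat, chi2, p_value


# Hypothetical toy data.
a = [2.1, 0.4, 5.3, 1.2, 3.8, 0.9]
x = [1, 0, 1, 0, 1, 0]
print(backward_logrank_test(a, x))
```

With a vector covariate the squared residual becomes an outer product and the division becomes a matrix inverse, which would call for a linear algebra library.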

In the literature, there are a few prominent ways of choosing between the log-rank and Wilcoxon tests. One way is to specify the test statistic before analysis, based on prior belief about the likely alternative hypothesis. When a proportional hazards alternative is assumed, the log-rank test is optimal. When it is assumed that the hazard ratio decreases with time, the Wilcoxon test is usually more powerful. Another way is to use graphical diagnostics for choosing the test statistic (Hess, 1995), and a related pre-test was recently proposed by Martinez (2010). Note that the choice between log-rank and Wilcoxon tests only affects power, and matters mainly in borderline cases. An advantage of rank-based statistics is their robustness against misspecification of the parametric error distribution. In fact, the log-rank statistic is often used in practice despite possible departures from the proportional hazards alternative, since it provides a conservative test in such cases.

In the above discussion, we constructed rank-based statistics based on backward recurrence times $(a_1, \dots, a_n)$ observed in a cross-sectional sample as if they are completely observed survival times in an unbiased sample. The result is seemingly paradoxical, and can be better understood from the following. Cross-sectional sampling preferentially selects longer survivors. For subgroups with better survival, it is more likely to recruit individuals who have lived longer since disease incidence. Therefore, the backward time A also tends to be longer in those subgroups. It was discussed in Allison (1985) that one can fit a proportional hazards model based on backward recurrence times to estimate the relative hazard in the population survival model under a strong assumption of exponentially distributed failure time. The log-rank statistic is the corresponding score statistic based on a partial likelihood function. For parametric accelerated failure time models, Yamaguchi (2003) and Keiding et al. (2011) also showed that survival model parameters can be estimated based on backward recurrence time. In particular, they showed that $\log A = X^{T}\beta + \varepsilon^{*}$ for the same regression coefficient β but a different error term $\varepsilon^{*}$, which has the survival function

$$F_{\varepsilon^{*}}(\varepsilon) = \int_{\exp(\varepsilon)}^{\infty} F_\varepsilon(\log u)\,du/\mu.$$

This suggests that testing the null hypothesis β = 0 from the accelerated failure time model using backward recurrence times is equivalent to testing the null hypothesis β = 0 in the target population. Therefore, linear rank statistics can also be constructed directly from backward recurrence times, and the corresponding statistics coincide with our results.

3. Combined rank statistics for cross-sectional data with follow-up

When prospective follow-up is present, a right-censored version of the forward recurrence time V

would be observed in addition to the backward recurrence time A. Denote the observed length-biased sampling data with possible right censoring as $\{z_i = \min(t_i, c_i) = \min(a_i + v_i, a_i + r_i),\ \delta_i = I(v_i \le r_i),\ x_i,\ i = 1, 2, \dots, n\}$. Suppose V and R are conditionally independent given A and X

and the censoring distribution does not involve any parameter of the survival distribution. Note

that in length-biased sampling, A and Y are independent in the population. The observed data

likelihood is

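$$L = L_C \times L_M = \prod_{i=1}^{n} \left\{\frac{f_Y(z_i \mid x_i)}{F_Y(a_i \mid x_i)}\right\}^{\delta_i} \left\{\frac{F_Y(z_i \mid x_i)}{F_Y(a_i \mid x_i)}\right\}^{1-\delta_i} \times \prod_{i=1}^{n} \left\{\frac{F_Y(a_i \mid x_i)}{\mu_Y(x_i)}\right\}.$$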

Based on the conditional likelihood function LC , linear rank statistics have been proposed

by Ying (1990). Based on the marginal likelihood function LM , linear rank statistics have been

proposed in the previous section. We will discuss how to combine the two statistics in this section.


Note that the backward recurrence time A and forward recurrence time V are correlated (Cox,

1962). Also, in the context of cross-sectional survival data, informative censoring occurs because

the survival time T = A + V and the censoring time C = A + R within a prevalent population

share a common random component A. However, T and C are independent conditional on (A,X),

since R and V are independent conditional on (A,X). Because of this conditional independence

and the definition of risk sets for left truncated right censored data, Ying’s test does not suffer from

induced informative censoring. However, efficiency is lost because of conditioning. We improve

efficiency by utilizing the information from the marginal distribution of A.

We introduce the following counting process notation for representing Ying’s statistics: $N_i(t) = I(z_i \le t, \delta_i = 1)$, $R_i(t) = I(z_i \ge t > a_i)$. When the error is extreme-value distributed, the locally most powerful test based on $L_C$ is the log-rank test

$$T_{LR,2} = \sum_{i=1}^{n} \int_0^{\tau} \{x_i - \bar{x}(t)\}\,dN_i(t),$$

where $\bar{x}(t) = \{\sum_{i=1}^{n} x_i R_i(t)\}/\{\sum_{i=1}^{n} R_i(t)\}$ and τ is a constant corresponding to the maximal observable survival time. When the error term has the survival function (2.4), the locally most powerful test (Harrington and Fleming, 1982) is the G-rho test with ρ = 1/2,

$$T_{G\text{-}rho,2} = \sum_{i=1}^{n} \int_0^{\tau} \{\hat{S}(t-)\}^{1/2}\{x_i - \bar{x}(t)\}\,dN_i(t),$$

where $\hat{S}(t-)$ is the left-hand limit of the product-limit estimator for left truncated right censored data (Tsai et al., 1987). The two statistics have consistent variance estimates

$$V_{LR,2} = \sum_{i=1}^{n} \int_0^{\tau} \{x_i - \bar{x}(t)\}^{\otimes 2}\,dN_i(t), \qquad V_{G\text{-}rho,2} = \sum_{i=1}^{n} \int_0^{\tau} \hat{S}(t-)\{x_i - \bar{x}(t)\}^{\otimes 2}\,dN_i(t),$$

respectively. Chi-square test statistics can be constructed as $\chi^2_{LR,2} = T_{LR,2}^{T} V_{LR,2}^{-1} T_{LR,2}$ and $\chi^2_{G\text{-}rho,2} = T_{G\text{-}rho,2}^{T} V_{G\text{-}rho,2}^{-1} T_{G\text{-}rho,2}$, which are asymptotically $\chi^2(p)$ distributed under the null hypothesis.
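The log-rank statistic for left truncated right censored data can be sketched as follows for a scalar covariate. This is an illustration under stated assumptions rather than a reference implementation: the risk set at time t is taken to be the subjects with a_j < t ≤ z_j, τ is assumed to exceed every observed time, ties are assumed absent, and the function name and toy data are hypothetical.

```python
def ltrc_logrank(a, z, delta, x):
    """Log-rank statistic and variance estimate under left truncation.

    a: truncation (backward) times, z: observed times, delta: failure
    indicators (1 = failure observed), x: scalar covariates.
    Returns (T, V), where T sums x_i - xbar(z_i) over observed failures
    and V sums the squared deviations.
    """
    n = len(z)
    t_stat, v_stat = 0.0, 0.0
    for i in range(n):
        if delta[i] != 1:
            continue  # censored subjects contribute only through risk sets
        t = z[i]
        at_risk = [x[j] for j in range(n) if a[j] < t <= z[j]]
        xbar = sum(at_risk) / len(at_risk)  # subject i is always at risk here
        t_stat += x[i] - xbar
        v_stat += (x[i] - xbar) ** 2
    return t_stat, v_stat


# Hypothetical toy data.
a = [1.0, 2.0, 0.5, 3.0]
z = [4.0, 5.0, 3.5, 6.0]
delta = [1, 0, 1, 1]
x = [1, 0, 1, 0]
print(ltrc_logrank(a, z, delta, x))
```

Note that this counting-process form is positive when larger covariate values fail earlier, whereas a rank statistic of the form Σ x_(i) c_i is positive when larger covariate values go with longer times; for complete untruncated data the two forms differ exactly by sign.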

Since two Chi-square test statistics are constructed based on $L_M$ and $L_C$ respectively, it is intuitive that a combined test statistic should lead to an improvement in power if the two statistics are not perfectly correlated. In fact, the two statistics are asymptotically independent, as we shall show. Power can therefore be greatly improved, as if we had two independent samples instead of one. In the following we focus on combining the two log-rank tests $T_{LR,1}$ and $T_{LR,2}$ under a working assumption that the error is extreme-value distributed. The combination of the test statistics $T_{Wil,1}$ and $T_{G\text{-}rho,2}$ under the working error distribution (2.4) follows in a similar manner. We first need to find the correlation between $T_{LR,1}$ and $T_{LR,2}$ under the null hypothesis. Note that $T_{LR,1}$ is a function of only $(a_i, x_i)$, $i = 1, \dots, n$, and $E(T_{LR,2} \mid a_1, \dots, a_n, x_1, \dots, x_n) = 0$, as shown in Section C of the supplementary material. Therefore, $E(T_{LR,1}T_{LR,2}) = E\{T_{LR,1}E(T_{LR,2} \mid a_1, \dots, a_n, x_1, \dots, x_n)\} = 0$, so that $T_{LR,1}$ and $T_{LR,2}$ are uncorrelated. Moreover, the two statistics have limiting Gaussian distributions and are asymptotically independent. Following this result, we investigate several methods for combining the two log-rank statistics.

The first method is inspired by the method for combining independent 2 × 2 tables in Mantel and Haenszel (1959). Under H0, $T_{LR,1} + T_{LR,2}$ is asymptotically normal with mean zero and variance $V_{LR,1} + V_{LR,2}$, and a Mantel-Haenszel (MH) statistic is defined as $\chi^2_{LR,MH} = (T_{LR,1} + T_{LR,2})^{T}(V_{LR,1} + V_{LR,2})^{-1}(T_{LR,1} + T_{LR,2})$, which is asymptotically $\chi^2(p)$ distributed.

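The second method is a modification of $\chi^2_{LR,MH}$. The covariance matrices $V_{LR,1}$ and $V_{LR,2}$ can be written as $B_{LR,1}B_{LR,1}^{T}$ and $B_{LR,2}B_{LR,2}^{T}$ respectively, and $B_{LR,1}^{-1}T_{LR,1}$ and $B_{LR,2}^{-1}T_{LR,2}$ are asymptotically normal with mean zero and variance $I_{p \times p}$, the identity matrix. Because the two standardized statistics are asymptotically independent, their sum has asymptotic variance $2I_{p \times p}$. A modified Mantel-Haenszel (MMH) statistic is defined as $\chi^2_{LR,MMH} = \tfrac{1}{2}(B_{LR,1}^{-1}T_{LR,1} + B_{LR,2}^{-1}T_{LR,2})^{T}(B_{LR,1}^{-1}T_{LR,1} + B_{LR,2}^{-1}T_{LR,2})$, which is asymptotically $\chi^2(p)$ distributed.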
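For a scalar covariate the two Mantel-Haenszel type combinations are one-line formulas. The sketch below uses hypothetical function names and input values, and assumes that the two statistics have already been oriented with the same sign convention (for example, both as score statistics for β); without that, the sum would cancel rather than accumulate evidence. The factor 0.5 in the modified statistic reflects that the sum of two independent standard normal variables has variance 2, so that the χ²(1) reference distribution applies.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square(1) variable."""
    return math.erfc(math.sqrt(stat / 2.0))


def mantel_haenszel(t1, v1, t2, v2):
    """(T1 + T2)^2 / (V1 + V2), referred to chi-square(1)."""
    return (t1 + t2) ** 2 / (v1 + v2)


def modified_mantel_haenszel(t1, v1, t2, v2):
    """Half the squared sum of the two standardized statistics."""
    z = t1 / math.sqrt(v1) + t2 / math.sqrt(v2)
    return 0.5 * z * z


# Hypothetical inputs, both oriented the same way.
t1, v1, t2, v2 = 2.0, 4.0, 3.0, 9.0
mh = mantel_haenszel(t1, v1, t2, v2)
mmh = modified_mantel_haenszel(t1, v1, t2, v2)
print(mh, chi2_1df_pvalue(mh))
print(mmh, chi2_1df_pvalue(mmh))
```

The MH form weights each source by its raw variance, while the MMH form gives the two standardized statistics equal weight.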

The third method is a sum of Chi-square test (Bhattacharya, 1961), defined as $\chi^2_{SUM} = \chi^2_{LR,1} + \chi^2_{LR,2}$, which is asymptotically $\chi^2(2p)$ distributed.

The fourth method is Fisher’s inverse Chi-square test statistic (Fisher, 1932, pp. 99-101), defined as $\chi^2_{Fisher} = -2\log p_{LR,1} - 2\log p_{LR,2}$, where $p_{LR,1}$ and $p_{LR,2}$ are P-values based on $\chi^2_{LR,1}$ and $\chi^2_{LR,2}$ respectively. Fisher’s statistic is asymptotically $\chi^2(4)$ distributed, regardless of the dimension of X. This is because the P-values are uniformly distributed under the null hypothesis and each of the two terms in $\chi^2_{Fisher}$ is $\chi^2(2)$ distributed.
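The sum test and Fisher’s test can be sketched with the standard library only, because a Chi-square distribution with an even number of degrees of freedom has a closed-form tail probability. The code below assumes a scalar covariate (p = 1), so that the component tests are χ²(1) and the sum test is χ²(2); the function names are hypothetical.

```python
import math


def chi2_sf_1df(c):
    """P(chi-square(1) > c)."""
    return math.erfc(math.sqrt(c / 2.0))


def chi2_sf_even_df(c, df):
    """P(chi-square(df) > c) for even df: exp(-c/2) * sum_{j<df/2} (c/2)^j / j!."""
    half = c / 2.0
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


def sum_test(chi2_1, chi2_2):
    """Sum of two chi-square(1) statistics, referred to chi-square(2)."""
    stat = chi2_1 + chi2_2
    return stat, chi2_sf_even_df(stat, 2)


def fisher_test(chi2_1, chi2_2):
    """Fisher's statistic -2 log p1 - 2 log p2, referred to chi-square(4)."""
    p1, p2 = chi2_sf_1df(chi2_1), chi2_sf_1df(chi2_2)
    stat = -2.0 * math.log(p1) - 2.0 * math.log(p2)
    return stat, chi2_sf_even_df(stat, 4)


# Hypothetical component statistics.
print(sum_test(1.0, 1.0))
print(fisher_test(3.84, 3.84))
```

For a p-vector covariate with p > 1 the component P-values would need a general Chi-square tail function, which is not part of the Python standard library.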

Other procedures have been proposed for combining independent Chi-square tests; see, for example, Marden (1982). We considered the above four procedures for the following reasons.

The Mantel-Haenszel test has been shown to outperform other procedures for combining 2 × 2

tables with similar departures from the null hypothesis for all tables (Louv and Littell, 1986).

In our case, the association parameter β is the same for the tests based on backward recurrence

time and follow-up time, suggesting that Mantel-Haenszel type procedures should have a decent

power. On the other hand, the sum of Chi-square and Fisher’s tests are shown to be admissible in

Marden (1982) and have reasonable power against a wide range of alternatives (Louv and Littell,

1986).

It is well known that the log-rank statistic has optimal power under proportional hazards al-

ternatives. For decreasing hazard ratios, the Wilcoxon test or the G-rho family is more powerful.

For right-censored data, Kosorok and Lin (1999) derived a versatile maximum test of weighted

log-rank statistics, which is powerful against a wide range of alternatives. We would like to stress

that our main contribution is rather different from the broad class of tests proposed for right-censored data. We highlight the differences as follows. First, we combine asymptotically independent

test statistics from two sources (A and Y ) using the same set of data. This is different from

maximum tests that combine dependent test statistics from one source (Y only). For a combina-

tion of dependent test statistics, the gain in power is usually limited, whereas we shall show by

simulations that the gain in power can be substantial because we combine independent sources

of information. Second, the Mantel-Haenszel test, sum test and Fisher’s inverse test considered in this section are designed for combining asymptotically independent test statistics. The null distribution will be much more complex when the statistics are correlated. In fact, the maximum test

requires extensive simulations to compute the null distribution (Kosorok and Lin, 1999). On the

other hand, our proposed Fisher’s inverse test can be readily extended to combine two maximum

tests, one for the backward time and the other for follow-up data, since the information from the two sources is asymptotically independent.


4. Simulation studies

We performed simulation studies to evaluate the performance of the proposed test statistics. In

each case, independent data sets were generated 10000 times under the null hypothesis and 1000

times under the alternative hypotheses. First, we considered a two-sample testing scenario where

X was a Bernoulli variable with p = 0.5. The survival time Y was generated by model (1.1) where

β = 0, 0.1, 0.2, 0.3, 0.4 and ε = log(ε0)/4, where ε0 followed a standard exponential distribution. That is, ε

is extreme value distributed. We also considered ε to be a logistic or a normal distribution with

the same mean and variance as the above extreme value distribution to show that the tests are

robust against misspecification of parametric error distribution. To obtain a length-biased sample,

we generated random truncation times A0 from a U(0, 30) distribution; an observation was in

the cross sectional sample if −A0 + Y > 0. Data were generated until the cross-sectional sample

had n = 100 or n = 200 observations. The survival endpoint was censored if an individual in

the prevalent cohort survived past 0.5 time units after recruitment. We compared the proposed

testing procedures, including the test based only on backward recurrence time without follow-up

and the four combined test statistics, to two existing methods: the linear rank statistics of Ying

(1990) based on left truncated right censored data (LTRC) and the modified log-rank statistics

of Ning et al. (2010), denoted by NSQ. A 5% significance level was chosen. Table 1 summarizes

the results from the simulation study. The results showed that all tests had an empirical Type

I error that was close to the nominal value, even for a small sample size and misspecified error

distributions. Under alternative hypotheses, there were significant improvements in power by

incorporating information from backward recurrence times. In fact, the test statistics based on

backward recurrence times alone had a higher power than the conventional log-rank test based

on left truncated and right censored data. All combined tests had improved power compared

to the conventional log-rank test or the test based only on the backward recurrence time. The

combined tests had consistently improved power compared to the test of Ning et al. (2010)


which also recognized the length-bias data structure but utilized information in a different way.

Their test requires the nonparametric estimation of a pooled survival distribution which may

compromise power under alternative hypotheses, while the proposed tests do not require such

an estimation. Among the proposed tests, the Mantel-Haenszel test and its modification had a

consistently higher power than the sum of Chi-square and the Fisher’s test, and the modified

Mantel-Haenszel test performed slightly better than the Mantel-Haenszel test.
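The data-generating mechanism of the two-sample scenario can be sketched as follows. This is an illustration of the description above under our reading of it, not the authors’ simulation code: the function name, the seed handling, and the tuple layout are hypothetical. Since ε is one quarter of the logarithm of a standard exponential variable, Y = exp(Xβ) ε0^{1/4} with ε0 standard exponential.

```python
import math
import random


def simulate_cross_sectional(n, beta, follow_up=0.5, seed=0):
    """Draw a length-biased sample of size n with administrative censoring.

    Returns a list of tuples (a, z, delta, x): backward time, observed
    time, failure indicator and binary covariate.
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        x = 1 if rng.random() < 0.5 else 0
        # Y = exp(x * beta + eps), eps = log(E)/4 with E standard exponential.
        y = math.exp(x * beta) * rng.expovariate(1.0) ** 0.25
        a0 = rng.uniform(0.0, 30.0)  # truncation time
        if y > a0:  # subject is alive at sampling, so it enters the sample
            v = y - a0  # forward recurrence time
            delta = 1 if v <= follow_up else 0
            z = a0 + min(v, follow_up)
            sample.append((a0, z, delta, x))
    return sample


data = simulate_cross_sectional(n=100, beta=0.2, seed=1)
print(len(data), sum(d for _, _, d, _ in data))
```

Only a small fraction of draws satisfies Y > A0 because A0 ranges over (0, 30) while Y is of order one, so the loop discards most candidates before the sample is filled.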

Next we considered a scenario where X was bivariate and contained a continuous variable. Suppose $X = (X_1^{T}, X_2^{T})^{T}$, where X1 is a Bernoulli variable with p = 0.5, and X2 is standard uniform distributed. The modified log-rank test of Ning et al. (2010) is only applicable to univariate categorical X and was not considered under this scenario. We considered alternative hypotheses that do not follow model (1.1) and also two scenarios of non-stationary incidence distributions. For the stationary incidence distribution, we generated random truncation times A0 from a U(0, 30) distribution; an observation was in the cross-sectional sample if −A0 + Y > 0. For covariate-independent non-stationary disease incidence, A0 was generated from 30×Beta(1, 3), which gives an increasing disease incidence over calendar time. For covariate-dependent non-stationary disease incidence, A0 was generated from 30×Beta(1, 3 exp(−X1 − X2)). Survival time Y followed $F_Y(y \mid x) = \{5/(y+5)\}^{1 + 10\exp(x^{T}\beta)}$ with β = (0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5). Note that the survival time did not follow an accelerated failure time model. The sampling of cross-sectional data and the censoring mechanism were chosen to be the same as in the previous example. Results are shown in Table 2. The major conclusions were similar to those in the previous example, and the proposed methods greatly improved power compared to the existing log-rank test, even when the alternative distributions did not follow an accelerated failure time model. The Mantel-Haenszel test and its modification continued to outperform the other tests, and the difference in power was greater in this scenario. The simulations also showed that the type I error remained close to the nominal value under mild departures from the stationary disease incidence assumption.
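Survival times from the distribution used in this scenario can be drawn by inverting the survival function: setting {5/(y+5)}^k = u with k = 1 + 10 exp(x^T β) gives y = 5(u^{-1/k} − 1). The sketch below, with hypothetical function and scenario names, also draws the truncation time under the three incidence settings described above.

```python
import math
import random


def shape_exponent(x, beta):
    """Exponent k = 1 + 10 exp(x^T beta) of the survival function."""
    return 1.0 + 10.0 * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def survival(y, x, beta):
    """F_Y(y | x) = {5 / (y + 5)}^k."""
    return (5.0 / (y + 5.0)) ** shape_exponent(x, beta)


def inverse_survival(u, x, beta):
    """Solve F_Y(y | x) = u for y, where 0 < u <= 1."""
    return 5.0 * (u ** (-1.0 / shape_exponent(x, beta)) - 1.0)


def draw_truncation_time(x, scenario, rng):
    """Truncation time A0 under one of the three incidence settings."""
    if scenario == "stationary":
        return rng.uniform(0.0, 30.0)
    if scenario == "nonstationary":
        return 30.0 * rng.betavariate(1.0, 3.0)
    if scenario == "covariate_dependent":
        return 30.0 * rng.betavariate(1.0, 3.0 * math.exp(-x[0] - x[1]))
    raise ValueError("unknown scenario: " + scenario)


rng = random.Random(2014)
x = (1, rng.random())  # (Bernoulli-type value, standard uniform draw)
beta = (0.5, 0.5)
y = inverse_survival(1.0 - rng.random(), x, beta)  # uniform draw in (0, 1]
a0 = draw_truncation_time(x, "covariate_dependent", rng)
print(y, a0, y > a0)
```

A subject enters the cross-sectional sample only when y > a0, exactly as in the stationary case; only the distribution of A0 changes across settings.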


5. Application to a health survey without follow-up

Partial survival information in the form of backward recurrence times is frequently collected

in health surveys. Most surveys are administered cross-sectionally without follow-up data, due

to cost and logistic reasons. Without any prospective follow-up, survival times being collected

are all censored and conventional statistical methods are not applicable. Therefore, the partial

survival information being collected is seldom analyzed. An exception is given in McLaughlin

et al. (2010), who examined the associations between childhood adversities and the durations of

adult mental disorders using backward recurrence times collected from a nationally representative

sample from the National Comorbidity Survey Replication.

For illustrating the proposed test statistic, we analyzed a different survival outcome collected

in the same survey, the duration between suicidal thoughts. Although suicidal thoughts are re-

current events, cross-sectional surveys like the one we have usually collect the most recent onset

of the recurrent events. Therefore we only have time from the last event which is exactly the

model set-up in Section 2. The survey was administered in 2001-2002, and collected the time of

last suicidal thoughts from 1010 respondents with ages between 18 and 91. We examined whether

certain experiences from childhood are associated with the duration between suicidal thoughts

among adults. We first examined whether the duration between suicidal thoughts was associated with living with both biological parents until age 16. The proposed log-rank statistic was 9.40 with a $\chi^2(1)$ null distribution, which gives a p-value of 0.002, whereas the proposed Wilcoxon statistic was 13.10 with a p-value of 0.0003. As mentioned in Section 2, the log-rank statistic is conservative under departures from proportional hazards alternatives, but the result is consistent with

the Wilcoxon statistic in this application. A direct comparison of backward recurrence times also

revealed a consistent pattern, in which the mean backward recurrence time among individuals

who lived with both parents until 16 was 2.6 years (95% CI: 1.0-4.1) longer than individuals who

did not live with both parents. We also examined the associations of the duration between suicidal thoughts with three childhood adversities: parental substance abuse, parental criminality

and family violence. The proposed log-rank statistics were 5.0, 12.1 and 0.6 for parental substance

abuse, parental criminality and family violence respectively, yielding p-values of 0.025, 0.001 and

0.431. Therefore, the data suggested that parental substance abuse and criminality were associ-

ated with the duration between suicidal thoughts at a 5% significance level. However, the data did

not suggest any association between family violence and the duration between suicidal thoughts.

We reiterate that the tests of Ying (1990) and Ning et al. (2010) are not applicable to these data

because there was no prospective follow-up.
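The p-values quoted in this section follow from the reported statistics and the χ²(1) reference distribution. A minimal sketch of that arithmetic is given below; the dictionary labels are ours, and because the reported statistics are rounded, a recomputed p-value may differ slightly from the quoted one in the last digit.

```python
import math


def chi2_1df_pvalue(stat):
    """P(chi-square(1) > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


# Statistics as reported in the text (rounded).
reported = {
    "both biological parents, log-rank": 9.40,
    "both biological parents, Wilcoxon": 13.10,
    "parental substance abuse, log-rank": 5.0,
    "parental criminality, log-rank": 12.1,
    "family violence, log-rank": 0.6,
}

for label, stat in reported.items():
    print(f"{label}: statistic {stat}, p-value {chi2_1df_pvalue(stat):.4f}")
```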

6. Application to a prospective study of Alzheimer’s disease

Next, we applied the proposed tests to the Canadian Study of Health and Aging (Wolfson et al.,

2001). The same example has been used to study the performance of the modified log-rank test

in Ning et al. (2010). A prevalent sample of individuals suffering from dementia was collected in

1991. The date of dementia onset was collected from medical history, and a prospective follow-

up was conducted between 1991 and 1996. It is well established that the sample is subject to

length-biased sampling; see, for example, Shen et al. (2009) and Ning et al. (2010), among others.

Subjects were classified into one of three diagnostic categories: probable Alzheimer’s

disease, possible Alzheimer’s disease and vascular dementia. We investigated whether the time

from onset to death was associated with diagnostic category. The available data included 818

subjects with dementia, among whom 393 had probable Alzheimer’s disease, 252 had possible

Alzheimer’s disease and 173 had vascular dementia. We considered pairwise comparisons and an

overall comparison among the three groups. The P-values from the different tests are summarized

in Table 3. As noted in Ning et al. (2010), the conventional log-rank test for left truncated right

censored data did not reject the null hypotheses that the survival distributions for any two of the

three groups were equal. However, statistically significant differences were found between vascular

dementia and possible Alzheimer’s disease, between probable and possible Alzheimer’s disease and in the

three-sample test at a 5% significance level using the proposed Mantel-Haenszel tests. P-values

were slightly higher for the sum test and Fisher’s test. For the three-sample test, the test of

Ning et al. (2010) gave a marginally significant result at the 5% level of significance, whereas

the proposed Mantel-Haenszel tests had a lower P-value of 0.02. Interestingly, the same conclusion

can be reached at the baseline recruitment before any follow-up began, by using the log-rank test

based on backward recurrence times.

To further understand the power of detecting a difference in a three-sample test, we ran a sim-

ulation study with estimated parameter values from the data. We fit a parametric survival model

(1.1) with an extreme-value distributed error. We estimated that (β1, β2) = (0.18,−0.05) where

β1 is the relative survival time comparing subjects with possible Alzheimer’s disease with sub-

jects with probable Alzheimer’s disease, and β2 is the relative survival time comparing subjects

with vascular dementia with subjects with probable Alzheimer’s disease. The error distribution

is ε = log(ε̃), where ε̃ follows an exponential distribution with an estimated mean of 3.82 years. The

result is comparable to a semiparametric analysis based on log-rank estimating equations which

estimates (β1, β2) = (0.19,−0.07). It is estimated that 47%, 35% and 18% of cases have probable

Alzheimer’s disease, possible Alzheimer’s disease and vascular dementia respectively, using the

bias corrected estimator of Chan and Wang (2012). We simulated 1000 independent data sets

using these parameters and each data set contained 818 subjects in a cross-sectional sample. The

generation of the cross-sectional sample is described in Section 4. The residual censoring time from

recruitment was generated from a uniform (4.3, 5.8) distribution, consistent with the observed

data. We found that the sum of chi-square test and Fisher’s test each had a power of 76%, whereas

the Mantel-Haenszel test and the modified Mantel-Haenszel test each had a power of 86%. Therefore,

the Mantel-Haenszel tests have a lower Type II error rate and are more likely to reject the null

hypothesis when the alternative is true.
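
The data-generating step of such a simulation can be sketched as follows. This is a minimal Python illustration under several assumptions that are not confirmed by the text above: that model (1.1) has the form log T = βᵀX + ε, that disease incidence is stationary (so the sampled survival time is length-biased and the backward recurrence time is a uniform fraction of it), and that the percentages 47%, 35% and 18% are incident-population proportions. Under these assumptions the length-biased version of an exponential survival time with mean μ is a Gamma variable with shape 2 and scale μ. All names in the code are hypothetical.

```python
import math
import random

# Assumed log-scale effects relative to probable Alzheimer's disease.
BETA = {"probable": 0.0, "possible": 0.18, "vascular": -0.05}
# Assumed incident-population proportions of the three diagnostic groups.
INCIDENT_PROP = {"probable": 0.47, "possible": 0.35, "vascular": 0.18}
BASE_MEAN = 3.82  # estimated mean (years) of the exponential error term


def simulate_cross_sectional(n, rng):
    """Draw n prevalent cases as tuples (group, backward time A, observed time Y, delta)."""
    groups = list(BETA)
    means = {g: BASE_MEAN * math.exp(BETA[g]) for g in groups}
    # Length bias over-represents groups with longer mean survival:
    # prevalent-sample group weights are proportional to proportion x mean.
    weights = [INCIDENT_PROP[g] * means[g] for g in groups]
    sample = []
    for _ in range(n):
        g = rng.choices(groups, weights=weights)[0]
        t = rng.gammavariate(2.0, means[g])  # length-biased exponential survival time
        a = rng.random() * t                 # backward recurrence time (onset to recruitment)
        c = rng.uniform(4.3, 5.8)            # residual censoring time from recruitment
        y = min(t, a + c)                    # observed time from onset
        delta = int(t <= a + c)              # 1 if death is observed during follow-up
        sample.append((g, a, y, delta))
    return sample


data = simulate_cross_sectional(818, random.Random(2014))
print(sum(d for _, _, _, d in data), "deaths observed among", len(data), "subjects")
```

A power study would repeat this step for each of the 1000 data sets and record how often each test rejects the null hypothesis.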


7. Concluding remarks

A linear rank statistic is proposed for length-biased cross-sectional survival data without follow-

up, based on backward recurrence time. Existing test statistics cannot be applied to this case be-

cause all subjects are essentially censored. Interestingly, the special cases of log-rank and Wilcoxon

statistics treat backward recurrence time as if it were the completely observed survival time. When

prospective follow-up is present, the proposed rank statistic without follow-up can be combined

with a conventional rank statistic for cross-sectional data with follow-up. Another interesting

observation is that the two rank statistics based on the same sample are asymptotically indepen-

dent. This property facilitates the combination of two statistics and four methods were explored.

Also, combining two independent statistics can substantially improve power. Based on the simulation

studies and the data analysis, the proposed Mantel-Haenszel tests have greater power, and

we recommend their use in practice.
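
To illustrate how two asymptotically independent statistics can be combined, the Python sketch below implements the sum of chi-square and Fisher’s inverse chi-square combinations. It assumes that each component statistic has a χ²₁ null distribution, as in the two-sample case, so that the sum is χ²₂ and Fisher’s statistic −2(log p₁ + log p₂) is χ²₄ under the null. The function names and the input values are hypothetical, and the Mantel-Haenszel combinations are not reproduced here.

```python
import math


def chi2_sf_1df(x):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """Upper tail of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_sf_4df(x):
    """Upper tail of a chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def sum_test_pvalue(x_backward, x_followup):
    """Sum of two independent chi-square(1) statistics, referred to chi-square(2)."""
    return chi2_sf_2df(x_backward + x_followup)


def fisher_test_pvalue(x_backward, x_followup):
    """Fisher's inverse chi-square combination of the two component p-values."""
    p1 = chi2_sf_1df(x_backward)
    p2 = chi2_sf_1df(x_followup)
    return chi2_sf_4df(-2.0 * (math.log(p1) + math.log(p2)))


# Hypothetical component statistics, for illustration only.
x_backward, x_followup = 4.0, 3.0
print("sum test p-value:   ", sum_test_pvalue(x_backward, x_followup))
print("Fisher test p-value:", fisher_test_pvalue(x_backward, x_followup))
```

Both combinations rely on the independence of the two components, which is what makes the null distributions χ²₂ and χ²₄ valid.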

The test of Ning et al. (2010) was designed for cross-sectional data with prospective follow-up.

The test is not asymptotically equivalent to our proposed method. In a cross-sectional sample

without prospective follow-up, the test of Ning et al. (2010) has infinite variability, whereas the

proposed method still works. When forward follow-up is much shorter than the backward time,

which happens in practice because follow-up data are typically costly to collect, the proposed test

is expected to outperform its existing competitors. Our test statistics are also applicable to the

case where follow-up time is only observed from a subsample. Existing methods would generally

discard backward information for the incomplete observations without forward follow-up, while

the proposed method can utilize backward information from the full sample.

Acknowledgments

The authors thank Prof. Anastasios Tsiatis, an associate editor and two reviewers for their help-

ful comments and suggestions which greatly improved this paper. The authors thank Professor

Masoud Asgharian for sharing the Canadian Study of Health and Aging data. The first author

is partially supported by grant R01 HL-122212 from the National Institutes of Health. Conflict

of Interest: None declared.

References

Allison, P. D. (1985). Survival analysis of backward recurrence times. Journal of the American

Statistical Association 80(390), 315–322.

Bhattacharya, N. (1961). Sampling experiments on the combination of independent χ2 tests.

Sankhya 11, 191–196.

Chan, K. C. G., Chen, Y.Q. and Di, C.-Z. (2012). Proportional mean residual life model for

right-censored length-biased data. Biometrika 99(4), 995–1000.

Chan, K. C. G. and Wang, M.-C. (2012). Estimating incident population distribution from

prevalent data. Biometrics 68(2), 521–531.

Cox, D. R. (1962). Renewal theory. London: Methuen.

Fisher, R. A. (1932). Statistical methods for research workers. London: Oliver and Boyd.

Hajek, J., Sidak, Z. and Sen, P. K. (1967). Theory of rank tests. New York: Academic Press.

Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored

survival data. Biometrika 69(3), 553–566.

Hess, K. R. (1995). Graphical methods for assessing violations of the proportional hazards

assumption in Cox regression. Statistics in Medicine 14(15), 1707–1723.

Kalbfleisch, J. D. and Prentice, R. L. (1973). Marginal likelihoods based on Cox’s regression

and life model. Biometrika 60(2), 267–278.

Kalbfleisch, J. D. and Prentice, R. L. (2002). The statistical analysis of failure time data.

John Wiley & Sons.

Keiding, N., Fine, J. P., Hansen, O. H. and Slama, R. (2011). Accelerated failure time

regression for backward recurrence times and current durations. Statistics & Probability Let-

ters 81(7), 724–729.

Kosorok, M. R. and Lin, C.-Y. (1999). The versatility of function-indexed weighted log-rank

statistics. Journal of the American Statistical Association 94(445), 320–332.

Louv, W. C. and Littell, R. C. (1986). Combining one-sided binomial tests. Journal of the

American Statistical Association 81(394), 550–554.

Mandel, M. and Ritov, Y. (2010). The accelerated failure time model under biased sampling.

Biometrics 66(4), 1306–1308.

Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from

retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748.

Marden, J. I. (1982). Combining independent noncentral chi squared or F tests. The Annals of

Statistics 10(1), 266–277.

Martinez, R. L. M. C. and Naranjo, J. D. (2010). A pretest for choosing between logrank

and Wilcoxon tests in the two-sample problem. Metron 68(2), 111–125.

McLaughlin, K. A., Green, J. G., Gruber, M. J., Sampson, N. A., Zaslavsky, A. M.

and Kessler, R. C. (2010). Childhood adversities and adult psychiatric disorders in the

National Comorbidity Survey Replication II: associations with persistence of DSM-IV disorders.

Archives of General Psychiatry 67(2), 124–132.

Ning, J., Qin, J. and Shen, Y. (2010). Non-parametric tests for right-censored data with biased

sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(5),

609–630.

Ning, J., Qin, J. and Shen, Y. (2011). Buckley–James-type estimator with right-censored

and length-biased data. Biometrics 67(4), 1369–1378.

Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65(1), 167–179.

Shen, Y., Ning, J. and Qin, J. (2009). Analyzing length-biased data with semiparametric

transformation and accelerated failure time models. Journal of the American Statistical Asso-

ciation 104(487), 1192–1202.

Tsai, W.-Y., Jewell, N. P. and Wang, M.-C. (1987). A note on the product-limit estimator

under right censoring and left truncation. Biometrika 74(4), 883–886.

Vardi, Y. (1989). Multiplicative censoring, renewal processes, deconvolution and decreasing

density: nonparametric estimation. Biometrika 76(4), 751–761.

Wang, M.-C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of

the American Statistical Association 86, 130–143.

Wolfson, C., Wolfson, D. B., Asgharian, M., M’Lan, C. E., Østbye, T., Rockwood,

K. and Hogan, D. B. (2001). A reevaluation of the duration of survival after the onset of

dementia. New England Journal of Medicine 344(15), 1111–1116.

Yamaguchi, K. (2003). Accelerated failure–time mover–stayer regression models for the analysis

of last–episode data. Sociological Methodology 33(1), 81–110.

Ying, Z. (1990). Linear rank statistics for truncated data. Biometrika 77(4), 909–914.

Table 1. Percentage of hypotheses rejected by the existing and proposed tests: Two-sample testing. LTRC: conventional log-rank statistic for left truncated right censored data, NQS: modified log-rank statistic of Ning et al. (2010), Backward: proposed log-rank statistic based on backward recurrence time, MH: proposed Mantel-Haenszel statistic, MMH: proposed modified Mantel-Haenszel statistic, SUM: proposed sum of Chi-square statistic, Fisher: proposed Fisher’s inverse Chi-square statistic.

(a) Extreme value distribution

                      n=100                      n=200
Estimator\β     0  0.1  0.2  0.3  0.4      0  0.1  0.2  0.3  0.4
LTRC            5   11   29   55   71      5   21   61   85   96
NQS             5   16   53   86   98      5   30   86  100  100
Backward        5   15   42   73   94      5   26   72   96  100
MH              5   21   59   89   98      6   39   90  100  100
MMH             5   23   62   91   99      6   41   93  100  100
SUM             5   18   55   86   97      5   32   88  100  100
Fisher          5   18   54   87   98      5   32   88  100  100

(b) Logistic distribution

                      n=100                      n=200
Backward        5   11   33   58   80      5   20   58   86   99
MH              5   19   48   82   96      6   30   81   98  100
MMH             5   15   53   87   98      6   33   85   99  100
SUM             5   15   44   80   95      5   25   78   97  100
Fisher          5   15   44   79   96      5   25   79   97  100

(c) Normal distribution

                      n=100                      n=200
Backward        6   13   33   60   81      5   21   60   91   99
MH              5   18   50   84   96      6   33   84   99  100
MMH             5   18   54   87   98      6   34   87   99  100
SUM             5   14   45   80   95      5   27   80   98  100
Fisher          5   15   46   80   95      5   27   80   99  100

Table 2. Percentage of hypotheses rejected by the existing and proposed tests: Bivariate covariates and a misspecified alternative model. Test procedures are the same as in Table 1.

(a) Stationary incidence

                               n=100                                     n=200
Estimator\β      (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC                 4        17         8        18           4        38        18        43
Backward             5        45        19        61           5        82        39        92
MH                   5        64        25        76           5        93        54        97
MMH                  5        62        23        70           5        91        53        95
SUM                  4        45        18        61           4        86        41        93
Fisher               4        45        18        61           4        86        41        93

(b) Non-stationary, covariate-independent incidence

                               n=100                                     n=200
Estimator\β      (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC                 4        21        10        20           4        41        16        46
Backward             5        42        17        53           5        79        33        89
MH                   4        64        25        71           5        93        47        97
MMH                  4        61        23        66           4        92        46        96
SUM                  4        48        16        54           5        85        36        92
Fisher               4        48        16        54           5        85        36        92

(c) Non-stationary, covariate-dependent incidence

                               n=100                                     n=200
Estimator\β      (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC                 5        23        11        26           5        48        21        50
Backward             5        59        23        74           5        90        44        98
MH                   6        72        32        85           6        96        56       100
MMH                  5        69        29        81           5        96        54       100
SUM                  5        60        23        77           5        94        48        99
Fisher               5        60        23        77           5        94        48        99

Table 3. P-values multiplied by 100, for the Canadian Study of Health and Aging data. Test procedures are the same as in Table 1. (a) Vascular vs probable; (b) Vascular vs possible; (c) Probable vs possible; (d) Three-sample test.

             (a)  (b)  (c)  (d)
LTRC          42   33   71   54
NQS           35    2    4    5
Backward      45    1    2    1
MH            28    1    4    2
MMH           28    2    5    2
SUM           56    3    5    5
Fisher        52    1    6    5
