Biostatistics (2014), 0, 0, pp. 1–22 doi:10.1093/biostatistics/combined_logrank
Rank-based testing of equal survivorship based
on cross-sectional survival data with or without
prospective follow-up
KWUN CHUEN GARY CHAN∗
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
JING QIN
Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda,
MD 20892, USA
Summary
Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, which are frequently collected in health surveys without
prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed
based only on backward recurrence times without any prospective follow-up. When follow-up data
are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-
up information from the same sample are shown to be asymptotically independent. We discuss
four ways to combine these two statistics when follow-up is present. Simulations show that all
∗To whom correspondence should be addressed.
© The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]
combined statistics have substantially improved power compared to conventional rank statistics,
and a Mantel-Haenszel test performed the best among the proposed statistics. The method is
applied to a cross-sectional health survey without follow-up and a study of Alzheimer’s disease
with prospective follow-up.
Key words: Accelerated failure time model; Backward recurrence time; Length biased sampling.
1. Introduction
Cross-sectional sampling of survival data recruits individuals who have experienced a certain
initial event but not a failure event at the sampling time. It is considered a focused and economical
design for studying the natural history of disease (Wang, 1991) but is subject to selection bias.
In particular, subjects with a longer survival time are more likely to be sampled. Cross-sectional
data can be collected without prospective follow-up; in this case only a backward recurrence time is collected, as discussed in Cox (1962), Allison (1985) and Yamaguchi (2003) among others.
The estimation of the backward time distribution is one of the key elements in evaluating the
statistical accuracy of estimates of current HIV incidence rates from cross-sectional surveys.
Backward recurrence time is also frequently collected in health surveys.
In cross-sectional data with prospective follow-up in addition to the backward time, the data
structure allows one to apply statistical methods designed for left truncated and right censored
data. When disease incidence is stationary over time, cross-sectional survival data are length-biased (Wang, 1991). For length-biased survival data, it has been shown that conventional methods for analyzing left truncated right censored data are inefficient, and more efficient estimators were proposed by Vardi (1989), Shen et al. (2009), Ning et al. (2011) and Chan et al. (2012) among others.
In conventional survival analysis with right censored data, Prentice (1978) introduced a general class of linear rank tests for a semiparametric accelerated failure time model. Unlike the fully
parametric approach which depends on the correctness of the underlying distributional assump-
tion, the rank test is robust to distributional misspecification. If the parametric assumption is
correct, then the rank test is fully efficient. On the other hand, the rank test can produce a valid
test even when the distributional assumption is misspecified. Ying (1990) extended linear rank
statistics to left truncated and right censored data which can be applied to cross-sectional survival
data. When the censoring proportion is high, the existing tests typically have low power. Recently,
Ning et al. (2010) developed a modified log-rank test for k-sample testing based on cross-sectional
survival data with prospective follow-up. However, the tests developed by Ying (1990) and Ning
et al. (2010) cannot be applied to cross-sectional survival data without prospective follow-up.
In this paper, we extend the work of Prentice (1978) and Ying (1990) to length-biased cross-
sectional data without follow-up based on backward recurrence time only, which can be optimally
combined with Ying’s rank test when prospective follow-up is present. As in Prentice (1978) and
Ying (1990), our primary focus is to test the null hypothesis H0 : β = 0 against two-sided
alternatives under a population accelerated failure time model
$$\log Y = X^T\beta + \varepsilon, \tag{1.1}$$
where $Y$ is the survival time of interest, $X$ is a $p$-vector covariate, $\beta$ is a $p$-vector coefficient and $\varepsilon$ follows an unspecified distribution with a density function $f_\varepsilon$ and a survival function $F_\varepsilon$. It is
assumed that X and ε are independent. The proposed statistics can be applied to cross-sectional
survival data without follow-up, whereas the log-rank statistic of Ning et al. (2010) cannot be
applied to that case. Moreover, the sampling distribution of their statistic is found by resampling
methods when censoring is present. In contrast, the asymptotic variance of the proposed test
statistic has a closed form consistent estimator and can be computed from existing software.
This will facilitate practical implementations of the proposed methods.
The rest of the paper is organized as follows. The proposed linear rank statistics for cross-sectional data without follow-up are given in Section 2. When follow-up data are also available in
a cross-sectional sample, we can combine the linear rank statistics proposed in Section 2 and the
linear rank statistics based on follow-up data to improve power. Four ways to combine the two
statistics will be discussed in Section 3. Results from simulation studies are presented in Section
4; Section 5 contains an analysis of the National Comorbidity Survey Replication; Section 6
contains an analysis of the Canadian Study of Health and Aging, and concluding remarks are
given in Section 7.
2. Linear rank statistics for cross-sectional data without follow-up
In a cross-sectional sample with complete follow-up, we could observe Y = A+V where A is the
time from an initial event to recruitment, also known as the backward recurrence time; and V
is the time from recruitment to a failure event, also known as the forward recurrence time. In a
cross-sectional sample without follow-up, only A is observed but not V or Y . In other words, every
individual is right censored. Under length-biased sampling, the backward time has a conditional
density (Cox, 1962)
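$$f_A(a \mid x) = \frac{F_Y(a \mid x)}{\mu_Y(x)}, \quad a > 0, \qquad \mu_Y(x) = \int_0^\infty F_Y(a \mid x)\,da = \mu \exp(x^T\beta),$$
where $F_Y(y \mid x)$ is the conditional survival function of $Y$ given $X = x$, $\mu_Y(x) = E(Y \mid X = x)$ and
$$\mu = \int_{-\infty}^{\infty} F_\varepsilon(a) \exp(a)\,da = \int_0^\infty f_\varepsilon(\log a)\,da. \tag{2.2}$$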
The last equality in (2.2) is shown in Section A of the supplementary material. The density of
A given above is a consequence of length-biased sampling (Cox, 1962). In the context of cross-sectional survival data, length-biasedness of the survival time requires a stationary disease incidence assumption, under which the disease incidence in the population remains constant over time and is independent of (X, Y) (Wang, 1991). This assumption is imposed for the estimation of β for model
(1.1) in Shen et al. (2009), Ning et al. (2011) among others and the test of Ning et al. (2010). We
derive the test statistic under this working assumption, since the derivation is tractable. However, we show in Section D of the supplementary material that the test remains valid even when disease incidence is non-stationary over time. Simulations reported in Section 4 indicate that the test can be valid under misspecification of the stationary incidence assumption.
When the distribution of ε is known, parametric tests can be readily constructed for testing H0 : β = 0, but their validity depends on correct model specification. As an alternative to parametric tests, the linear rank test was proposed to offer greater robustness to model misspecification. A comprehensive review of linear rank statistics for right censored data can be found in Chapter 7 of Kalbfleisch and Prentice (2002). In the following we derive a linear rank test based on the backward recurrence time A. Let $a_{(1)} < \cdots < a_{(n)}$ be the ordered outcomes with corresponding covariate vectors $x_{(1)}, \ldots, x_{(n)}$, respectively. The rank vector is given by the corresponding labels $r = [(1), (2), \ldots, (n)]$ and the rank likelihood (Kalbfleisch and Prentice, 1973) based on $(a_i, x_i)$, $i = 1, \ldots, n$, is
$$L_A = \mathrm{pr}(a_{(1)} < \cdots < a_{(n)} \mid x_1, \ldots, x_n) = \int_{a_{(1)} < \cdots < a_{(n)}} \prod_{i=1}^n \frac{F_Y\{a_{(i)} \mid x_{(i)}\}}{\mu_Y\{x_{(i)}\}}\,da_{(i)},$$
where $a_{(1)} < \cdots < a_{(n)}$ are the ordered values of $a_i$ and $x_{(i)}$ are the corresponding covariate values. Under model (1.1), the logarithm of the rank likelihood can be written as
$$\log L_A = \log\left[\int_{a_{(1)} < \cdots < a_{(n)}} \prod_{i=1}^n F_\varepsilon\{\log a_{(i)} - x_{(i)}^T\beta\}\,da_{(1)} \cdots da_{(n)}\right] - \sum_{i=1}^n x_i^T\beta - n\log\mu.$$
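A score test statistic can be constructed as
$$\left.\frac{\partial \log L_A}{\partial\beta}\right|_{\beta=0} = \sum_{i=1}^n x_{(i)} E\left[\frac{f_\varepsilon\{\log a_{(i)}\}}{F_\varepsilon\{\log a_{(i)}\}}\right] - \sum_{i=1}^n x_i. \tag{2.3}$$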
The expectation is taken with respect to the i-th order statistic generated from a distribution with density $F_\varepsilon(\log a)/\mu$. The derivation of (2.3) is shown in Section B of the supplementary material. Furthermore, since
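$$\sum_{i=1}^n E\left[\frac{f_\varepsilon(\log a_{(i)})}{F_\varepsilon(\log a_{(i)})}\right] = \sum_{i=1}^n E\left[\frac{f_\varepsilon(\log a_i)}{F_\varepsilon(\log a_i)}\right] = n\int f_\varepsilon(\log a)\,da/\mu = n,$$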
the linear rank statistic can be written as $T_1 = \sum_{i=1}^n x_{(i)} c_i$, where $c_i = E\{\phi(u_{(i)})\}$, $\phi(u) = f_\varepsilon\{\log S^{-1}(1-u)\}[F_\varepsilon\{\log S^{-1}(1-u)\}]^{-1} - 1$, $S(a) = \int_a^\infty F_\varepsilon(\log u)\,du/\mu$, and $u_{(1)} < \cdots < u_{(n)}$ are order statistics of uniformly distributed random variables. Note that this $\phi(u)$ is quite different from the corresponding score function given in Chapter 7 of Kalbfleisch and Prentice (2002) for unbiased samples.
We consider two important special cases where $c_i$ and the associated rank tests have closed-form expressions, which lead to easier computations and well-known test statistics. First, when the error term ε is extreme-value distributed, it can be shown that $S$ is the survival function of an exponential distribution. Therefore, it follows from the same calculation as in Prentice (1978) that $c_i = n^{-1} + (n-1)^{-1} + \cdots + (n-i+1)^{-1} - 1$. Thus $T_{LR,1} = \sum_{i=1}^n x_{(i)} c_i$, which is exactly the log-rank statistic treating backward recurrence times $(a_1, \ldots, a_n)$ as if they are completely observed survival times in an unbiased sample.
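The log-rank scores and statistic described above can be computed in a few lines. The following is a minimal sketch, assuming a scalar covariate and no tied backward recurrence times; the function names `logrank_scores` and `logrank_stat` are illustrative and not taken from the paper.

```python
def logrank_scores(n):
    """Scores c_i = 1/n + 1/(n-1) + ... + 1/(n-i+1) - 1 for i = 1..n."""
    scores, cum = [], 0.0
    for i in range(1, n + 1):
        cum += 1.0 / (n - i + 1)
        scores.append(cum - 1.0)
    return scores


def logrank_stat(a, x):
    """T_{LR,1} = sum_i x_(i) c_i, with x_(i) the covariate of the i-th smallest a.

    Assumes a scalar covariate and no ties among the backward times a.
    """
    order = sorted(range(len(a)), key=lambda k: a[k])
    c = logrank_scores(len(a))
    return sum(x[k] * c[rank] for rank, k in enumerate(order))


# Hypothetical toy data: backward recurrence times and a binary group label.
a = [2.0, 0.5, 3.1, 1.2]
x = [1, 0, 1, 0]
print(logrank_stat(a, x))
```

The scores sum to zero, so the statistic is unchanged when a constant is added to every covariate value.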
Another important special case is the construction of a Wilcoxon statistic based on backward recurrence times. Suppose the error distribution follows the G-rho family with ρ = 1/2 (Harrington and Fleming, 1982), which has the survival function
$$F_\varepsilon(\epsilon) = \frac{1}{(1 + e^{\epsilon}/2)^2}. \tag{2.4}$$
It follows that $f_\varepsilon(\epsilon) = e^{\epsilon}/(1 + e^{\epsilon}/2)^3$ and $S(a) = (1 + a/2)^{-1}$. Therefore, $\phi(u) = 2u - 1$ and $c_i = 2i(n+1)^{-1} - 1$. The corresponding statistic is $T_{Wil,1} = \sum_{i=1}^n x_{(i)}\{2i/(n+1) - 1\}$, which is exactly the Wilcoxon statistic treating backward recurrence times $(a_1, \ldots, a_n)$ as if they are completely observed survival times in an unbiased sample. In contrast to the construction of a Wilcoxon statistic for an unbiased sample, which corresponds to a logistic error distribution, the construction here requires the error distribution to have a lighter tail. This is because the distribution of the backward recurrence time, called the stationary distribution in renewal theory (Cox, 1962), generally has a heavier tail than the survival distribution. One exception is an exponential survival distribution, which has an exponential stationary distribution. This corresponds to an extreme-value error distribution, for which the log-rank test can be constructed based on either unbiased survival data or backward recurrence times. Note that for a logistically distributed error, the stationary distribution is undefined because µ is infinite.
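For readers who want the intermediate steps behind the Wilcoxon scores, the calculation can be sketched as follows. It uses only the definitions of $S$ and $\phi$ from this section and reads the survival function (2.4) as $(1 + e^{\epsilon}/2)^{-2}$, an interpretation adopted here because it reproduces the stated $S(a)$.

```latex
\begin{align*}
F_\varepsilon(\log a) &= (1 + a/2)^{-2}, \qquad
\mu = \int_0^\infty (1 + a/2)^{-2}\,da = 2, \\
S(a) &= \frac{1}{\mu}\int_a^\infty (1 + u/2)^{-2}\,du = (1 + a/2)^{-1}, \\
\frac{f_\varepsilon(\log a)}{F_\varepsilon(\log a)} &= \frac{a}{1 + a/2}, \\
S(a) = 1 - u \;&\Longrightarrow\; a = S^{-1}(1-u) = \frac{2u}{1-u}, \\
\phi(u) &= \frac{a}{1 + a/2} - 1 = 2u - 1, \qquad
c_i = E\{\phi(u_{(i)})\} = \frac{2i}{n+1} - 1 .
\end{align*}
```

The last step uses the standard fact that the $i$-th of $n$ uniform order statistics has mean $i/(n+1)$.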
The test statistics are asymptotically normal following from the standard theory of linear rank estimation, provided that $\int_0^1 \phi^2(u)\,du < \infty$ (Hajek et al., 1967; Prentice, 1978). Following their arguments, a consistent variance estimate of $T_{LR,1}$ is
$$V_{LR,1} = \sum_{i=1}^n \Big\{X_{(i)} - (n-i+1)^{-1}\sum_{j=i}^n X_{(j)}\Big\}^{\otimes 2}$$
and a consistent variance estimate of $T_{Wil,1}$ is
$$V_{Wil,1} = \sum_{i=1}^n \{i(n+1)^{-1}\}^2 \Big\{X_{(i)} - (n-i+1)^{-1}\sum_{j=i}^n X_{(j)}\Big\}^{\otimes 2},$$
where $a^{\otimes 2} = aa^T$ for a column vector $a$. Therefore, chi-square test statistics can be constructed as $\chi^2_{LR,1} = T_{LR,1}^T V_{LR,1}^{-1} T_{LR,1}$ and $\chi^2_{Wil,1} = T_{Wil,1}^T V_{Wil,1}^{-1} T_{Wil,1}$, which are asymptotically $\chi^2(p)$ distributed under the null hypothesis.
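To make the variance estimates and chi-square statistics concrete, here is a minimal sketch for the scalar-covariate case (p = 1), assuming no ties and a non-constant covariate. The function name `rank_tests` is illustrative, and the Wilcoxon variance weight follows the expression as printed above, $\{i/(n+1)\}^2$. The p-value uses the identity that the upper tail of a chi-square variable with one degree of freedom at $c$ equals $\mathrm{erfc}(\sqrt{c/2})$.

```python
import math


def rank_tests(a, x):
    """Log-rank and Wilcoxon chi-square tests (1 df) from backward times a
    and a scalar covariate x. Assumes no ties and a non-constant x."""
    n = len(a)
    xs = [x[k] for k in sorted(range(n), key=lambda k: a[k])]  # x_(1), ..., x_(n)
    t_lr = t_wil = v_lr = v_wil = 0.0
    cum = 0.0
    tail = float(sum(xs))  # running value of sum_{j >= i} x_(j)
    for i in range(1, n + 1):
        cum += 1.0 / (n - i + 1)
        t_lr += xs[i - 1] * (cum - 1.0)                  # log-rank score c_i
        t_wil += xs[i - 1] * (2.0 * i / (n + 1) - 1.0)   # Wilcoxon score c_i
        resid = xs[i - 1] - tail / (n - i + 1)
        v_lr += resid ** 2
        v_wil += (i / (n + 1)) ** 2 * resid ** 2         # weight as printed
        tail -= xs[i - 1]
    out = {}
    for name, t, v in (("logrank", t_lr, v_lr), ("wilcoxon", t_wil, v_wil)):
        chi2 = t * t / v
        out[name] = (chi2, math.erfc(math.sqrt(chi2 / 2.0)))
    return out


# Hypothetical toy data.
print(rank_tests([2.0, 0.5, 3.1, 1.2, 0.9, 4.0], [1, 0, 1, 0, 0, 1]))
```

For a vector covariate the squared residual becomes an outer product and the division becomes a matrix solve; that extension is omitted here to keep the sketch dependency-free.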
In the literature, there are a few prominent ways of choosing between the log-rank and Wilcoxon tests. One way is to specify the test statistic before analysis, based on prior belief about the likely alternative hypothesis. When a proportional hazards alternative is assumed, the log-rank test is optimal. When it is assumed that the hazard ratio decreases with time, the Wilcoxon test is usually more powerful. Another way is to use graphical diagnostics for choosing the test statistic (Hess, 1995), and a related pre-test was recently proposed by Martinez (2010). Note that the choice between the log-rank and Wilcoxon tests only affects power, and will matter only in borderline cases. An advantage of rank-based statistics is their robustness against misspecification of the parametric error distribution. In fact, the log-rank statistic is often used in practice despite possible departures from the proportional hazards alternative, since it provides a conservative test in such cases.
In the above discussion, we constructed rank-based statistics based on backward recurrence times $(a_1, \ldots, a_n)$ observed in a cross-sectional sample as if they are completely observed survival times in an unbiased sample. The result is seemingly paradoxical, and can be better understood from the following. Cross-sectional sampling preferentially selects longer survivors. For subgroups with better survival, it is more likely to recruit individuals who have lived longer since disease incidence.
Therefore, the backward time A also tends to be longer in those subgroups. It was discussed in Allison (1985) that one can fit a proportional hazards model based on backward recurrence times to estimate the relative hazard in the population survival model under a strong assumption of exponentially distributed failure time. The log-rank statistic is the corresponding score statistic based on a partial likelihood function. For parametric accelerated failure time models, Yamaguchi (2003) and Keiding et al. (2011) also showed that survival model parameters can be estimated based on backward recurrence time. In particular, they showed that $\log A = X^T\beta + \varepsilon^*$ for the same regression coefficient β but a different error term $\varepsilon^*$, which has the survival function $F_{\varepsilon^*}(\epsilon) = \int_{\exp(\epsilon)}^\infty F_\varepsilon(\log u)\,du/\mu$. This suggests that testing the null hypothesis β = 0 from the accelerated failure time model using backward recurrence times is equivalent to testing the null hypothesis β = 0 in the target population. Therefore, linear rank statistics can also be constructed directly from backward recurrence times, and the corresponding statistics coincide with our results.
3. Combined rank statistics for cross-sectional data with follow-up
When prospective follow-up is present, a right-censored version of the forward recurrence time V
would be observed in addition to the backward recurrence time A. Denote the observed length-biased sampling data with possible right censoring as $\{z_i = \min(t_i, c_i) = \min(a_i + v_i, a_i + r_i),\ \delta_i = I(v_i \le r_i),\ x_i,\ i = 1, 2, \ldots, n\}$. Suppose V and R are conditionally independent given A and X and the censoring distribution does not involve any parameter of the survival distribution. Note that in length-biased sampling, A and Y are independent in the population. The observed data likelihood is
$$L = L_C \times L_M = \prod_{i=1}^n \left\{\frac{f_Y(z_i \mid x_i)}{F_Y(a_i \mid x_i)}\right\}^{\delta_i} \left\{\frac{F_Y(z_i \mid x_i)}{F_Y(a_i \mid x_i)}\right\}^{1-\delta_i} \times \prod_{i=1}^n \left\{\frac{F_Y(a_i \mid x_i)}{\mu_Y(x_i)}\right\}.$$
Based on the conditional likelihood function LC , linear rank statistics have been proposed
by Ying (1990). Based on the marginal likelihood function LM , linear rank statistics have been
proposed in the previous section. We will discuss how to combine the two statistics in this section.
Note that the backward recurrence time A and forward recurrence time V are correlated (Cox, 1962). Also, in the context of cross-sectional survival data, informative censoring occurs because the survival time T = A + V and the censoring time C = A + R within a prevalent population share a common random component A. However, T and C are independent conditional on (A, X), since R and V are independent conditional on (A, X). Because of this conditional independence and the definition of risk sets for left truncated right censored data, Ying’s test does not suffer from induced informative censoring. However, efficiency is lost because of conditioning. We improve efficiency by utilizing the information from the marginal distribution of A.
We introduce the following counting process notation for representing Ying’s statistics: $N_i(t) = I(z_i \le t, \delta_i = 1)$ and $R_i(t) = I(z_i \ge t \ge a_i)$. When the error is extreme-value distributed, the locally most powerful test based on $L_C$ is the log-rank test $T_{LR,2} = \sum_{i=1}^n \int_0^\tau \{x_i - \bar{x}(t)\}\,dN_i(t)$, where $\bar{x}(t) = \{\sum_{i=1}^n x_i R_i(t)\}/\{\sum_{i=1}^n R_i(t)\}$ and τ is a constant corresponding to the maximal observable survival time. When the error distribution is given by (2.4), the locally most powerful test (Harrington and Fleming, 1982) is the G-rho test with ρ = 1/2, $T_{G\text{-}rho,2} = \sum_{i=1}^n \int_0^\tau \{\hat{S}(t-)\}^{1/2}\{x_i - \bar{x}(t)\}\,dN_i(t)$, where $\hat{S}(t-)$ is the left-hand limit of the product-limit estimator for left truncated right censored data (Tsai et al., 1987). The two statistics have consistent variance estimates $V_{LR,2} = \sum_{i=1}^n \int_0^\tau \{x_i - \bar{x}(t)\}^{\otimes 2}\,dN_i(t)$ and $V_{G\text{-}rho,2} = \sum_{i=1}^n \int_0^\tau \hat{S}(t-)\{x_i - \bar{x}(t)\}^{\otimes 2}\,dN_i(t)$ respectively. Chi-square test statistics can be constructed as $\chi^2_{LR,2} = T_{LR,2}^T V_{LR,2}^{-1} T_{LR,2}$ and $\chi^2_{G\text{-}rho,2} = T_{G\text{-}rho,2}^T V_{G\text{-}rho,2}^{-1} T_{G\text{-}rho,2}$, which are asymptotically $\chi^2(p)$ distributed under the null hypothesis.
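The log-rank statistic for left truncated right censored data and its variance estimate can be sketched for a scalar covariate as below. This is an illustration under stated assumptions rather than a reference implementation: the function name `ltrc_logrank` is hypothetical, ties are ignored, and the risk set at time t is taken to be the subjects with a_j ≤ t ≤ z_j.

```python
def ltrc_logrank(a, z, delta, x):
    """Return (T_{LR,2}, V_{LR,2}) for a scalar covariate.

    a: backward recurrence (left truncation) times
    z: observed times min(a + v, a + r)
    delta: 1 if the failure was observed, 0 if censored
    x: scalar covariate values
    """
    n = len(z)
    t_stat = v_stat = 0.0
    for i in range(n):
        if delta[i] != 1:
            continue  # only observed failures contribute jumps of N_i
        t = z[i]
        # Risk set at t: already recruited (a_j <= t) and still under observation (z_j >= t).
        risk = [x[j] for j in range(n) if a[j] <= t <= z[j]]
        xbar = sum(risk) / len(risk)  # risk set always contains subject i
        t_stat += x[i] - xbar
        v_stat += (x[i] - xbar) ** 2
    return t_stat, v_stat


# Hypothetical toy data: three subjects, the second one censored.
print(ltrc_logrank([1.0, 2.0, 1.5], [3.0, 4.0, 2.5], [1, 0, 1], [1, 0, 1]))
```

The corresponding chi-square statistic for p = 1 is the squared first element divided by the second.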
Since two chi-square test statistics are constructed based on $L_M$ and $L_C$ respectively, it is intuitive that a combined test statistic should improve power if the two statistics are not perfectly correlated. In fact, the two statistics are asymptotically independent, as we shall show. Power can be greatly improved, as if we had two independent samples instead of one. In the following we focus on combining the two log-rank tests $T_{LR,1}$ and $T_{LR,2}$ under a working assumption that the error is extreme-value distributed. The combination of the test statistics $T_{Wil,1}$ and $T_{G\text{-}rho,2}$ under the working error distribution (2.4) proceeds in a similar manner. We first need to find the correlation between $T_{LR,1}$ and $T_{LR,2}$ under the null hypothesis. Note that $T_{LR,1}$ is a function of only $(a_i, x_i)$, $i = 1, \ldots, n$, and $E(T_{LR,2} \mid a_1, \ldots, a_n, x_1, \ldots, x_n) = 0$, which is shown in Section C of the supplementary material. Therefore, $E(T_{LR,1}T_{LR,2}) = E\{T_{LR,1}E(T_{LR,2} \mid a_1, \ldots, a_n, x_1, \ldots, x_n)\} = 0$, so that $T_{LR,1}$ and $T_{LR,2}$ are uncorrelated. Moreover, the two statistics have limiting Gaussian distributions and are asymptotically independent. Following this result, we investigate several methods for combining the two log-rank statistics.
The first method is inspired by the method for combining independent 2 × 2 tables in Mantel and Haenszel (1959). Under H0, $T_{LR,1} + T_{LR,2}$ is asymptotically normal with mean zero and variance $V_{LR,1} + V_{LR,2}$, and a Mantel-Haenszel (MH) statistic is defined as $\chi^2_{LR,MH} = (T_{LR,1} + T_{LR,2})^T (V_{LR,1} + V_{LR,2})^{-1}(T_{LR,1} + T_{LR,2})$, which is asymptotically $\chi^2(p)$ distributed.
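The second method is a modification of $\chi^2_{LR,MH}$. The covariance matrices $V_{LR,1}$ and $V_{LR,2}$ can be written as $B_{LR,1}B_{LR,1}^T$ and $B_{LR,2}B_{LR,2}^T$ respectively, and $B_{LR,1}^{-1}T_{LR,1}$ and $B_{LR,2}^{-1}T_{LR,2}$ are asymptotically normal with mean zero and variance $I_{p\times p}$, the identity matrix. A modified Mantel-Haenszel (MMH) statistic is defined as $\chi^2_{LR,MMH} = (B_{LR,1}^{-1}T_{LR,1} + B_{LR,2}^{-1}T_{LR,2})^T (B_{LR,1}^{-1}T_{LR,1} + B_{LR,2}^{-1}T_{LR,2})/2$, which is asymptotically $\chi^2(p)$ distributed; the factor 1/2 is needed because the sum of two asymptotically independent standard normal vectors has variance $2I_{p\times p}$.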
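The two Mantel-Haenszel type combinations can be sketched for the scalar case (p = 1), where the matrices reduce to numbers and B is the square root of V. The function names are illustrative. In the modified version the sketch divides by 2, on the reasoning that the sum of two independent standard normal variables has variance 2, so this scaling is what gives a chi-square reference distribution with one degree of freedom; treat that scaling as an assumption of the sketch.

```python
import math


def chi2_sf_1df(c):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(c / 2.0))


def mantel_haenszel(t1, v1, t2, v2):
    """(T1 + T2)^2 / (V1 + V2) and its p-value for scalar statistics."""
    chi2 = (t1 + t2) ** 2 / (v1 + v2)
    return chi2, chi2_sf_1df(chi2)


def modified_mantel_haenszel(t1, v1, t2, v2):
    """Sum of the two standardized statistics, squared and halved (see lead-in)."""
    z = t1 / math.sqrt(v1) + t2 / math.sqrt(v2)
    chi2 = z * z / 2.0
    return chi2, chi2_sf_1df(chi2)


# Hypothetical inputs: (T, V) pairs from the backward-time and follow-up tests.
print(mantel_haenszel(2.0, 1.0, 2.0, 1.0))
print(modified_mantel_haenszel(2.0, 1.0, 2.0, 1.0))
```

When the two variances are equal the two statistics coincide; they differ when one source is much more informative than the other, since the modified version weights the standardized statistics equally.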
The third method is a sum of chi-square test (Bhattacharya, 1961), defined as $\chi^2_{SUM} = \chi^2_{LR,1} + \chi^2_{LR,2}$, which is asymptotically $\chi^2(2p)$ distributed.
The fourth method is Fisher’s inverse chi-square test statistic (Fisher, 1932, pp. 99-101), defined as $\chi^2_{Fisher} = -2\log p_{LR,1} - 2\log p_{LR,2}$, where $p_{LR,1}$ and $p_{LR,2}$ are P-values based on $\chi^2_{LR,1}$ and $\chi^2_{LR,2}$ respectively. Fisher’s statistic is asymptotically $\chi^2(4)$ distributed, regardless of the dimension of X. This is because the P-values are uniformly distributed under the null hypothesis and each of the two terms in $\chi^2_{Fisher}$ is $\chi^2(2)$ distributed.
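The sum and Fisher combinations are simple to compute once the two component chi-square statistics are available. The sketch below assumes p = 1, so that the component tests have one degree of freedom, and uses closed-form upper tail probabilities for 1, 2 and 4 degrees of freedom; the function names are illustrative.

```python
import math


def chi2_sf(c, df):
    """Upper tail of a chi-square distribution for df in {1, 2, 4} (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(c / 2.0))
    if df == 2:
        return math.exp(-c / 2.0)
    if df == 4:
        return math.exp(-c / 2.0) * (1.0 + c / 2.0)
    raise ValueError("this sketch only handles df in {1, 2, 4}")


def sum_test(chi2_1, chi2_2):
    """Sum of two 1-df chi-square statistics, referred to chi-square(2)."""
    s = chi2_1 + chi2_2
    return s, chi2_sf(s, 2)


def fisher_test(chi2_1, chi2_2):
    """Fisher's -2 log p1 - 2 log p2, referred to chi-square(4)."""
    p1, p2 = chi2_sf(chi2_1, 1), chi2_sf(chi2_2, 1)
    s = -2.0 * math.log(p1) - 2.0 * math.log(p2)
    return s, chi2_sf(s, 4)


# Hypothetical component statistics.
print(sum_test(3.0, 1.0))
print(fisher_test(3.0, 1.0))
```

With two component tests, Fisher's combined p-value simplifies to $p_1 p_2\{1 - \log(p_1 p_2)\}$, which is a convenient check on the implementation.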
Other procedures have been proposed for combining independent chi-square tests; see, for example, Marden (1982). We considered the above four procedures for the following reasons.
The Mantel-Haenszel test has been shown to outperform other procedures for combining 2 × 2
tables with similar departures from the null hypothesis for all tables (Louv and Littell, 1986).
In our case, the association parameter β is the same for the tests based on backward recurrence
time and follow-up time, suggesting that Mantel-Haenszel type procedures should have decent power. On the other hand, the sum of chi-square and Fisher’s tests are shown to be admissible in
Marden (1982) and have reasonable power against a wide range of alternatives (Louv and Littell,
1986).
It is well known that the log-rank statistic has optimal power under proportional hazards alternatives. For decreasing hazard ratios, the Wilcoxon test or the G-rho family is more powerful. For right-censored data, Kosorok and Lin (1999) derived a versatile maximum test of weighted log-rank statistics, which is powerful against a wide range of alternatives. We would like to stress that our main contribution is rather different from the broad class of tests proposed for right-censored data. We highlight the differences as follows. First, we combine asymptotically independent test statistics from two sources (A and Y) using the same set of data. This is different from maximum tests that combine dependent test statistics from one source (Y only). For a combination of dependent test statistics, the gain in power is usually limited, whereas we shall show by simulations that the gain in power can be substantial because we combine independent sources of information. Second, the Mantel-Haenszel test, sum test and Fisher’s inverse test considered in this section are designed for combining asymptotically independent test statistics. The null distribution will be much more complex when the statistics are correlated. In fact, the maximum test requires extensive simulations to compute the null distribution (Kosorok and Lin, 1999). On the other hand, our proposed Fisher’s inverse test can be readily extended to combine two maximum tests, one for the backward time and the other for follow-up data, since information from the two sources is asymptotically independent.
4. Simulation studies
We performed simulation studies to evaluate the performance of the proposed test statistics. In each case, independent data sets were generated 10000 times under the null hypothesis and 1000 times under the alternative hypotheses. First, we considered a two-sample testing scenario where X was a Bernoulli variable with p = 0.5. The survival time Y was generated by model (1.1) where β = 0, 0.1, 0.2, 0.3, 0.4 and $\varepsilon = \log(\tilde{\varepsilon})/4$, where $\tilde{\varepsilon}$ was standard exponential distributed. That is, ε is extreme-value distributed. We also considered ε following a logistic or a normal distribution with the same mean and variance as the above extreme value distribution, to show that the tests are robust against misspecification of the parametric error distribution. To obtain a length-biased sample, we generated random truncation times A0 from a U(0, 30) distribution; an observation was in the cross-sectional sample if −A0 + Y > 0. Data were generated until the cross-sectional sample had n = 100 or n = 200 observations. The survival endpoint was censored if an individual in the prevalent cohort survived past 0.5 time units after recruitment. We compared the proposed testing procedures, including the test based only on backward recurrence time without follow-up and the four combined test statistics, to two existing methods: the linear rank statistics of Ying (1990) based on left truncated right censored data (LTRC) and the modified log-rank statistics of Ning et al. (2010), denoted by NSQ. A 5% significance level was chosen. Table 1 summarizes the results from the simulation study. The results showed that all tests had an empirical Type I error that was close to the nominal value, even for a small sample size and misspecified error distributions. Under alternative hypotheses, there were substantial improvements in power from incorporating information from backward recurrence times. In fact, the test statistics based on backward recurrence times alone had a higher power than the conventional log-rank test based on left truncated and right censored data. All combined tests had improved power compared to the conventional log-rank test or the test based only on the backward recurrence time. The combined tests had consistently improved power compared to the test of Ning et al. (2010), which also recognized the length-biased data structure but utilized information in a different way. Their test requires the nonparametric estimation of a pooled survival distribution, which may compromise power under alternative hypotheses, while the proposed tests do not require such an estimation. Among the proposed tests, the Mantel-Haenszel test and its modification had a consistently higher power than the sum of chi-square and Fisher’s tests, and the modified Mantel-Haenszel test performed slightly better than the Mantel-Haenszel test.
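The data-generating steps of the first simulation scenario can be sketched as follows. This is an illustration of the described design, not the authors' code: the function name, the seed argument and the tuple layout are assumptions, and censoring is implemented as administrative censoring 0.5 time units after recruitment.

```python
import math
import random


def simulate_cross_sectional(n, beta, follow_up=0.5, seed=0):
    """Draw n length-biased observations (a, z, delta, x) by rejection sampling."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        x = 1 if rng.random() < 0.5 else 0       # Bernoulli(0.5) covariate
        e = rng.expovariate(1.0)                 # standard exponential
        if e <= 0.0:
            continue                             # guard against log(0)
        eps = math.log(e) / 4.0                  # extreme-value error
        y = math.exp(x * beta + eps)             # model (1.1)
        a0 = rng.uniform(0.0, 30.0)              # truncation time
        if y - a0 > 0:                           # still alive at sampling
            v = y - a0                           # forward recurrence time
            delta = 1 if v <= follow_up else 0   # failure observed in follow-up?
            z = a0 + min(v, follow_up)           # observed time
            sample.append((a0, z, delta, x))
    return sample


data = simulate_cross_sectional(100, beta=0.2)
print(len(data), sum(d for _, _, d, _ in data))
```

Dropping the `z` and `delta` components gives the data available in a survey without follow-up, where only the backward recurrence time and the covariate are observed.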
Next we considered a scenario where X was bivariate and contained a continuous variable. Suppose $X = (X_1^T, X_2^T)^T$, where $X_1$ is a Bernoulli variable with p = 0.5, and $X_2$ is standard uniform distributed. The modified log-rank test of Ning et al. (2010) is only applicable to univariate categorical X and was not considered under this scenario. We considered alternative hypotheses that do not follow model (1.1) and also two scenarios of non-stationary incidence distributions. For the stationary incidence distribution, we generated random truncation times A0 from a U(0, 30) distribution; an observation was in the cross-sectional sample if −A0 + Y > 0. For covariate-independent non-stationary disease incidence, A0 was generated from 30×Beta(1, 3), which gives an increasing disease incidence over calendar time. For covariate-dependent non-stationary disease incidence, A0 was generated from 30×Beta(1, 3 exp(−X1 − X2)). The survival time Y followed $F_Y(y \mid x) = \{5/(y+5)\}^{1 + 10\exp(x^T\beta)}$ with β = (0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5). Note that the survival time did not follow an accelerated failure time model. The sampling of cross-sectional data and the censoring mechanism were chosen to be the same as in the previous example. Results are shown in Table 2. The major conclusions were similar to those in the previous example, and the proposed methods greatly improved power compared to the existing log-rank test, even when the alternative distributions do not follow an accelerated failure time model. The Mantel-Haenszel test and its modification continued to outperform the other tests, and the difference in power was greater in this scenario. The simulations also showed that the type I error remains close to the nominal value under mild departures from the stationary disease incidence assumption.
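The survival and truncation distributions of this second scenario can be drawn with the standard library. The sketch below uses inverse-transform sampling for the stated survival function and `random.betavariate` for the truncation times; the function names and argument conventions are illustrative assumptions.

```python
import math
import random


def draw_survival_time(x_beta, rng):
    """Inverse-transform draw from F_Y(y|x) = {5/(y+5)}^(1 + 10 exp(x^T beta))."""
    k = 1.0 + 10.0 * math.exp(x_beta)
    u = 1.0 - rng.random()               # uniform on (0, 1], avoids u = 0
    return 5.0 * (u ** (-1.0 / k) - 1.0)  # solves {5/(y+5)}^k = u for y


def draw_truncation_time(rng, x1=None, x2=None):
    """30 * Beta(1, 3) when no covariates are passed (covariate-independent case);
    otherwise 30 * Beta(1, 3 exp(-x1 - x2)) (covariate-dependent case)."""
    b = 3.0 if x1 is None else 3.0 * math.exp(-x1 - x2)
    return 30.0 * rng.betavariate(1.0, b)


rng = random.Random(0)
x1, x2 = (1 if rng.random() < 0.5 else 0), rng.random()
y = draw_survival_time(0.5 * x1 + 0.0 * x2, rng)
a0 = draw_truncation_time(rng, x1, x2)
print(y, a0, y - a0 > 0)
```

As in the first scenario, a draw enters the cross-sectional sample only when `y - a0 > 0`.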
5. Application to a health survey without follow-up
Partial survival information in the form of backward recurrence times is frequently collected in health surveys. Most surveys are administered cross-sectionally without follow-up data, for cost and logistical reasons. Without any prospective follow-up, the survival times collected are all censored and conventional statistical methods are not applicable. Therefore, the partial survival information collected is seldom analyzed. An exception is given in McLaughlin et al. (2010), who examined the associations between childhood adversities and the durations of adult mental disorders using backward recurrence times collected from a nationally representative sample from the National Comorbidity Survey Replication.
To illustrate the proposed test statistic, we analyzed a different survival outcome collected in the same survey, the duration between suicidal thoughts. Although suicidal thoughts are recurrent events, cross-sectional surveys such as this one usually collect only the most recent onset of the recurrent events. Therefore we only have the time from the last event, which is exactly the model set-up in Section 2. The survey was administered in 2001-2002, and collected the time of last suicidal thoughts from 1010 respondents with ages between 18 and 91. We examined whether certain experiences from childhood are associated with the duration between suicidal thoughts among adults. We first examined whether the duration between suicidal thoughts was associated with living with both biological parents until age 16. The proposed log-rank statistic was 9.40 with a $\chi^2_1$ null distribution, which gives a p-value of 0.002, whereas the proposed Wilcoxon statistic was 13.10 with a p-value of 0.0003. As mentioned in Section 2, the log-rank statistic is conservative under departures from the proportional hazards alternative, but the result is consistent with the Wilcoxon statistic in this application. A direct comparison of backward recurrence times also revealed a consistent pattern, in which the mean backward recurrence time among individuals who lived with both parents until 16 was 2.6 years (95% CI: 1.0-4.1) longer than among individuals who did not live with both parents. We also examined the associations between the duration between suicidal thoughts and three childhood adversities: parental substance abuse, parental criminality and family violence. The proposed log-rank statistics were 5.0, 12.1 and 0.6 for parental substance abuse, parental criminality and family violence respectively, yielding p-values of 0.025, 0.001 and 0.431. Therefore, the data suggested that parental substance abuse and criminality were associated with the duration between suicidal thoughts at a 5% significance level. However, the data did not suggest any association between family violence and the duration between suicidal thoughts. We reiterate that the tests of Ying (1990) and Ning et al. (2010) are not applicable to these data because there was no prospective follow-up.
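The p-values quoted in this section follow from the chi-square reference distribution with one degree of freedom. As a quick check, the conversion can be sketched with the standard library; the function name is illustrative, and the inputs are the rounded statistics reported above, so small discrepancies in the last digit are possible for statistics reported to only one decimal place.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square(1) variable at stat."""
    return math.erfc(math.sqrt(stat / 2.0))


reported = [
    ("log-rank, lived with both parents", 9.40),
    ("Wilcoxon, lived with both parents", 13.10),
    ("log-rank, parental substance abuse", 5.0),
]
for label, stat in reported:
    print(label, round(chi2_1df_pvalue(stat), 4))
```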
6. Application to a prospective study of Alzheimer’s disease
Next, we applied the proposed tests to the Canadian Study of Health and Aging (Wolfson et al.,
2001). The same example has been used to study the performance of the modified log-rank test
in Ning et al. (2010). A prevalent sample of individuals suffering from dementia was collected in
1991. The date of dementia onset was collected from medical history, and a prospective follow-up
was conducted between 1991 and 1996. It is well established that the sample is subject to
length-biased sampling; see, for example, Shen et al. (2009) and Ning et al. (2010).
Subjects were classified into one of three diagnostic categories: probable Alzheimer’s
disease, possible Alzheimer’s disease and vascular dementia. We investigated whether the time
from onset to death was associated with diagnostic category. The available data included 818
subjects with dementia, of whom 393 had probable Alzheimer’s disease, 252 had possible
Alzheimer’s disease and 173 had vascular dementia. We considered pairwise comparisons and an
overall comparison among the three groups. The P-values from the different tests are summarized
in Table 3. As noted in Ning et al. (2010), the conventional log-rank test for left truncated right
censored data did not reject the null hypotheses that the survival distributions for any two of the
three groups were equal. However, statistically significant differences were found between vascular dementia
and possible Alzheimer’s disease, between probable and possible Alzheimer’s disease and in the
three-sample test at a 5% significance level using the proposed Mantel-Haenszel tests. P-values
were slightly higher for the sum test and Fisher’s test. For the three-sample test, the test of
Ning et al. (2010) obtained a marginally significant result at a 5% level of significance, whereas
the proposed Mantel-Haenszel tests had a lower P-value of 0.02. Interestingly, the same conclusion
could be reached at baseline recruitment, before any follow-up began, by using the log-rank test
based on backward recurrence times.
To further understand the power of detecting a difference in a three-sample test, we ran a
simulation study with parameter values estimated from the data. We fit a parametric survival model
(1.1) with an extreme-value distributed error. We estimated that (β1, β2) = (0.18,−0.05), where
β1 is the log relative survival time comparing subjects with possible Alzheimer’s disease with
subjects with probable Alzheimer’s disease, and β2 is the log relative survival time comparing subjects
with vascular dementia with subjects with probable Alzheimer’s disease. The error
is ε = log(ε0), where ε0 follows an exponential distribution with an estimated mean of 3.82 years. The
result is comparable to a semiparametric analysis based on log-rank estimating equations, which
estimates (β1, β2) = (0.19,−0.07). It is estimated that 47%, 35% and 18% of cases have probable
Alzheimer’s disease, possible Alzheimer’s disease and vascular dementia, respectively, using the
bias corrected estimator of Chan and Wang (2012). We simulated 1000 independent data sets
using these parameters, and each data set contained 818 subjects in a cross-sectional sample. The
generation of a cross-sectional sample is described in Section 4. The residual censoring time from
recruitment was generated from a uniform (4.3, 5.8) distribution, consistent with the observed
data. We found that the sum of Chi-square test and Fisher’s test each had a power of 76%, whereas
the Mantel-Haenszel test and the modified Mantel-Haenszel test each had a power of 86%.
Therefore, the Mantel-Haenszel tests have a lower Type II error rate and are more likely to reject the null
hypothesis when the alternative is true.
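The data-generating step of such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Section 4: it assumes that model (1.1) has the accelerated failure time form log T = βZ + ε with ε the log of an exponential variable, and that disease incidence is stationary, so that the sampling time falls uniformly within each sampled survival interval. Under these assumptions the group-specific mean survival times are 3.82 years (probable), 3.82 × exp(0.18) ≈ 4.57 years (possible) and 3.82 × exp(−0.05) ≈ 3.63 years (vascular). The function name, dictionary keys and seed are hypothetical.

```python
import math
import random

# Values quoted in the text; everything else is an illustrative assumption.
BETA = {"probable": 0.0, "possible": 0.18, "vascular": -0.05}
INCIDENT_SHARE = {"probable": 0.47, "possible": 0.35, "vascular": 0.18}
BASE_MEAN = 3.82           # mean of the exponential variable, in years
CENSOR_RANGE = (4.3, 5.8)  # residual censoring time from recruitment

def simulate_sample(n, rng):
    """Draw one length-biased cross-sectional sample of size n."""
    groups = list(BETA)
    # Assumed AFT form: T = exp(beta) * eps0, so group means scale by exp(beta).
    means = {g: BASE_MEAN * math.exp(BETA[g]) for g in groups}
    # Length-biased sampling over-represents groups with longer survival:
    # prevalent group weights are proportional to incident share times mean.
    weights = [INCIDENT_SHARE[g] * means[g] for g in groups]
    sample = []
    for _ in range(n):
        g = rng.choices(groups, weights=weights)[0]
        # The length-biased version of an exponential with mean m
        # is a gamma distribution with shape 2 and scale m.
        t = rng.gammavariate(2.0, means[g])
        # Assumed stationarity: backward time is uniform on (0, t).
        backward = rng.random() * t
        residual = t - backward
        censor = rng.uniform(*CENSOR_RANGE)
        sample.append({
            "group": g,
            "backward": backward,               # onset to recruitment
            "forward": min(residual, censor),   # observed follow-up time
            "died": residual <= censor,         # death observed in follow-up
        })
    return sample

rng = random.Random(2014)
data = simulate_sample(818, rng)
```

Repeating this step 1000 times and applying each test to every data set gives the empirical power as the proportion of rejections. With 1000 replicates, the Monte Carlo standard error of an estimated power of 76% to 86% is roughly 0.011 to 0.014, which is small relative to the ten-point difference reported above.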
7. Concluding remarks
A linear rank statistic is proposed for length-biased cross-sectional survival data without
follow-up, based on backward recurrence time. Existing test statistics cannot be applied to this case
because all subjects are essentially censored. Interestingly, the special cases of log-rank and Wilcoxon
statistics treat backward recurrence time as if it were the completely observed survival time. When
prospective follow-up is present, the proposed rank statistic without follow-up can be combined
with a conventional rank statistic for cross-sectional data with follow-up. Another interesting
observation is that the two rank statistics based on the same sample are asymptotically independent.
This property facilitates the combination of the two statistics, and four methods were explored.
Moreover, combining two independent statistics can substantially improve power. Based on the
simulation studies and the data analysis, the proposed Mantel-Haenszel tests have larger power and
we recommend their use in practice.
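To illustrate why asymptotic independence makes combination straightforward, the sketch below implements two textbook combination rules for a pair of independent tests. It assumes that the sum test refers the sum of two independent one-degree-of-freedom chi-square statistics to a chi-square distribution with two degrees of freedom, and that Fisher's test refers −2 times the sum of the log p-values to a chi-square distribution with four degrees of freedom. These are the standard forms for the two-sample case and may differ in detail from the definitions given earlier in the paper; the function names and input values are hypothetical.

```python
import math

def fisher_two(p1, p2):
    """Fisher's combined p-value for two independent p-values."""
    x = -2.0 * (math.log(p1) + math.log(p2))  # chi-square with 4 df under H0
    # Survival function of chi-square(4): exp(-x/2) * (1 + x/2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def sum_two(x1, x2):
    """P-value for the sum of two independent 1-df chi-square statistics."""
    # Survival function of chi-square(2): exp(-x/2).
    return math.exp(-(x1 + x2) / 2.0)

# Hypothetical inputs: a weak follow-up statistic and a stronger backward one.
print(fisher_two(0.30, 0.02))
print(sum_two(1.0, 6.6))
```

Neither rule needs a covariance term between the two statistics, which is what the independence result provides.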
The test of Ning et al. (2010) was designed for cross-sectional data with prospective follow-up.
The test is not asymptotically equivalent to our proposed method. In a cross-sectional sample
without prospective follow-up, the test of Ning et al. (2010) has infinite variability, whereas the
proposed method still works. When forward follow-up is much shorter than backward time,
which happens in practice because follow-up data are typically costly to collect, the proposed test
is expected to outperform its existing competitors. Our test statistics are also applicable to the
case where follow-up time is only observed for a subsample. Existing methods would generally
discard backward information for the incomplete observations without forward follow-up, while
the proposed method can utilize backward information from the full sample.
Acknowledgments
The authors thank Prof. Anastasios Tsiatis, an associate editor and two reviewers for their help-
ful comments and suggestions which greatly improved this paper. The authors thank Professor
Masoud Asgharian for sharing the Canadian Study of Health and Aging data. The first author
is partially supported by grant R01 HL-122212 from the National Institutes of Health. Conflict
of Interest: None declared.
References
Allison, P. D. (1985). Survival analysis of backward recurrence times. Journal of the American
Statistical Association 80(390), 315–322.
Bhattacharya, N. (1961). Sampling experiments on the combination of independent χ2 tests.
Sankhya 11, 191–196.
Chan, K. C. G., Chen, Y.Q. and Di, C.-Z. (2012). Proportional mean residual life model for
right-censored length-biased data. Biometrika 99(4), 995–1000.
Chan, K. C. G. and Wang, M.-C. (2012). Estimating incident population distribution from
prevalent data. Biometrics 68(2), 521–531.
Cox, D. R. (1962). Renewal theory. London: Methuen.
Fisher, R. A. (1932). Statistical methods for research workers. London: Oliver and Boyd.
Hajek, J., Sidak, Z. and Sen, P. K. (1967). Theory of rank tests. New York: Academic Press.
Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored
survival data. Biometrika 69(3), 553–566.
Hess, K. R. (1995). Graphical methods for assessing violations of the proportional hazards
assumption in Cox regression. Statistics in Medicine 14(15), 1707–1723.
Kalbfleisch, J. D. and Prentice, R. L. (1973). Marginal likelihoods based on Cox’s regression
and life model. Biometrika 60(2), 267–278.
Kalbfleisch, J. D. and Prentice, R. L. (2002). The statistical analysis of failure time data.
John Wiley & Sons.
Keiding, N., Fine, J. P., Hansen, O. H. and Slama, R. (2011). Accelerated failure time
regression for backward recurrence times and current durations. Statistics & Probability Let-
ters 81(7), 724–729.
Kosorok, M. R. and Lin, C.-Y. (1999). The versatility of function-indexed weighted log-rank
statistics. Journal of the American Statistical Association 94(445), 320–332.
Louv, W. C. and Littell, R. C. (1986). Combining one-sided binomial tests. Journal of the
American Statistical Association 81(394), 550–554.
Mandel, M. and Ritov, Y. (2010). The accelerated failure time model under biased sampling.
Biometrics 66(4), 1306–1308.
Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from
retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748.
Marden, J. I. (1982). Combining independent noncentral chi squared or F tests. The Annals of
Statistics 10(1), 266–277.
Martinez, R. L. M. C. and Naranjo, J. D. (2010). A pretest for choosing between logrank
and Wilcoxon tests in the two-sample problem. Metron 68(2), 111–125.
McLaughlin, K. A., Green, J. G., Gruber, M. J., Sampson, N. A., Zaslavsky, A. M.
and Kessler, R. C. (2010). Childhood adversities and adult psychiatric disorders in the
National Comorbidity Survey Replication II: associations with persistence of DSM-IV disorders.
Archives of General Psychiatry 67(2), 124–132.
Ning, J., Qin, J. and Shen, Y. (2010). Non-parametric tests for right-censored data with biased
sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(5),
609–630.
Ning, J., Qin, J. and Shen, Y. (2011). Buckley–James-type estimator with right-censored
and length-biased data. Biometrics 67(4), 1369–1378.
Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65(1), 167–179.
Shen, Y., Ning, J. and Qin, J. (2009). Analyzing length-biased data with semiparametric
transformation and accelerated failure time models. Journal of the American Statistical Asso-
ciation 104(487), 1192–1202.
Tsai, W.-Y., Jewell, N. P. and Wang, M.-C. (1987). A note on the product-limit estimator
under right censoring and left truncation. Biometrika 74(4), 883–886.
Vardi, Y. (1989). Multiplicative censoring, renewal processes, deconvolution and decreasing
density: nonparametric estimation. Biometrika 76(4), 751–761.
Wang, M.-C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of
the American Statistical Association 86, 130–143.
Wolfson, C., Wolfson, D. B., Asgharian, M., M’Lan, C. E., Østbye, T., Rockwood,
K. and Hogan, D. B. (2001). A reevaluation of the duration of survival after the onset of
dementia. New England Journal of Medicine 344(15), 1111–1116.
Yamaguchi, K. (2003). Accelerated failure–time mover–stayer regression models for the analysis
of last–episode data. Sociological Methodology 33(1), 81–110.
Ying, Z. (1990). Linear rank statistics for truncated data. Biometrika 77(4), 909–914.
Table 1. Percentage of hypotheses rejected by the existing and proposed tests: Two-sample testing.
LTRC: conventional log-rank statistic for left truncated right censored data; NQS: modified log-rank
statistic of Ning et al. (2010); Backward: proposed log-rank statistic based on backward recurrence
time; MH: proposed Mantel-Haenszel statistic; MMH: proposed modified Mantel-Haenszel statistic;
SUM: proposed sum of Chi-square statistic; Fisher: proposed Fisher’s inverse Chi-square statistic.

(a) Extreme value distribution
                    n=100                       n=200
Test \ β      0  0.1  0.2  0.3  0.4       0  0.1  0.2  0.3  0.4
LTRC          5   11   29   55   71       5   21   61   85   96
NQS           5   16   53   86   98       5   30   86  100  100
Backward      5   15   42   73   94       5   26   72   96  100
MH            5   21   59   89   98       6   39   90  100  100
MMH           5   23   62   91   99       6   41   93  100  100
SUM           5   18   55   86   97       5   32   88  100  100
Fisher        5   18   54   87   98       5   32   88  100  100

(b) Logistic distribution
                    n=100                       n=200
Test \ β      0  0.1  0.2  0.3  0.4       0  0.1  0.2  0.3  0.4
LTRC          5   11   31   54   78       5   17   53   85   97
NQS           5   14   42   77   95       5   24   77   97  100
Backward      5   11   33   58   80       5   20   58   86   99
MH            5   19   48   82   96       6   30   81   98  100
MMH           5   15   53   87   98       6   33   85   99  100
SUM           5   15   44   80   95       5   25   78   97  100
Fisher        5   15   44   79   96       5   25   79   97  100

(c) Normal distribution
                    n=100                       n=200
Test \ β      0  0.1  0.2  0.3  0.4       0  0.1  0.2  0.3  0.4
LTRC          5   11   30   59   77       5   20   57   85   96
NQS           5   15   45   79   94       5   25   79   98  100
Backward      6   13   33   60   81       5   21   60   91   99
MH            5   18   50   84   96       6   33   84   99  100
MMH           5   18   54   87   98       6   34   87   99  100
SUM           5   14   45   80   95       5   27   80   98  100
Fisher        5   15   46   80   95       5   27   80   99  100
Table 2. Percentage of hypotheses rejected by the existing and proposed tests: Bivariate covariates and
a misspecified alternative model. Test procedures are the same as in Table 1.

(a) Stationary incidence
                           n=100                                      n=200
Test \ β       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)        (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC               4        17         8        18            4        38        18        43
Backward           5        45        19        61            5        82        39        92
MH                 5        64        25        76            5        93        54        97
MMH                5        62        23        70            5        91        53        95
SUM                4        45        18        61            4        86        41        93
Fisher             4        45        18        61            4        86        41        93

(b) Non-stationary, covariate-independent incidence
                           n=100                                      n=200
Test \ β       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)        (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC               4        21        10        20            4        41        16        46
Backward           5        42        17        53            5        79        33        89
MH                 4        64        25        71            5        93        47        97
MMH                4        61        23        66            4        92        46        96
SUM                4        48        16        54            5        85        36        92
Fisher             4        48        16        54            5        85        36        92

(c) Non-stationary, covariate-dependent incidence
                           n=100                                      n=200
Test \ β       (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)        (0,0)   (0.5,0)   (0,0.5) (0.5,0.5)
LTRC               5        23        11        26            5        48        21        50
Backward           5        59        23        74            5        90        44        98
MH                 6        72        32        85            6        96        56       100
MMH                5        69        29        81            5        96        54       100
SUM                5        60        23        77            5        94        48        99
Fisher             5        60        23        77            5        94        48        99
Table 3. P-values multiplied by 100, for the Canadian Study of Health and Aging data. Test procedures
are the same as in Table 1. (a) Vascular vs probable; (b) Vascular vs possible; (c) Probable vs possible;
(d) Three-sample test.

Test         (a)   (b)   (c)   (d)
LTRC          42    33    71    54
NQS           35     2     4     5
Backward      45     1     2     1
MH            28     1     4     2
MMH           28     2     5     2
SUM           56     3     5     5
Fisher        52     1     6     5