
ONLINE SUPPLEMENT FOR

Causal inference approaches for studying the relationship between screening and cancer outcomes

Duncan C. Thomas

Appendix 1: Details of the disease model simulation

Let i = 1,…,5,000 index the sibships and j = 1,…,5 the members. For each individual, two measured covariates $X_{ijv}$ and two unobserved frailties $Z_{ijv}$ ($v = 1, 2$) were generated as multivariate normal deviates, with correlations between the two components and among members of the same sibship. The unobserved ages at which each of 100 polyps would eventually arise were generated by sampling from a Weibull distribution with shape parameter 4 and relative rate depending on $X_{ij1}$ and $Z_{ij1}$, specifically with hazard rate

$$\lambda_{ij}(t) = 4t^{3}R_{ij}, \qquad R_{ij} = \exp(\lambda_0 + \lambda_1 X_{ij1} + \lambda_2 Z_{ij1}),$$

so that, since the cumulative hazard is $R_{ij}t^{4}$, the ages at polyp creation $T^{(P)}_{ijk}$ are given by

$$T^{(P)}_{ijk} = \left[-\ln(U_{ijk})/R_{ij}\right]^{1/4}, \qquad U_{ijk} \sim \mathrm{Uniform}(0,1).$$

The parameters of the model were adjusted so that most of these values would be beyond an individual’s lifetime; thus, the number of polyps most individuals would experience in their lifetimes was considerably less than the 100 simulated.

Polyps were assumed to grow as the square root of time since they first developed¹, the mass (in number of cells) at time t being

$$M_{ijk}(t) = \left[1 + \mu_{ijk}\left(t - T^{(P)}_{ijk}\right)\right]^{1/2} \tag*{[1]}$$

where $\mu_{ijk} \sim \mathrm{LN}[\exp(\mu_0 + \mu_1 X_{ij1} + \mu_2 Z_{ij1}), V_M]$ is the growth rate parameter. Although there is evidence that some polyps can regress², this possibility was not explicitly modeled in the simulation.

Each cell in each polyp was considered to be at risk of conversion to a carcinoma through an n-stage process, with each mutation occurring at rate $\omega_{ijk}$. Thus, the time from each cell’s birth to its fully malignant conversion also has a Weibull distribution with shape parameter n, so that, with t measured from the creation of the polyp, the hazard rate for malignant conversion of the polyp is given by

$$\gamma_{ijk}(t) = \int_0^t \left(1+\mu_{ijk}u\right)^{1/2} n\,\omega_{ijk}^{n}\,(t-u)^{n-1}\,du \;\approx\; c_n\, n\,\omega_{ijk}^{n}\left(1+\mu_{ijk}t\right)^{n+1/2}/\mu_{ijk}^{n}$$

(where $c_1 = 2/3$, $c_2 = 4/15$, $c_3 = 16/105$, …) and the cumulative hazard by

$$\Gamma_{ijk}(T) = \int_0^T \gamma_{ijk}(t)\,dt \;\approx\; C_n\, n\,\omega_{ijk}^{n}\left(1+\mu_{ijk}T\right)^{n+3/2}/\mu_{ijk}^{n+1}$$

(where $C_1 = 4/15$, $C_2 = 8/105$, $C_3 = 32/945$, …). The elapsed time from polyp creation to malignant conversion for that polyp is generated by solving $\Gamma_{ijk}(T) = -\ln U_{ijk}$, with $U_{ijk} \sim \mathrm{Uniform}(0,1)$, for T to obtain

$$T^{(M)}_{ijk} = \frac{1}{\mu_{ijk}}\left\{\left[\frac{\mu_{ijk}^{n+1}\left(-\ln U_{ijk}\right)}{C_n\, n\,\omega_{ijk}^{n}}\right]^{2/(2n+3)} - 1\right\} \tag*{[2]}$$

Each undetected fully malignant clone is then assumed to grow exponentially at rate $\nu_{ijk} \sim \mathrm{LN}[\exp(\nu_0 + \nu_1 X_{ij1} + \nu_2 Z_{ij1}), \nu_3^2]$. Each tumor is assigned a clinically diagnosable size $S_{ijk}$ (in number of cells), drawn from a lognormal distribution, so that the age at which it could be clinically diagnosed is given by

$$T^{(C)}_{ijk} = T^{(P)}_{ijk} + T^{(M)}_{ijk} + \ln(S_{ijk})/\nu_{ijk} \tag*{[3]}$$

if it has not previously been detected by screening and excised, and if no other cancers have previously been diagnosed. (The log mean of $S_{ijk}$ was set to 10 [Table 1], which corresponds to about $e^{10} \approx 22{,}000$ cells.)

Appendix 2: Details of the screening model simulation

The age at the first recommended screening was chosen to be lognormally distributed, with

$$E\left[\ln T^{(S)}_{ij0}\right] = \sigma_0 + \sigma_2 X_{ij2} + \sigma_3 Z_{ij2}$$

and a fixed logarithmic standard deviation (0.1 in Supplementary Table 1). The probability of being compliant ($C_{ij0} = 1$) with that recommendation was given by the logistic function

$$\operatorname{logit}\Pr(C_{ij0}=1) = \pi_0 + \pi_2 X_{ij2} + \pi_3 Z_{ij2}.$$

The next recommended screening time for each sib was generated as a lognormal deviate with a mean of similar form, but with a different intercept ($\sigma_1$ in place of $\sigma_0$) and additional covariates: the square root of the total number of polyps detected on the last completed screen, and indicator variables for whether any of the individual’s siblings had completed at least one screen by then, whether any polyps had been found in them, and whether any of them had had a cancer diagnosis. The compliance probabilities $\Pr(C_{ijs} = 1)$ were likewise expanded to include terms for each of these additional covariates, with a different intercept ($\pi_1$ in place of $\pi_0$).

Because each member’s next screening time depends on the outcome of the entire process for the other family members, the entire family had to be processed concurrently. At the time of each completed screen, the outcome variables for the entire family as of that time (numbers of completed screens, positive screens, and members with a clinically diagnosed cancer) were computed and used to determine the next screening time for each family member. The earliest of these was then processed next, continuing in this manner until all family members reached their prespecified ages at censoring.

At each screening time $T^{(S)}_{ijs}$ at which an individual is compliant, the size $M_{ijk}(T^{(S)}_{ijs})$ of each polyp that has not previously been excised is computed using Eq. [1], and its probability of being detected (the event $D_{ijks} = 1$) is assumed to be a logistic function of its current linear dimension (the cube root of its mass):

$$\operatorname{logit}\Pr(D_{ijks}=1) = \psi_0 + \psi_1\left[M_{ijk}\left(T^{(S)}_{ijs}\right)\right]^{1/3}$$

If detected, that polyp is excised; it is then considered no longer at risk of generating a malignant clone and is not screened again.

The age at which each tumor could be clinically diagnosed is given by Eq. [3], which does not depend on the screening process, so the age at an individual’s cancer diagnosis, $T^{(D)}_{ij}$, conditional on the screening history, is

$$T^{(D)}_{ij} = \min_k\left[T^{(P)}_{ijk} + T^{(M)}_{ijk} + \ln(S_{ijk})/\nu_{ijk}\right],$$

the minimum being taken over all polyps k not yet discovered (i.e., those with $D_{ijks} = 0$ for all prior screening times $T^{(S)}_{ijs} < T^{(D)}_{ij}$).

Appendix 3: Propensity scores used for simulated and real data

Following the notation of Hernan and Robins³⁻⁵, we begin by fitting a logistic model for the probability that an individual is screened at each age:

$$\operatorname{logit}\Pr\left[A_i(t)=1 \mid V_i, \bar A_i(t^-), \bar L_i(t^-)\right] = \alpha_0(t) + V_i\alpha_1 + \bar A_i(t^-)\alpha_2 + \bar L_i(t^-)\alpha_3$$

where $A_i(t)$ is an indicator for whether individual i is screened at age t, $\bar A_i(t^-)$ denotes the history of screening prior to t, $\bar L_i(t^-)$ the history of polyps prior to t, and $V_i$ is a vector of baseline covariates. The baseline risk $\alpha_0(t)$ is modeled parametrically, here as a cubic polynomial, although other specifications would be possible. The model is fitted by creating a dataset in which each individual is represented once for every year up to censoring or cancer diagnosis, treating these observations as if they were independent. It is fitted twice, once with $\bar A_i(t^-)$, $\bar L_i(t^-)$ and $V_i$ to produce $\hat\alpha^{(1)}$, and once with only t to produce $\hat\alpha^{(0)}$. “Stabilized propensity score weights” $sw_i(t)$ are then computed from these estimates as

$$sw_i(t) = \prod_{u=0}^{t} \frac{\Pr\left[A_i(u);\, \hat\alpha^{(0)}\right]}{\Pr\left[A_i(u) \mid \bar A_i(u-1), \bar L_i(u-1), V_i;\, \hat\alpha^{(1)}\right]}$$

each probability being evaluated at the screening status $A_i(u)$ actually observed at age u. The denominator effectively weights each cancer case and control inversely by the probability of their observed screening history up to that time, thereby creating a pseudo-population in which cancer outcomes are not confounded by the determinants of screening history. Because these denominators can be highly variable, the numerators serve to stabilize the weights, leading to more stable estimates and smaller standard errors.

A standard nested case-control design is used to assess cancer risk in relation to screening history. For each cancer case i at age $t_i$, one (or more) control(s) is selected at random from the risk set $R(t_i)$, the set of subjects at risk and disease-free prior to the case’s age at cancer diagnosis, hereafter called the “reference date” (the time of comparison for each case and its matched controls). A standard conditional logistic regression is then used to estimate $\beta$, except that each subject’s contribution is weighted by their stabilized propensity score. Thus, the score contribution for the conditional likelihood of cancer becomes

$$U(\beta) = \sum_i \left( Z_{i0}(t_i) - \frac{\sum_j Z_{ij}(t_i)\, e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)}{\sum_j e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)} \right)$$

the sums in the numerator and denominator being taken over the case (j = 0) and matched control(s) (j ≥ 1). Here, $Z_{ij}(t)$ denotes the vector of covariates related to cancer risk, including those related to the number of completed or positive screens during some specified window of time prior to the reference date t, and $\beta$ is the corresponding vector of log-RR coefficients.

Appendix 4: Calculation of target ages at first screen and intervals between screens based on fixed covariates, family history, or both

The cumulative 10-year risk of cancer at age t in the general population, ignoring competing risks, is given by

$$P(t+10 \mid t) = 1 - \exp\left[-\Lambda(t+10 \mid t)\right]$$

where $\Lambda(t_2 \mid t_1) = \int_{t_1}^{t_2} \lambda(u)\,du$ is the cumulative hazard based on population age- (and sex-) specific incidence rates $\lambda(u)$. The corresponding personal 10-year risk for an individual i with risk factors $Z_i(t)$ at time t is given by

$$P_i(t+10 \mid t) = 1 - \exp\left[-\Lambda_0(t+10 \mid t)\, r_i\right]$$

where $\Lambda_0$ is the cumulative baseline hazard for an individual with $Z_i(t) \equiv 0$ and $r_i = \exp[Z_i(t)'\beta]$ is the individual’s personal relative risk. Since the population hazard rate is simply the average of all the individuals’ hazard rates,

$$\lambda(t) = E\left[\lambda_0(t)\exp\left(Z_i(t)'\beta\right)\right] = \lambda_0(t)\,\bar r(t)$$

where $\bar r(t)$ is the average of the individual relative risks among survivors at age t. Hence we can re-express personal risk in terms of the population rates as

$$P_i(t+10 \mid t) = 1 - \exp\left[-\Lambda(t+10 \mid t)\, r_i/\bar r(t)\right]$$

To find the age at which personal 10-year risk equals the population average risk at age 50, we must therefore solve the equation

$$\Lambda(t+10 \mid t)\, r_i/\bar r(t) = \Lambda(60 \mid 50)$$

Because $\bar r(t)$ varies only slowly with age (through the survival-of-the-fittest effect) while the population cumulative hazard $\Lambda(t+10 \mid t)$ rises with age, the left-hand side is essentially monotonic in t, so the equation is easily solved numerically.

The same technique is used to find the time interval $\Delta t$ at which the personal risk $P_i(t + \Delta t \mid t)$ equals $P(t+10 \mid t)$, the population average 10-year risk (or 5-year risk for those with polyps found on the previous screen), incorporating the number of previous exams as one of the time-dependent risk factors in the disease model. For simulating disease outcomes under different screening programs, the 1-year risk of cancer for an individual with covariate values $Z_i(t)$ at time t is simply $P_i(t+1 \mid t) = 1 - \exp\left[-\lambda(t)\, r_i/\bar r(t)\right]$.

For the real data application, the baseline rates were computed year by year, based on the fitted disease relative risks and the population age- and sex-specific rates from the North Rhine-Westphalia (Germany) registry²¹.

Appendix 5: Details of the DACHS case-control study data

The DACHS study⁶,⁷ is an ongoing population-based case-control study from Germany. In this analysis, 4334 cases of colorectal cancer and 4231 controls recruited between 2003 and 2013 were included, with exquisitely detailed information on screening histories for colorectal adenomas (polyps), as well as various risk factors (sex, schooling, ever regular smoking, ever participation in a general health check-up, BMI 5-14 years prior to the reference date, average METs, regular NSAID use, HRT, and statins) and family history of colorectal cancer in first- and second-degree relatives.

The screening history data for the index subjects (study cases and controls) included only the years of the first colonoscopy and the three most recent ones, plus the total number of colonoscopies, so the years of any additional screens (beyond four) were spaced uniformly between the first and the third-to-last ones. To avoid overlapping or closely adjacent exams, the total number of exams was capped at 9. Exams with indication coded as “conspicuous fecal occult blood test (FOBT) result” or “other” were excluded. Missing dates were assigned at random between age 40 and the reference age (or interpolated between available dates). In addition to colonoscopies, information on other screening modalities (e.g., FOBT) and type of exam (sigmoidoscopy or full colonoscopy) was available but was not used in this analysis. No questions were asked about screening of family members.

Years of birth, diagnosis, and death were available for each first-degree relative and for grandparents. For aunts and uncles, only the numbers at risk and numbers affected were available. Dates for these relatives were assigned by random sampling from the age distribution of cancer, assuming that aunts and uncles were born 25 years before the index subject and grandparents 50 years before. Again, missing years of cancer or birth for other relatives were assigned at random, based on the index subject’s year of birth.

In addition to the use of causal inference methods, the analyses reported here differ from those published earlier in a number of details, notably in using the total number of screening colonoscopies over various intervals prior to the reference date rather than a simple binary indicator for any colonoscopy over the interval 1-10 years prior. We also carried out analyses using binary indicator variables; because the count variables used here have a wider range than the binary one, effect sizes (per screen) are correspondingly smaller than those for the ever/never variable, but the comparisons across methods were essentially the same (results not shown). As before, we excluded subjects with inflammatory bowel disease, but did not exclude those under 50, or those whose most recent exam was less than 1 or more than 10 years earlier, as we wished to assess the effect of the entire screening history. Subjects with missing covariate values were excluded, leaving 4065 cases and 3025 controls for fitting the disease model. These subjects contributed 2191 screening events (1499 initial and 693 subsequent) over a total of 346,940 person-years of observation. Of the 2,169 exams for which we had information about results (the first and three most recent), 802 yielded polyps. We also ran analyses replacing missing covariate values with means, with and without a missingness indicator variable in the screening, polyp detection, and disease models, thereby increasing the sample size to 4,301 cases and 4,215 controls with 3,181 screens in 419,771 person-years and 1,108 polyps discovered. The results shown in Figures 3, 5 and 6 and Tables 4 and 5 differed very little, the only important difference being somewhat larger estimates of the observed screening effects, as shown in Supplementary Figure 2. The comparison between analyses with and without propensity score weighting remained the same, however.

Because the case-control study over-represents cases, fitting the propensity score model and predicting the outcomes of counterfactual screening programs requires that cases and controls be weighted appropriately. As described by Vanderweele and Vansteelandt⁸, the appropriate weights are the ratios of the population proportions of cases and subjects at risk at each age to the corresponding proportions of cases and controls in the sample. Thus, for age-matched nested case-control studies, cases are weighted by π(t)/p(t) and controls by [1 − π(t)]/[1 − p(t)], where π(t) denotes the population proportion of cases among subjects at risk at age t and p(t) the proportion of cases at age t out of all cases and controls at that age in the sample. Use of sampling weights slightly reduced the difference between unadjusted and IPSW-adjusted estimates of the relative risk for screening, but had relatively little effect on the comparison of cancer outcomes under alternative screening programs. As expected, there are many fewer predicted cancers after using the sampling weights, reflecting the lower weight given to the cases, who tend to be at higher risk than controls, but again the ranking of the predicted cancer outcomes under alternative screening programs remained unchanged. Tables 4 and 5 and Figures 3-6 are all based on the weighted estimates.
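A minimal C sketch of the case and control weights at a single age follows; the names are hypothetical, and `pop_case_prop` stands for the population proportion of cases among subjects at risk at that age, which must be supplied from registry data.

```c
#include <assert.h>
#include <math.h>

typedef struct {
    double case_weight;
    double control_weight;
} SamplingWeights;

/* Cases are weighted by pop/p and controls by (1 - pop)/(1 - p), where p is the
   sample proportion of cases among the cases and controls at this age. */
SamplingWeights sampling_weights(double pop_case_prop, int n_cases, int n_controls)
{
    assert(n_cases > 0 && n_controls > 0);
    assert(pop_case_prop > 0.0 && pop_case_prop < 1.0);
    double p = (double)n_cases / (n_cases + n_controls);
    SamplingWeights w;
    w.case_weight = pop_case_prop / p;
    w.control_weight = (1.0 - pop_case_prop) / (1.0 - p);
    return w;
}
```

As a quick check, with a population case proportion of 0.01 and 50 cases and 50 controls in the sample, cases receive weight 0.02 and controls 1.98, and the weighted share of cases, 50 × 0.02 / (50 × 0.02 + 50 × 1.98) = 0.01, matches the population proportion.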

Appendix 6: Estimation of counter-factual comparisons from the real data

Outcomes under counterfactual screening programs were computed by simulation using the fitted polyp detection and disease diagnosis models. At each age, whether or not cancer occurred was decided at random with the corresponding probability; if it did not, the process advanced to the next age. Any simulated screening and polyp history events after cancer incidence were then discarded. Competing risks were included using population death rates for all of Germany from the 2013 vital statistics, subtracting colorectal cancer incidence and assuming no covariate effects.
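The year-by-year simulation can be sketched in C as follows. All names are hypothetical; the per-age probabilities are assumed to have been derived already from the fitted models and the population death rates, cancer is checked before other-cause death within each year (an assumption), and a 64-bit linear congruential generator stands in for the program’s own random number generator. Screening and polyp events are not simulated here; the sketch shows only how the stopping age is found, after which later events would be discarded.

```c
#include <assert.h>
#include <stdint.h>

enum { OUTCOME_CENSORED = 0, OUTCOME_CANCER = 1, OUTCOME_DEATH = 2 };

/* 64-bit linear congruential generator returning a deviate strictly inside (0, 1);
   the top 52 bits are used so that the result can never round up to 1. */
double lcg_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return ((double)(*state >> 12) + 0.5) / 4503599627370496.0;   /* 2^52 */
}

/* Steps through ages start_age..end_age-1. p_cancer[a] is the probability of a
   cancer diagnosis at age a for someone alive and cancer-free; p_death[a] the
   probability of death from other causes. *event_age receives the age of the
   event, or end_age if the subject is censored. */
int simulate_history(const double p_cancer[], const double p_death[],
                     int start_age, int end_age, uint64_t *state, int *event_age)
{
    assert(start_age <= end_age);
    for (int a = start_age; a < end_age; a++) {
        if (lcg_uniform(state) < p_cancer[a]) { *event_age = a; return OUTCOME_CANCER; }
        if (lcg_uniform(state) < p_death[a])  { *event_age = a; return OUTCOME_DEATH; }
    }
    *event_age = end_age;
    return OUTCOME_CENSORED;
}
```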

For the analysis of the marginal statistics provided in Table 4, a random effects model was used to estimate the average causal effect of screening. Letting $Y_{ij}$ denote the simulated outcome for the jth replicate of subject i, we assumed $Y_{ij} \sim N(m_i, s_i^2)$ and $m_i \sim N(m, \sigma^2)$. Maximum likelihood estimates of the parameter of interest m and its asymptotic variance are easily found by iterating between estimating m and $\sigma^2$, as described by Stram⁹. For the binary outcome of cancer, the binomial variance estimator $s_i^2 = \bar Y_i(1 - \bar Y_i)/R$ was used, where $\bar Y_i = Y_{i+}/R$ and $Y_{i+}$ denotes the total number of simulated cancer outcomes out of R = 1000 replicates; for the other outcome variables (number of screens, number of polyps, age at cancer), the empirical variance across replicates for each individual was used for $s_i^2$.
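A minimal C sketch of the alternating estimation follows. It works with each subject’s mean over replicates and its sampling variance (playing the role of $s_i^2$), and updates $\sigma^2$ with a common fixed-point form of the maximum likelihood equations; this is an assumption about the details and not necessarily the exact algorithm of Stram⁹. Names are hypothetical, and all within-subject variances are assumed to be positive.

```c
#include <assert.h>
#include <math.h>

typedef struct {
    double m;       /* estimated mean of the subject-level means m_i */
    double sigma2;  /* between-subject variance sigma^2 */
    double var_m;   /* asymptotic variance of the estimate of m */
} RandomEffectsFit;

/* Inverse-variance weighted mean of y, given within-subject variances v and
   sigma2; the sum of the weights is returned through *sum_w. */
static double weighted_mean(const double y[], const double v[], int n,
                            double sigma2, double *sum_w)
{
    double sw = 0.0, swy = 0.0;
    for (int i = 0; i < n; i++) {
        double w = 1.0 / (v[i] + sigma2);
        sw += w;
        swy += w * y[i];
    }
    *sum_w = sw;
    return swy / sw;
}

/* Alternates between m and sigma2 using the update
   sigma2 = sum w_i^2 [(y_i - m)^2 - v_i] / sum w_i^2, truncated at zero. */
RandomEffectsFit fit_random_effects(const double y[], const double v[], int n,
                                    int max_iter, double tol)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        assert(v[i] > 0.0);
    double sigma2 = 0.0, sum_w;
    for (int it = 0; it < max_iter; it++) {
        double m = weighted_mean(y, v, n, sigma2, &sum_w);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            double w = 1.0 / (v[i] + sigma2);
            num += w * w * ((y[i] - m) * (y[i] - m) - v[i]);
            den += w * w;
        }
        double next = num / den > 0.0 ? num / den : 0.0;
        int done = fabs(next - sigma2) < tol;
        sigma2 = next;
        if (done)
            break;
    }
    RandomEffectsFit fit;
    fit.m = weighted_mean(y, v, n, sigma2, &sum_w);
    fit.sigma2 = sigma2;
    fit.var_m = 1.0 / sum_w;   /* inverse of the total weight */
    return fit;
}
```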

Supplementary Table 1: Simulated parameter values

Parameter family
  Value    Interpretation

Model for log rate of polyp development
  -16.5    intercept
  3.0      Weibull shape
  0.4      regression on X1
  1.0      regression on Z1
Model for log time to next screen
  3.9      intercept for age at first screen
  2.3      intercept for subsequent intervals
  0.1      SD of log age at first screen
  0.1      SD of log interval time
  -0.1     regression on X2
  -0.1     regression on Z2
  -0.1     regression on N previous polyps
  -0.02    regression on sibs’ N screens
  -0.04    regression on sibs’ positive screens
  -0.06    regression on sibs with cancer
Model for logit probability of compliance
  -0.8     intercept for first screen
  -1.0     intercept for subsequent screens
  1.0      regression on N previous polyps
  0.02     regression on sibs’ N screens
  0.04     regression on sibs’ positive screens
  0.06     regression on sibs with cancer
Model for polyp growth rate
  6.5      intercept
  0.1      SD of log growth rate
Model for log cancer growth rate
  1.5      intercept
  0.1      SD of log growth rate
Model for logit probability of polyp detection
  -13      intercept
  3.0      regression on cube root of polyp mass
Model for log malignant conversion rate (n = 3 mutations)
  -6.875   intercept
  0.1      SD of log mutation rate
Model for clinically detectable tumor size
  10.0     log mean number of cells
  0.5      SD of log number of cells

Supplementary Table 2: Expected number of clinically diagnosed or screen-detected cancers per 100,000 under various counterfactual policies (analysis window 1-10 years prior to reference date, simulated data)

Outcome of observed history (rows) by outcome of untargeted screening program (columns):
                Clinically diagnosed cancers    Screen-detected cancers
                No cancer     Cancer            No cancer     Cancer
No cancer       92,832        916               99,860        112
Cancer          3,544         2,708             28            0

Outcome of observed history (rows) by outcome of targeted screening program (columns):
                Clinically diagnosed cancers    Screen-detected cancers
                No cancer     Cancer            No cancer     Cancer
No cancer       92,768        980               99,852        120
Cancer          3,844         2,408             28            0

Outcome of untargeted screening program (rows) by outcome of targeted screening program (columns):
                Clinically diagnosed cancers    Screen-detected cancers
                No cancer     Cancer            No cancer     Cancer
No cancer       94,944        1,432             99,768        120
Cancer          1,668         1,956             112           0

Supplementary Table 3: Summary of log relative risk estimates from the fitted models (DACHS data)

Columns: screening propensity; polyp detection; disease risk without PS weights; disease risk with PS weights. Entries are lnRR (S.E.).

sex 0.487 (0.062) 0.541 (0.140) -0.309 (0.066) -0.306 (0.067)
schooling 0.082 (0.026) -0.240 (0.034) -0.234 (0.034)
ever regular smoker 0.072 (0.033) 0.168 (0.069) 0.220 (0.037) 0.198 (0.038)
ever health check-up 0.268 (0.083) 0.266 (0.196) -0.483 (0.080) -0.479 (0.081)
Body mass index (BMI) 5-14 yr earlier (x 10) 0.018 (0.012) 0.558 (0.067) 0.054 (0.007)
Physical activity, Metabolic Equivalents on Task (METs) (x 1000) -0.507 (0.186) -0.702 (0.226) -0.001 (0.000)
alcohol (x 1000) 0.149 (0.108) 0.269 (0.143) 0.003 (0.001)
nonsteroidal anti-inflammatory drugs (NSAIDS) -0.247 (0.043) -0.335 (0.058) -0.345 (0.059)
hormone replacement treatment (HRT) 0.503 (0.071) 0.210 (0.168) -0.615 (0.083) -0.605 (0.084)
statins 0.054 (0.047) -0.157 (0.068) -0.134 (0.069)
family history 0.374 (0.042) 0.220 (0.089) 0.405 (0.065) -0.855 (0.061)
total screens 0.425 (0.050) -0.132 (0.075)
first screen positive 0.021 -0.110 0.736 (0.247)
last screen positive 0.512 (0.117) 0.639 (0.251)
number of other positive screens -0.512 (0.137) 0.309 (0.240)
interval screens (1-10 years prior to reference date) -0.697 (0.054) -0.855 (0.061)
Time since last exam (TSLE) 0.130 (0.054)
TSLE² -0.023 (0.008)

Supplementary Figure 1: Distribution of stabilized propensity scores by screening history subgroups (DACHS data); the inverse of these scores is used as the weights. The distributions have mean ± standard deviation as follows: no screening, 0.85 ± 0.10; always negative, 0.93 ± 0.12; ever positive, 1.17 ± 0.37.

[Plot: stabilized propensity scores in bins from 0.21 to 4.81 on a geometric scale (x-axis), y-axis from 0 to 0.7, for the No screening, Negative, and Positive groups]

Supplementary Figure 2: Log RR estimates per screen for the effect of colonoscopies on colorectal cancer risk over various windows of time prior to the reference date (DACHS data); IPSW = inverse propensity score weighting. A: Including subjects with missing covariate values replaced by means; B: Same, plus including a missingness indicator variable. These plots can be compared with those in Figure 3 of the main text.

Supplementary references

1. Dowty JG, Byrnes GB, Gertig DM. The time-evolution of DCIS size distributions with applications to breast cancer growth and progression. Math Med Biol 2014; 31: 353-64.

2. Pickhardt PJ, Kim DH, Pooler BD, et al. Assessment of volumetric growth rates of small colorectal polyps with CT colonography: a longitudinal study of natural history. Lancet Oncol 2013; 14: 711-20.

3. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550-60.

4. Hernan MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008; 19: 766-79.

5. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000; 11: 561-70.

6. Brenner H, Chang-Claude J, Jansen L, et al. Reduced risk of colorectal cancer up to 10 years after screening, surveillance, or diagnostic colonoscopy. Gastroenterology 2014; 146: 709-17.

7. Brenner H, Chang-Claude J, Jansen L, et al. Colorectal cancers occurring after colonoscopy with polyp detection: sites of polyps and sites of cancers. Int J Cancer 2013; 133: 1672-9.

8. Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 2010; 172: 1339-48.

9. Stram DO. Meta-analysis of published data using a linear mixed-effects model. Biometrics 1996; 52: 536-44.

Appendix 7: C code for simulation and analysis of simulated data

What follows is the computer code that generated the simulation and analysis results provided in Table 1, Figure 2, and Supplementary Table 2. The parameters provided in Supplementary Table 1 are entered into the program in the constant declarations at the beginning of the program. The “d” array that follows each of these gives the random deviations around these values that were used when multiple replicates were run to investigate the sensitivity of the results to changes in these parameters. This code was adapted to generate the results from the DACHS study data, replacing the simulation routines by procedures to read in the real data and perform the various imputations of missing values described in Appendix 5, along with some further modifications to replace the unobserved disease process (which is known in the simulations but not observable in real data) by randomly generated data based on the fitted disease incidence and polyp detection models, as explained in Appendix 6. The cases and controls for the real data were reweighted to account for their different sampling probabilities, as explained in Appendix 5. (The real data analysis program is tailor-made for that dataset, but can be provided to interested users by request to the author, as some guidance will doubtless be needed to adapt it to other datasets.)

The program was compiled using Microsoft Visual Studio C++ version 6.0 and run on a MacBook Pro under Windows XP in a VMware Fusion virtual machine. The code should require very little modification to run under other operating systems and compilers, however.

The core simulation routines are contained in Simulate() and the analysis routines in Analyze(), both called from main() within a loop over multiple replicates. Both routines use GetScreeningCovariates() and GetDiseaseCovariates() to get the relevant covariate values at each point in time for each subject. The routine NextScreen() is used to simulate the intervals between screens. A similar routine, ProgramNextScreen(), is called by Analyze() as part of the counterfactual comparisons, and in turn relies on fitted logistic regression models through TargetAgeAtFirstScreen() and TimeToAverageRisk(). These are called from PredictCancer() for each of the counterfactual screening programs, and the results are tabulated in Counterfactuals(). Analyze() begins by drawing a nested case-control sample in CaseControlSampling(), then for each “analysis window” calls EstimatePropensityScores() and ConditionalLogisticRegression() to fit the odds ratio models, with and without the inverse probability weights. These estimates are used by Counterfactuals() to compute the various summary statistics (clinically diagnosed and screen-detected cancers, false-negative and false-positive screens, number needed to screen to prevent one cancer, etc.) under each counterfactual screening program. That routine also tabulates the pairwise comparisons of outcomes under different screening programs shown in Supplementary Table 2.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include "gamma.h"

const int F = 5000, // number of families

M = 5, // number of sibs per family

P = 100, // maximum number of polyps in unobserved history

S = 40, // maximum number of screening times

MaxNN = 1350000, // maximum possible size of propensity score dataset

{{0,0,0,0, 0,0,0,0, 0,0,0,0},

{0,0,0,0, 0,0,0,0, 0,0,0,0},

{1,1,0,0, 1,1,0,0, 1,1,0,0},

{1,0,1,0, 1,0,1,0, 1,0,0,0},

{1,1,1,1, 1,1,0,0, 1,1,0,0}},

{{0,0,0,0, 0,0,0,0, 0,0,0,0},

{0,0,0,0, 0,0,0,0, 0,0,0,0},

{1,0,0,0, 1,0,0,0, 0,0,0,0},

{1,0,0,0, 1,0,0,0, 0,0,0,0},

{1,0,0,0, 1,0,0,0, 0,0,0,0}},

{{0,0,0,0, 0,0,0,0, 0,0,0,0},

{0,0,0,0, 0,0,0,0, 0,0,0,0},

{1,1,0,0, 1,0,0,0, 0,0,0,0},

{1,0,0,0, 1,0,0,0, 0,0,0,0},

{1,1,0,0, 1,1,0,0, 0,0,0,0}}},

NumRepl = 1;

const double SDFshared = 1,

SDFcorr = 1,

SDFindep = 1,

SDXshared = 1,

SDXcorr = 1,

SDXindep = 1,

lambda0[4] = // model for log rate of polyp development

{ -16.5, // intercept

3, // Weibull shape

0.4, // regression on TotalXforPolyps

1}, // regression on TotalFrailtyForPolyps

dlambda[4] = {0.02, 0.00, 0.005, 0.005},

sigma0[11] = // model for ln time to next screen

{ 3.912, // intercept for first screen

2.3, // intercept for subsequent screens

0.1, // LSD (log standard deviation) for first screen

0.1, // LSD for subsequent screens

-0.1, // regression on TotalXforScreening

-0.1, // regression on TotalfrailtyForScreening

-0.1, // regression on number of previous polyps

-0.02, // regression on sib screened

-0.04, // regression on sib with polyps

-0.06, // regression on sib with cancer

+0.1}, // patch so very old people don't get screened too often

dsigma[11] = {0.02, 0.02, 0.002, 0.002, 0.002, 0.002, 0.002, 0.0004, 0.0012, 0.002, 0.02},

pi0[10] = // model for logit Pr(compliance with next recommended screen)

{ -0.8, // intercept for first screen

-1, // intercept for subsequent screens

0, // (not used)

0, // (not used)

0.5, // regression on TotalXforScreening

0.5, // regression on TotalfrailtyForScreening

1.0, // regression on number of previous polyps

0.02, // regression on sib screened

0.04, // regression on sib with polyps

0.06}, // regression on sib with cancer

dpi[10] = { 0.02, 0.02, 0, 0, 0.01, 0.01, 0.02, 0.0004, 0.0012, 0.002},

mu0[4] = // model for log polyp growth rate

{ 6.5, // intercept


0.15, // regression on TotalFrailtyForPolyps

0.1}, // SD of log polyp growth rates

dmu[4] = {0.02, 0.001, 0.001, 0.002},

nu0[4] = // model for log cancer growth rate

{ 1.5, // intercept



0.1}, // SD log cancer growth rate

dnu[4] = {0.05, 0.001, 0.001, 0.002},

psi0[2] = // model for logit probability of polyp detection

{ -13, // intercept

3.0}, // regression on polyp size

dpsi[2] = {0.2, 0.1},

omega0[4] = // polyp to cancer mutation rate (per polyp cell per year)

{ -6.875, // intercept



0.1}, // SD of log mutation rate

domega[4] = {0.05, 0.01, 0.01, 0.02},

lnAvgSizeAtCaDx0= 10, // ln average size of polyp at cancer diagnosis

dlnAvgSizeAtCaDx= 0.01,

SDlnSize0 = 0.5, // SD ln size of polyp at cancer diagnosis

dSDlnSize = 0.01,

FNwindow = 10; // years following screen used to define a "false negative"

double lambda[4],sigma[11],pi[10],mu[4],nu[4],psi[2],omega[4],lnAvgSizeAtCaDx,SDlnSize,coeff[4][5];

int a, // index for age = 1,…,100

f, // index for families = 1,…,F

m, // index for sibs within families = 1,…,M

p, // index for polyps for each case = 1,…,P

s, // index for screening times = 1,…,S

v,v1,v2, // index for covariates (v1,v2 for pairs of covariates in information matrix)

repl; // index for replicates = 1,…,NumRepl

int Ndetected[F][M][S];

double SharedFrailty[F][V],CorrelatedFrailty[F][M],IndependentFrailty[F][M][V],TotalFrailtyForPolyps[F][M],TotalFrailtyForScreening[F][M];

double SharedX[F][V],CorrelatedX[F][M],IndependentX[F][M][V],TotalXforPolyps[F][M],TotalXforScreening[F][M];

double AgeOfPolyp[F][M][P],PolypGrowthRate[F][M][P],AgeOfMutation[F][M][P];

double AgeAtScreen[F][M][S];

double AgeAtCancer[F][M],AgeAtCancerWithoutScreening[F][M],AgeOfTumor[F][M][P],ProgramAgeAtCancer[F][M];

double AgeAtCensoring[F][M];

double CancerGrowthRate[F][M][P];

int ScreenCompliant[F][M][S];

int NumScreens[F][M],NumScreensCompleted[F][M],NumUncensoredScreens[F][M],Cancer[F][M];

int ProgramNumScreens[F][M];

int PolypDetected[F][M][P],ProgramPolypDetected[F][M][P],ScreenPositive[F][M][S];

double AgePolypDetected[F][M][P],ProgramAgePolypDetected[F][M][P];

double MeanAgeOfPolyps,VarAgeOfPolyps,MeanUncensoredNPolyps,

MeanTimeToMutation,VarTimeToMutation,MeanUncensoredNMutations,TotalNumMutations,

MeanTimeToCancer,VarTimeToCancer,

NumScreened[S],NPolypsScreened[S],MeanAgeAtScreen[S],MeanNCompliant[S];

int NumPolypsByAge[100],NumMutationsByAge[100];

int IncidentCancers[100],NumUncensored[100],UncensoredCancers[100];

int iter,output,rep,pp;

double MeanGrowthRate[S],MeanPdetectPolyp[S],MeanPolypSizeAtScreening[S],MeanTimeFromPolypToDetection[S],MeanPolypSizeDetected[S],TotalPolypsDetected[S];

int pair,Npair,CaseFamily[MaxPair],CaseMember[MaxPair],ControlFamily[MaxPair],ControlMember[MaxPair];

double PairAge[MaxPair],DiseaseCovar[MaxPair][2][MaxDiseaseCovar],beta[NumDiseaseCovar],SEbeta[NumDiseaseCovar];

double ScreeningCovar[MaxScreeningCovar];

double ScreeningBaselineHazard[100],CumulativeBaselineHazard[100],num[100],den[100];

double PropensityScore[F][M][100],MeanPropensityScore[100];

double MeanNdetected[100],MeanSibsScreened[100],MeanSibsPolyps[100],MeanSibsCancer[100],Nmeans[100];

double PredNdetected[100],PredSibsScreened[100],PredSibsPolyps[100],PredSibsCancer[100];

int nn,NNtot,NN[F][M][100],ScreenedAtAge[MaxNN];

double ZZ[MaxNN][NumVar],Alpha1[NumVar],Alpha2[NumVar],SEalpha1[NumVar],SEalpha2[NumVar];

int MinTimePrediagnosis,MaxTimePrediagnosis,minTime,maxTime,policy,PolypsFound,model;

double ProgramAgeAtScreen[F][M][S];

double SummaryPolicyMeasure[5][4][NumPolicy][NumMeasure];

double RootMeanPolypSize;

double CurrentAge;

double PreclinicallyDetectableAge[F][M][P],

NumPreclinicalTumors,NumPreclinicallyDetectableTumors,

MeanPreclinicallyDetectableTime,VarPreclinicallyDetectableTime;

int AA;

double Y10[MaxAA],Y5[MaxAA],Z10[MaxAA][12],Weight[MaxAA],

gamma[NumModel][NumPolicy][12],SEgamma[NumModel][NumPolicy][12],PopulationAverageP10;

int NumFirstScreenTargetAge[100],NumTimeBetweenScreens[2][200];

int CounterfactualClinicalCancer[F][M][NumPolicy],

CounterfactualScreenedCancer[F][M][NumPolicy],

CounterfactualPolyps[F][M][NumPolicy],

CounterfactualPositiveScreens[F][M][NumPolicy];

double AverageRisk[100][2];

char FileName[14];

FILE *sum,*sim,*chk;

// * * * * * * * * * * * * * * * * * * * *

// S I M U L A T I O N R O U T I N E S

// * * * * * * * * * * * * * * * * * * * *

void GetScreeningCovariates (int ff, int mm, double age)

{ ScreeningCovar[0] = TotalXforPolyps[ff][mm];

ScreeningCovar[1] = TotalXforScreening[ff][mm];

ScreeningCovar[4] = 0; // number of previous screens

ScreeningCovar[5] = 0; // number of positive screens

double AgeAtLastScreen=0;

for (int ss=0; ss

void GetDiseaseCovariates(int pp, int cc, int ff, int mm, double age)

{

DiseaseCovar[pp][cc][0] = TotalXforPolyps[ff][mm];

DiseaseCovar[pp][cc][1] = TotalXforScreening[ff][mm];

DiseaseCovar[pp][cc][2] = 0;

DiseaseCovar[pp][cc][3] = 0;

for (int ss=0; ss PairAge[pp]-MaxTimePrediagnosis)

{ if (ScreenCompliant[ff][mm][ss]) DiseaseCovar[pp][cc][2] ++;

if (Ndetected[ff][mm][ss]) DiseaseCovar[pp][cc][3] ++;

}

if (ScreeningVariable==1 && DiseaseCovar[pp][cc][2])

DiseaseCovar[pp][cc][2]=1; // binary version of ever screened during interval before reference date
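The loop above fills DiseaseCovar[pp][cc][2] and [3] by counting a pair member's compliant screens and screens with detected polyps inside a window before the pair's reference age. Its loop header is damaged in this copy, so the sketch below is only an illustration under one assumption: a screen counts if its age lies in the open interval (ref_age - max_time, ref_age - min_time), with min_time and max_time standing in for MinTimePrediagnosis and MaxTimePrediagnosis. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch of the screening-history covariates.
 * ASSUMPTION: a screen is counted if its age lies strictly inside
 * (ref_age - max_time, ref_age - min_time). */
typedef struct {
    int n_compliant; /* analogue of DiseaseCovar[..][2] */
    int n_positive;  /* analogue of DiseaseCovar[..][3] */
} WindowCounts;

WindowCounts count_window_screens(const double *age_at_screen,
                                  const int *compliant,
                                  const int *n_detected,
                                  int n_screens, double ref_age,
                                  double min_time, double max_time,
                                  int binary_screening)
{
    WindowCounts wc = {0, 0};
    assert(min_time <= max_time);
    for (int s = 0; s < n_screens; s++) {
        double a = age_at_screen[s];
        if (a < ref_age - min_time && a > ref_age - max_time) {
            if (compliant[s]) wc.n_compliant++;  /* screen actually done */
            if (n_detected[s]) wc.n_positive++;  /* polyps found */
        }
    }
    /* Mirrors ScreeningVariable==1: collapse to "ever screened in window". */
    if (binary_screening && wc.n_compliant) wc.n_compliant = 1;
    return wc;
}
```

With ref_age = 60, min_time = 1 and max_time = 10, only screens at ages strictly between 50 and 59 contribute.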

if (NumDiseaseCovar>4)

{ GetScreeningCovariates(ff,mm,age);

for (v=0; v

if (a>97) CenterMean[v] = LastCenterMean[v];

LastCenterMean[v] = CenterMean[v];

}

MeanNdetected[a] = CenterMean[0];

MeanSibsScreened[a] = CenterMean[1];

MeanSibsPolyps[a] = CenterMean[2];

MeanSibsCancer[a] = CenterMean[3];

if (MeanSibsCancer[a]

VarAgeOfPolyps += pow(AgeOfPolyp[f][m][p],2);

if (AgeOfPolyp[f][m][p]+0.746) XdiseaseGroup=2;

MeanTimeToMutation += TimeToMutation;

VarTimeToMutation += pow(TimeToMutation,2);

MeanTimeToCancer += TimeToCancer;

VarTimeToCancer += pow(TimeToCancer,2);

if (AgeOfTumor[f][m][p]

CurrentAge=999; NextSib=-9;

for (m=0; m0.01 && error==0)

{ memset(U,0,sizeof(U));

memset(Info,0,sizeof(Info));

for (int aa=0; aa

+ gamma[4][policy][ 2]*gFH

+ gamma[4][policy][ 3]*gX*gFH

+ gamma[4][policy][ 4]*logage50

+ gamma[4][policy][ 5]*logage50*gX

+ gamma[4][policy][ 6]*logage50*gFH

+ gamma[4][policy][ 7]*logage50*gX*gFH

+ gamma[4][policy][ 8]*logage50*logage50

+ gamma[4][policy][ 9]*logage50*logage50*gX

+ gamma[4][policy][10]*logage50*logage50*gFH

+ gamma[4][policy][11]*logage50*logage50*gX*gFH;

double P5 = exp(logitP5); P5 /= 1+P5;

TTAR = 5*AverageRisk/P5;

}

else

{ double logitP10 = gamma[2][policy][ 0]

+ gamma[2][policy][ 1]*gX













}

if (TTAR

fprintf (sum,"\n\nSUMMARY STATISTICS BY SCREEN NUMBER");

fprintf (sum,"\n s Num subj Mean prop N compliant Polyp N Polyps Prob polyps Mean polyp Num polyps Time from Mean polyp ");

fprintf (sum,"\n screened age compliant (cumulative) growth screened detected size at detected polyp to size at ");

fprintf (sum,"\n rate screening per compliant detection detection");

int CumNCompliant=0;

for (s=0; s-0.746) XscreeningGroup=1;

if (TotalXforScreening[f][m]>+0.746) XscreeningGroup=2;

NXdisease[XdiseaseGroup] ++;

NXscreening[XscreeningGroup] ++;
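The tallies NXdisease and NXscreening place each standardized covariate into one of three groups using the cutpoints -0.746 and +0.746 that appear in the listing. The sketch below assumes the group starts at 0, since the initialization is not visible in this copy. The function names are hypothetical.

```c
#include <assert.h>

/* Three-level grouping with the listing's cutpoints. ASSUMPTION: default
 * group is 0 (x <= -0.746); group 1 if x > -0.746; group 2 if x > +0.746. */
int covariate_group(double x)
{
    int g = 0;
    if (x > -0.746) g = 1;
    if (x > +0.746) g = 2;
    return g;
}

/* Count how many of n values fall in each group (analogue of NX...[g]++). */
void tally_groups(const double *x, int n, int counts[3])
{
    for (int g = 0; g < 3; g++) counts[g] = 0;
    for (int i = 0; i < n; i++) counts[covariate_group(x[i])]++;
}
```

Values exactly at a cutpoint stay in the lower group because both comparisons are strict.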

for (a=0; a

}

fprintf (sum, "\n");

for (a=0; a

if (TotalXforPolyps[f][m]>+0.746) gX=2;

GetScreeningCovariates(f,m,double(a));

int gFH=0; if (ScreeningCovar[8]) gFH=1;

double py = AgeAtCensoring[f][m]-a;

double pyFUpos = AgeAtCensoring[f][m]-a;

if (AgeAtCancer[f][m] < AgeAtCensoring[f][m]) py = AgeAtCancer[f][m]-a;

if (py>10) py=10;

totalPY[gX][gFH][a] += py;

if (AgeAtCancer[f][m] < a+10 &&

AgeAtCancer[f][m] < AgeAtCensoring[f][m])

{ totalCases[gX][gFH][a] ++;

if (a>=20 && a5) pyFUpos=5;

totalFUposPY[gX][gFH][a] += pyFUpos;

if (AgeAtCancer[f][m] < a+5 &&

AgeAtCancer[f][m] < AgeAtCensoring[f][m])

totalFUposCases[gX][gFH][a] ++;

}
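The 10-year tally above ends follow-up from age a at cancer or censoring, whichever comes first, truncates it at 10 years, and counts a case if cancer occurs before both a+10 and censoring. A minimal sketch of that rule for one person follows, with the horizon as a parameter; it assumes the caller passes only people who are cancer-free and uncensored at age a (the enclosing loop is not legible in this copy). The names are hypothetical.

```c
#include <assert.h>

/* Person-time and case indicator for one subject over [a, a + horizon).
 * ASSUMPTION: subject is cancer-free and uncensored at age a. */
typedef struct { double py; int is_case; } Followup;

Followup window_followup(double a, double age_cancer, double age_censor,
                         double horizon)
{
    Followup f;
    assert(age_censor >= a);
    f.py = age_censor - a;                          /* default: to censoring */
    if (age_cancer < age_censor) f.py = age_cancer - a; /* or to cancer */
    if (f.py > horizon) f.py = horizon;             /* truncate at horizon */
    f.is_case = (age_cancer < a + horizon && age_cancer < age_censor);
    return f;
}
```

Summing py and is_case over subjects within each (gX, gFH, a) cell gives the totalPY and totalCases arrays, whose ratio is the crude 10-year incidence rate.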

s=0;

int LastScreen=-1;

while (AgeAtScreen[f][m][s]

NcasesByNumScreens[gX]/PYbyNumScreens[gX],NcasesByNumScreens[gX],PYbyNumScreens[gX]);

fprintf (sum, "\nCancer SIR (O/E) ");

for (gX=0; gX

if (covar==1) fprintf (sum,"\n\nNumber of sibs' positive screens ");

if (covar==2) fprintf (sum,"\n\nNumber of sibs with cancer ");

if (covar==3) fprintf (sum,"\n\nSummary sib risk index ");

for (gX=0; gX

}

printf ( "\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);

fprintf (sum,"\n\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);

fprintf (sum,"\nage X0 FH=0 FH>0 FH=0 FH>0 average");

for (a=20; a

for (a=0; a

lnL += log(RRcase / (RRcase/PScase + RRctl/PSctl) );

for (int v1=0; v1

void LogisticRegression2()

{ double U[NumVar],Info[NumVar][NumVar],InvInfo[NumVar][NumVar];

// fprintf (sum,"\n\n\nIterations for conditional propensity model for screening");

double chisq2=999,Wald2;

memset (Alpha2,0,sizeof(Alpha2));

for (v=0; v0.01 && error==0)

{ memset(U,0,sizeof(U));

memset(Info,0,sizeof(Info));

for (nn=0; nn

if (v==0) fprintf (sum," intercept");

if (v==11 || v==12) fprintf (chk," %10.6f %8.6f ",Alpha2[v],SEalpha2[v]);

if (v==1) fprintf (sum," (age-70)");

if (v==2) fprintf (sum," (age-70)^2");

if (v==3) fprintf (sum," (age-70)^3");

if (v==4) fprintf (sum," X for disease");

if (v==5) fprintf (sum," X for screening");

if (v==6) fprintf (sum," First screen indicator");

if (v==7) fprintf (sum," Time since last screen");

if (v==8) fprintf (sum," Num previous screens");

if (v==9) fprintf (sum," Num positive screens");

if (v==10) fprintf (sum," Sibs previously screened");

if (v==11) fprintf (sum," Sibs previously positive");

if (v==12) fprintf (sum," Sibs with cancer");

if (v==13) fprintf (sum," Summary sib risk index");

}

fprintf (sum,"\n\nMean (SD) log Propensity Scores = %6.3f (%5.3f)\n",

MeanLnPropensityScore/(F*M),

sqrt((VarLnPropensityScore-pow(MeanLnPropensityScore,2)/(F*M))/(F*M-1)));

}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * *

// P R O J E C T E D O U T C O M E S O F S C R E E N I N G P O L I C I E S

// * * * * * * * * * * * * * * * * * * * * * * * * * * * *

double TargetAgeAtFirstScreen(int policy, double age)

{ double gX = TotalXforPolyps[f][m];

GetScreeningCovariates(f,m,age);


double logitAvgP10 = log(AverageRisk[50][0] / (1 - AverageRisk[50][0]));

double A = gamma[2][policy][8] + gamma[2][policy][9]*gX + gamma[2][policy][10]*gFH + gamma[2][policy][11]*gX*gFH;

double B = gamma[2][policy][4] + gamma[2][policy][5]*gX + gamma[2][policy][ 6]*gFH + gamma[2][policy][ 7]*gX*gFH;

double C = gamma[2][policy][0] + gamma[2][policy][1]*gX + gamma[2][policy][ 2]*gFH + gamma[2][policy][ 3]*gX*gFH

- gamma[1][4][0];

double dt = (-B + sqrt(B*B-4*A*C))/(2*A);

double t = 50*exp(dt);

double logage50 = log(t/50);

if (t80) t=81;

if (policy==4) NumFirstScreenTargetAge[int(t)] ++;

// check that predicted risk at this age equals population average

double logitP10 = gamma[2][policy][ 0]













return(t);

}

double TimeToAverageRisk(int LastPositive, int policy, double AverageRisk)

{ double TTAR=5;

double gX = TotalXforPolyps[f][m];

double t=ProgramAgeAtScreen[f][m][s-1];

double logage50 = log(t/50);

GetScreeningCovariates(f,m,t);


if (LastPositive)

{ double logitP5 = gamma[4][policy][ 0]














if (TTAR>4.9 && TTAR


}

if (TTAR=20 && ProgramAgeAtScreen[f][m][s-1]

for (s=0; s

// number of tumors prevented (i.e., polyps screen detected)

// out of all possibly preventable (i.e.,

if (AgeOfTumor[f][m][p] < AgeAtCensoring[f][m])

{ NumTumorsPreventable ++;

if (ProgramAgePolypDetected[f][m][p] < AgeOfTumor[f][m][p])

NumTumorsPrevented ++;

}

}

if (TotalPolypsThisScreen)

{ PositiveScreens ++;

CounterfactualPositiveScreens[f][m][policy] ++;

}

TotalPolyps += TotalPolypsThisScreen;

CounterfactualPolyps[f][m][policy] += TotalPolypsThisScreen;

}

}

SummaryPolicyMeasure[minTime][maxTime][policy][ 0] = TotalScreens;

SummaryPolicyMeasure[minTime][maxTime][policy][ 1] = PositiveScreens;

SummaryPolicyMeasure[minTime][maxTime][policy][ 2] = TotalPolyps;

SummaryPolicyMeasure[minTime][maxTime][policy][ 3] = TotalCancers;

SummaryPolicyMeasure[minTime][maxTime][policy][ 4] = NumFN;

// SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumCancersWithin10Years;

SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumScreenNeg;

SummaryPolicyMeasure[minTime][maxTime][policy][ 6] = NumFP;

SummaryPolicyMeasure[minTime][maxTime][policy][ 7] = NumPolypsNeverCancer;

SummaryPolicyMeasure[minTime][maxTime][policy][ 8] = NumTumorsPrevented;

SummaryPolicyMeasure[minTime][maxTime][policy][ 9] = NumTumorsPreventable;

SummaryPolicyMeasure[minTime][maxTime][policy][10] = NumCancersScreenDetected;

SummaryPolicyMeasure[minTime][maxTime][policy][11] = NumPolypsScreened;

}
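In the policy evaluation above, a tumor counts as preventable if it would arise before censoring, and as prevented if the program detected its precursor polyp before that tumor age. A minimal sketch of the tally for one person's polyps follows; it assumes undetected polyps carry a large sentinel detection age (such as 999), which is not confirmed by the visible lines. The names are hypothetical.

```c
#include <assert.h>

typedef struct { int preventable, prevented; } PreventionTally;

/* ASSUMPTION: age_detected[p] is a large sentinel (e.g., 999) for
 * polyps never detected under the program. */
PreventionTally tally_prevention(const double *age_tumor,
                                 const double *age_detected,
                                 int n_polyps, double age_censor)
{
    PreventionTally t = {0, 0};
    for (int p = 0; p < n_polyps; p++) {
        if (age_tumor[p] < age_censor) {          /* tumor within lifetime */
            t.preventable++;
            if (age_detected[p] < age_tumor[p])   /* removed in time */
                t.prevented++;
        }
    }
    return t;
}
```

The ratio prevented/preventable then gives the fraction of in-lifetime tumors averted, corresponding to SummaryPolicyMeasure entries 8 and 9.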

void Counterfactuals()

{ int CounterfactualNumClinicalCancers[NumPolicy][NumPolicy][4],

CounterfactualNumScreenedCancers[NumPolicy][NumPolicy][4],

CounterfactualNumPolyps[NumPolicy][NumPolicy][4],

CounterfactualNumPositiveScreens[NumPolicy][NumPolicy][4];

memset (CounterfactualNumClinicalCancers,0,sizeof(CounterfactualNumClinicalCancers));

memset (CounterfactualNumScreenedCancers,0,sizeof(CounterfactualNumScreenedCancers));

memset (CounterfactualNumPolyps,0,sizeof(CounterfactualNumPolyps));

memset (CounterfactualNumPositiveScreens,0,sizeof(CounterfactualNumPositiveScreens));

int policy1,policy2;

for (policy1=1; policy1 CounterfactualPositiveScreens[f][m][policy2])

CounterfactualNumPositiveScreens[policy1][policy2][0] ++;

else if (CounterfactualPositiveScreens[f][m][policy1] < CounterfactualPositiveScreens[f][m][policy2])

CounterfactualNumPositiveScreens[policy1][policy2][1] ++;

else if (!CounterfactualPositiveScreens[f][m][policy1] && !CounterfactualPositiveScreens[f][m][policy2])

CounterfactualNumPositiveScreens[policy1][policy2][2] ++;

else CounterfactualNumPositiveScreens[policy1][policy2][3] ++;

}

}
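The comparison ladder above cross-classifies each person's counterfactual count under two policies into four cells. The sketch assumes the cells are numbered in the order of the tests: 0 if the count is higher under policy1, 1 if lower, 2 if zero under both, and 3 otherwise (equal and non-zero). The function names are hypothetical.

```c
#include <assert.h>

/* Cell index for one person's counterfactual outcomes under two policies. */
int compare_cell(int y1, int y2)
{
    if (y1 > y2) return 0;
    if (y1 < y2) return 1;
    if (!y1 && !y2) return 2;
    return 3; /* equal and non-zero */
}

/* Tally the four cells over n people; counts is zeroed here. */
void cross_classify(const int *y1, const int *y2, int n, int counts[4])
{
    for (int c = 0; c < 4; c++) counts[c] = 0;
    for (int i = 0; i < n; i++) counts[compare_cell(y1[i], y2[i])]++;
}
```

Because the zero-zero test follows the two inequality tests, cell 2 only ever receives ties, so cells 2 and 3 together count everyone whose outcome does not change between policies.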

fprintf (sum,"\n\nSummary of counterfactual outcomes for time window %d-%d",MinTimePrediagnosis,MaxTimePrediagnosis);

fprintf (sum,"\n\nCounterfactual clinical cancers:");

for (policy1=1; policy1

fprintf (sum,"\n\nCounterfactual total number of polyps detected:");

for (policy1=1; policy1

if (measure== 9) fprintf (sum," tumors preventable");

if (measure==10) fprintf (sum," total cancers screen detected");

if (measure==11) fprintf (sum," total polyps screened (detected or not)");

if (measure==12) fprintf (sum," polyps detected");

if (measure==1)

{ fprintf (sum,"\n%2d/%2d ",1,0);

for (policy=0; policy

fprintf (sum,"\n\nF=%d, M=%d, P=%d",F,M,P);

fprintf (sum,"\nlambda "); for (v=0; v
