
    ONLINE SUPPLEMENT FOR

    Causal inference approaches for studying the relationship between screening and cancer outcomes

    Duncan C. Thomas

    Appendix 1: Details of the disease model simulation

    Let i = 1,…,5,000 index the sibships and j = 1,…,5 the members. For each individual, two measured covariates $X_{ijv}$ and two unobserved frailties $Z_{ijv}$ (v = 1, 2) were generated as multivariate normal deviates with correlations between the two components and among members of the same sibship. The unobserved ages at which each of 100 polyps would eventually arise were generated by sampling from a Weibull distribution with shape parameter 4 and relative rate depending on $X_{ij1}$ and $Z_{ij1}$, specifically with hazard rate

    $$\lambda_{ij}(t) = 3\,t^{3} R_{ij} \quad \text{where} \quad R_{ij} = \exp(\lambda_0 + \lambda_1 X_{ij1} + \lambda_2 Z_{ij1})$$

    so that ages at polyp creation $T_{ijk}^{(P)}$ are given by

    $$T_{ijk}^{(P)} = \left[-\ln(U_{ijk}) / R_{ij}\right]^{1/4} \quad \text{where} \quad U_{ijk} \sim \text{Uniform}(0,1).$$

    The parameters of the model were adjusted so that most of these values would be

    beyond an individual’s lifetime; thus, the number of polyps most individuals would

    experience in their lifetimes was considerably less than the 100 simulated.

    Polyps were assumed to grow as the square root of time since they first developed¹, the mass (in terms of number of cells) at time t being

    $$M_{ijk}(t) = \left[1 + \mu_{ijk}\,(t - T_{ijk}^{(P)})\right]^{1/2} \qquad [1]$$

    where $\mu_{ijk} \sim LN[\exp(\mu_0 + \mu_1 X_{ij1} + \mu_2 Z_{ij1}), V_M]$ is the growth rate parameter. Although there is evidence that some polyps can regress², this possibility was not explicitly modeled in the simulation.

    Each cell in each polyp was considered to be at risk of conversion to a carcinoma through an n-stage process, with each mutation occurring at rate $\nu_{ijk}$. Thus, the time from each cell's birth to its fully malignant conversion also has a Weibull distribution with shape parameter n, so the hazard rate for malignant conversion is given by

    $$\gamma_{ijk}(t) = \int_0^t (1 + \mu_{ijk} u)^{1/2}\, \nu_{ijk}^{n}\, (t - u)^{n-1}\, du \cong c_n\, \nu_{ijk}^{n}\, (1 + \mu_{ijk} t)^{n + 1/2} / \mu_{ijk}^{n}$$

    (where $c_1 = 2/3$, $c_2 = 4/15$, $c_3 = 16/105$, …) and cumulative hazard

    $$\Gamma_{ijk}(T) = \int_0^T \gamma_{ijk}(t)\, dt \cong C_n\, \nu_{ijk}^{n}\, (1 + \mu_{ijk} T)^{n + 3/2} / \mu_{ijk}^{n+1}$$

    (where $C_1 = 4/15$, $C_2 = 8/105$, $C_3 = 32/945$, …). The elapsed time from polyp creation to malignant conversion for that polyp is generated by solving $\Gamma_{ijk}(T) = -\ln U$, with $U \sim U(0,1)$, for T to obtain

    $$T_{ijk}^{(M)} = \frac{1}{\mu_{ijk}} \left[ \frac{\left( C_n^{2n+2}\, \mu_{ijk}^{n+1}\, \nu_{ijk}^{n+3}\, (-\ln U) \right)^{2/(2n+3)}}{C_n^{2}\, \nu_{ijk}^{2}} - 1 \right]. \qquad [2]$$

    Each undetected fully malignant clone is then assumed to grow exponentially at rate $\rho_{ijk} \sim LN[\exp(\rho_0 + \rho_1 X_{ij1} + \rho_2 Z_{ij1}), \rho_3^2]$. Each tumor is assigned a clinically diagnosable size $S_{ijk} \sim LN(\theta_0, \theta_1^2)$, so the age at which it could be clinically diagnosed is given by

    $$T_{ijk}^{(C)} = T_{ijk}^{(P)} + T_{ijk}^{(M)} + \ln(S_{ijk}) / \rho_{ijk} \qquad [3]$$

    if not previously detected by screening and excised, and if no other cancers have been previously diagnosed. (The parameter chosen for the log mean of $S_{ijk}$, $\theta_0 = 10$ [Table 1], corresponds to about 22,000 cells.)

    Appendix 2: Details of the screening model simulation

    The age at the first recommended screening was chosen to be lognormally distributed with mean

    $$E(\ln T_{ij0}^{(S)}) = \sigma_0 + \sigma_2 X_{ij2} + \sigma_3 Z_{ij2}$$

    and logarithmic standard deviation 0.1 (Supplementary Table 1). The probability of being compliant ($C_{ij0} = 1$) with that recommendation was given by a logistic function

    $$\text{logit} \Pr(C_{ij0} = 1) = \pi_0 + \pi_2 X_{ij2} + \pi_3 Z_{ij2}.$$

    The next recommended screening time for each sib was generated as a lognormal deviate with mean similar to that described above, but with a different intercept ($\sigma_1$ in place of $\sigma_0$) and additional covariates, including the square root of the total number of polyps detected on the last completed screen and indicator variables for whether any of the individual's siblings had completed at least one scan by then, whether any polyps had been found, and whether any of them had had a cancer diagnosis. The probabilities of compliance $\pi_{ijs} = \Pr(C_{ijs} = 1)$ were also expanded to include terms for each of these additional covariates, as well as a different intercept term $\pi_1$ in place of $\pi_0$.

    Since each member's next screening time depends upon the outcome of the entire process for the other family members, the entire family had to be processed concurrently. At the time of each completed scan, the outcome variables for the entire family (numbers of already completed scans, positive scans, and members with a clinically diagnosed cancer) as of that time were computed and used to determine the next screening time for each family member. The earliest of these was then processed next, continuing in this manner until all family members reached their pre-specified ages at censoring.

    At each screening time $T_{ijs}^{(S)}$ at which an individual is compliant, the size $M_{ijk}(T_{ijs}^{(S)})$ of each polyp that has not previously been excised is computed using Eq. [1] and its probability of being detected (the event $D_{ijks} = 1$) is assumed to be a logistic function of its current linear dimension (the cube root of its mass):

    $$\text{logit}[\Pr(D_{ijks} = 1)] = \psi_0 + \psi_1 [M_{ijk}(T_{ijs}^{(S)})]^{1/3}$$

    If detected, that polyp is excised and considered no longer at risk of generating a

    malignant clone and is not screened again.

    The age at which each tumor could be diagnosed is given by Eq. [3], which does not depend upon the screening process, so the time of an individual's cancer diagnosis $T_{ij}^{(D)}$, conditional on the screening history, is

    $$T_{ij}^{(D)} = \min_k \left( T_{ijk}^{(P)} + T_{ijk}^{(M)} + \ln(S_{ijk}) / \rho_{ijk} \right)$$

    the minimum being taken over all so-far undiscovered polyps k (i.e., those with $D_{ijks} = 0$ for all prior screening times $T_{ijs}^{(S)} < T_{ij}^{(D)}$).

    Appendix 3: Propensity scores used for simulated and real data

    Following the notation of Hernan and Robins³⁻⁵, we begin by fitting a logistic model for the probability that an individual is screened at each age:

    $$\text{logit} \Pr[A_i(t) = 1 \mid V_i, L_i(t)] = \alpha_0(t) + V_i \alpha_1 + A_i(t^-) \alpha_2 + L_i(t^-) \alpha_3$$

    where $A_i(t)$ is an indicator for whether an individual i is screened at age t, $A_i(t^-)$ denotes the history of screening prior to t, $L_i(t^-)$ the history of polyps prior to t, and $V_i$ is a vector of baseline covariates. The baseline risk $\alpha_0(t)$ is modeled parametrically, here as a cubic polynomial, although other specifications would be possible. The model is fitted by creating a dataset in which each individual is represented once for every year up to censoring or cancer diagnosis and then fitted as if these observations were independent. It is fitted twice, once with $A_i(t^-)$, $L_i(t^-)$ and $V_i$ to produce $\hat{\alpha}^{(1)}$, and once with only t to produce $\hat{\alpha}^{(0)}$. "Stabilized propensity score weights" $sw_i(t)$ are then computed using these estimates as

    $$sw_i(t) = \prod_{u=0}^{t} \frac{\Pr[A_i(u) \mid t_i(u), V_i; \hat{\alpha}^{(0)}]}{\Pr[A_i(u) \mid t_i(u), A_i(u-1), L_i(u-1), V_i; \hat{\alpha}^{(1)}]}$$

    The denominator effectively weights each individual cancer case and control inversely by

    the probability of their observed screening history up to that time, thereby creating a

    pseudo-population in which cancer outcomes are not confounded by the determinants of

    screening history. Since these denominators can be highly variable, the numerators serve

    to stabilize these weights, leading to more stable estimates and smaller standard errors.

    A standard nested case-control design is used to assess cancer risk in relation to screening history. For each cancer case i at age $t_i$, one (or more) control(s) is selected at random from the risk set $R(t_i)$, the set of subjects at risk and disease free prior to the case's age at cancer diagnosis, hereafter called the "reference date" (the time at comparison for each case and matched controls). A standard conditional logistic regression is then used to estimate $\beta$, except that each subject's contribution is weighted by their stabilized propensity scores. Thus, the score contribution for the conditional likelihood of cancer becomes

    $$U(\beta) = \sum_i \left( Z_{i0}(t_i) - \frac{\sum_j Z_{ij}(t_i)\, e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)}{\sum_j e^{Z_{ij}(t_i)'\beta}\, sw_{ij}(t_i)} \right)$$

    the sums in the numerator and denominator being taken over the case (j = 0) and matched control(s) (j ≥ 1). Here, $Z_{ij}(t)$ denotes the vector of covariates related to cancer risk, including those related to the number of completed or positive screens during some specified window of times prior to the reference date t, and $\beta$ is the corresponding vector of log-RR coefficients.

    Appendix 4: Calculation of target ages at first screen and intervals between screens based on fixed covariates, family history, or both

    The cumulative 10-year risk of cancer at age t in the general population, ignoring competing risks, is given by

    $$P(t + 10 \mid t) = 1 - \exp[-\Lambda(t + 10 \mid t)]$$

    where $\Lambda(t_2 \mid t_1) = \int_{t_1}^{t_2} \lambda(u)\, du$ is the cumulative hazard based on population age (and sex) specific incidence rates $\lambda(u)$. The corresponding personal 10-year risk for an individual i with risk factors $Z_i(t)$ at time t is given by

    $$P_i(t + 10 \mid t) = 1 - \exp[-\Lambda_0(t + 10 \mid t)\, r_i]$$

    where $\Lambda_0(t)$ is the cumulative baseline hazard rate for an individual with $Z_i(t) \equiv 0$ and $r_i = \exp[Z_i(t)'\beta]$ is the individual's personal relative risk. Since the population hazard rate is simply the average over all the individuals' hazard rates,

    $$\lambda(t) = E[\lambda_0(t) \exp(Z_i(t)'\beta)] = \lambda_0(t)\, \bar{r}(t)$$

    where $\bar{r}(t)$ is the average of the individual relative risks among survivors at age t. Hence we can re-express personal risk in terms of the population rates as

    $$P_i(t + 10 \mid t) = 1 - \exp[-\Lambda(t + 10 \mid t)\, r_i / \bar{r}(t)]$$

    To find the age at which personal 10-year risk equals the population average risk at age 50, we must therefore solve the equation

    $$\Lambda(t + 10 \mid t)\, r_i / \bar{r}(t) = \Lambda(60 \mid 50)$$

    Since $\bar{r}(t)$ varies only slowly with age, due to the survival of the fittest effect, the left-hand side is essentially monotonic in t, so the equation is easily solved numerically.

    The same technique is used to find the time interval $\Delta t$ at which the personal risk $P_i(t + \Delta t \mid t) = P(t + 10 \mid t)$, the population average 10-year risk (or 5-year risk for those with polyps found on the previous screen), incorporating the number of previous exams as one of the time-dependent risk factors in the disease model. For simulating disease outcomes under different screening programs, the 1-year risk of cancer for an individual with covariate values $Z_i(t)$ at time t is simply $P_i(t + 1 \mid t) = 1 - \exp[-\lambda(t)\, r_i / \bar{r}(t)]$.

    For the real data application, the baseline rates were computed year-by-year, based on the fitted disease relative risk and the population age- and sex-specific rates for the German North Rhine-Westphalia registry²¹.

    Appendix 5: Details of the DACHS case-control study data

    The DACHS study⁶,⁷ is an ongoing population-based case-control study from Germany.

    In this analysis, 4334 cases of colorectal cancer and 4231 controls recruited between

    2003 and 2013 were included, with exquisitely detailed information on screening

    histories for colorectal adenomas (polyps), as well as various risk factors (sex, schooling,

    ever regular smoking, ever participation in a general health check-up, BMI 5-14 years

    prior to the reference date, average METs, regular NSAIDS use, HRT, and statins) and

    family history of colorectal cancer in first and second-degree relatives.

    The screening history data for the index subjects (study cases and controls) included only

    the years of the first colonoscopy and the most recent three, plus the total number of

    colonoscopies, so years for any screens beyond 4 were uniformly spaced between the first

    and the third-to-last ones. To avoid overlapping or closely adjacent exams, the total

    number of exams was capped at 9. Exams with indication coded as “conspicuous fecal

    occult blood test (FOBT) result” or “other” were excluded. Missing dates were assigned

    at random between age 40 and the reference age (or interpolated between available

    dates). In addition to colonoscopies, information on other screening modalities (e.g.,

    FOBT) and type of exam (sigmoidoscopy or full colonoscopy) were available but were

    not used in this analysis. No information was asked on screening of family members.
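    The uniform spacing of unrecorded exam years could be implemented roughly as follows. This is a guess at one reasonable reading of the rule: the function names are hypothetical, fractional years are kept rather than rounded, and the imputed exams are placed strictly between the first and the third-to-last exams.

```c
#include <assert.h>

/* Spread n_missing unrecorded exam years evenly between the first exam and the
   third-to-last exam (both recorded), writing them to years_out[0..n_missing-1]. */
void impute_exam_years(double first_year, double third_to_last_year,
                       int n_missing, double *years_out)
{
    int k;
    double step;
    assert(n_missing >= 0 && third_to_last_year >= first_year);
    step = (third_to_last_year - first_year) / (n_missing + 1);
    for (k = 0; k < n_missing; k++)
        years_out[k] = first_year + (k + 1) * step;
}

/* Number of exams with unrecorded years after capping the reported total at
   max_exams (the text uses a cap of 9); four exam years are recorded. */
int count_missing_exams(int total_exams, int max_exams)
{
    int capped = total_exams > max_exams ? max_exams : total_exams;
    return capped > 4 ? capped - 4 : 0;
}
```

    For a subject reporting 12 colonoscopies, the cap of 9 leaves 5 exams to be placed between the first and third-to-last recorded years.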

    Year of birth, diagnosis, and death were available for each first-degree relative and for

    grandparents. For aunts and uncles, only the numbers at risk and numbers affected were

    available. Dates for these relatives were assigned by random sampling from the age

    distribution of cancer and assuming aunts and uncles were born 25 years prior to the

    index subject and grandparents 50 years prior. Again, missing years of cancer or birth for

    other relatives were assigned at random, based on the index subject’s year of birth.

    In addition to the use of causal inference methods, the analyses reported here differ from those published earlier in a number of details, notably by using the total number of screening colonoscopies over various intervals prior to the reference date rather than a simple binary indicator for any colonoscopies over the interval 1–10 years prior. We also carried out analyses using binary indicator variables; as the count variables we used here have a wider range than the binary one, effect sizes (per screen) are correspondingly smaller than those for the ever/never variable, but the comparisons across methods were essentially the same (results not shown). As before, we excluded subjects with inflammatory bowel disease, but did not exclude those under 50, or those with a most recent exam less than one or more than 10 years previous, as we wished to assess the effect of the entire screening history. Subjects with missing covariate values were excluded, leaving 4,065 cases and 3,025 controls for fitting the disease model. These subjects contributed 2,191 screening events (1,499 initial and 693 subsequent) over a total of 346,940 person-years of observation. Of the 2,169 exams for which we had information about results (the first and three most recent), 802 yielded polyps. We also ran analyses replacing missing covariate values with means, with and without including a missingness indicator variable in the screening, polyp detection, and disease models, thereby increasing the sample size to 4,301 cases and 4,215 controls with 3,181 screens in 419,771 person-years and 1,108 polyps discovered. The results shown in Figures 3, 5 and 6 and Tables 4 and 5 differed very little, the only important difference being somewhat larger estimates of the observed screening effects, as shown in Supplementary Figure 2. The comparison between analyses with and without propensity score weighting remained the same, however.

    Because the case-control study over-represents cases, fitting the propensity score model and predicting the outcomes of counterfactual screening programs requires that cases and controls be weighted appropriately. As described by Vanderweele and Vansteelandt⁸, the appropriate weights are ratios of the population proportions of cases and subjects at risk at each age to the corresponding numbers of cases and controls in the sample. Thus, for age-matched nested case-control studies, cases are weighted by $\pi(t) / p(t)$ and controls by $[1 - \pi(t)] / [1 - p(t)]$, where $\pi(t)$ denotes the population proportion of cases at age t and $p(t)$ denotes the proportion of cases at age t out of all cases and controls at that age. Use of sampling weights slightly reduced the difference between unadjusted and IPSW-adjusted estimates of the relative risk for screening, but had relatively little effect on the comparison of cancer outcomes under alternative screening programs. As expected, there are many fewer predicted cancers after using the sampling weights, reflecting the lower weight given to the cases, who tend to be at higher risk than controls, but again the ranking of the predicted cancer outcomes under alternative screening programs remained unchanged. Tables 4 and 5 and Figures 3–6 are all based on the weighted estimates.
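    The sampling weights amount to a one-line calculation per subject, sketched here in C. The names `pi_pop` and `p_sample` are labels chosen for this example for the population and sample proportions of cases at a given age; the second function only illustrates that the weights restore the population proportion.

```c
#include <assert.h>

/* Sampling weight for one subject at a given age: cases get pi/p and controls
   get (1 - pi)/(1 - p), where pi is the population proportion of cases at that
   age and p the proportion of cases in the sample at that age. */
double sampling_weight(int is_case, double pi_pop, double p_sample)
{
    assert(pi_pop > 0.0 && pi_pop < 1.0 && p_sample > 0.0 && p_sample < 1.0);
    return is_case ? pi_pop / p_sample : (1.0 - pi_pop) / (1.0 - p_sample);
}

/* Weighted fraction of cases in a sample of n_cases and n_controls;
   by construction this equals pi_pop. */
double weighted_case_fraction(int n_cases, int n_controls, double pi_pop)
{
    double p = (double)n_cases / (n_cases + n_controls);
    double weighted_cases = n_cases * sampling_weight(1, pi_pop, p);
    double weighted_controls = n_controls * sampling_weight(0, pi_pop, p);
    return weighted_cases / (weighted_cases + weighted_controls);
}
```

    When cases make up half the sample but only 1% of the population at some age, each case receives weight 0.02 and each control weight 1.98, which is why far fewer cancers are predicted after weighting.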

    Appendix 6: Estimation of counter-factual comparisons from the real data

    Outcomes under counter-factual screening programs were computed by simulation using

    the fitted polyp detection and disease diagnosis models. At each age, a random choice of

    whether cancer occurs or not was made with the corresponding probability and if not, the

    process advanced to the next age. Any simulated screening and polyp history events after

    cancer incidence are then discarded. Competing risks were included using population

    death rates for all Germany from the vital statistics for 2013, subtracting colorectal cancer

    incidence and assuming no covariate effects.
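    The year-by-year simulation can be pictured with the following C sketch. It is an assumption-laden illustration rather than the author's routine: the names are invented, the uniform deviates are passed in by the caller so the function is deterministic, and cancer is checked before death within each year, an ordering the text does not specify.

```c
#include <assert.h>

enum outcome { CENSORED = 0, CANCER = 1, DEATH = 2 };

/* Step one subject through n_years consecutive years. p_cancer[a] and
   p_death[a] are the one-year probabilities for year a; u_cancer[a] and
   u_death[a] are Uniform(0,1) deviates supplied by the caller. On return,
   *event_year holds the index of the year in which follow-up ended. */
enum outcome simulate_history(const double *p_cancer, const double *p_death,
                              const double *u_cancer, const double *u_death,
                              int n_years, int *event_year)
{
    int a;
    assert(n_years > 0);
    for (a = 0; a < n_years; a++) {
        if (u_cancer[a] < p_cancer[a]) {
            *event_year = a;
            return CANCER;
        }
        if (u_death[a] < p_death[a]) {
            *event_year = a;
            return DEATH;
        }
    }
    *event_year = n_years - 1;
    return CENSORED;
}
```

    Screening and polyp events simulated for years after `*event_year` would simply be ignored by the caller, in line with the discarding step described above.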

    For the analysis of the marginal statistics provided in Table 4, a random effects model was used to estimate the average causal effect of screening. Letting $Y_{ij}$ denote the simulated outcome for the jth replicate of subject i, we assumed $Y_{ij} \sim N(\mu_i, \sigma_i^2)$ and $\mu_i \sim N(\mu, \sigma^2)$. Maximum likelihood estimates of the parameter of interest $\mu$ and its asymptotic variance are easily found by iterating between estimating $\mu$ and $\sigma^2$ as described by Stram⁹. For the binary outcome of cancer, the binomial variance estimator $\sigma_i^2 = Y_{i+}(1 - Y_{i+}) / R$ was used (where $Y_{i+}$ denotes the total number of simulated cancer outcomes out of R = 1000 replicates); for the other outcome variables (number of screens, number of polyps, age at cancer), the empirical variance across replicates for each individual was used for $\sigma_i^2$.
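    An alternating scheme of this kind might look like the C sketch below. It uses a common fixed-point form of the maximum likelihood equations for a normal random effects model and is not claimed to be the exact algorithm of the cited reference. The inputs are assumed to be one summary value per subject, `y[i]`, with a known sampling variance `v[i]`; the names, the convergence tolerance, and the truncation of the variance component at zero are choices made for this example.

```c
#include <assert.h>
#include <math.h>

/* Random-effects estimate of the overall mean from per-subject values y[i]
   with known sampling variances v[i] (> 0). Alternates between the weighted
   mean for mu and a fixed-point update for the between-subject variance s2.
   Returns mu; *s2_out receives the variance component and *var_mu_out the
   asymptotic variance of mu, 1 / sum of weights. */
double random_effects_mean(const double *y, const double *v, int n,
                           double *s2_out, double *var_mu_out)
{
    double mu = 0.0, s2 = 0.0, sum_w = 0.0;
    int i, iter;
    assert(n > 0);
    for (iter = 0; iter < 1000; iter++) {
        double sum_wy = 0.0, sum_w2 = 0.0, sum_w2d = 0.0, s2_new;
        sum_w = 0.0;
        for (i = 0; i < n; i++) {
            double w;
            assert(v[i] > 0.0);
            w = 1.0 / (s2 + v[i]);
            sum_w += w;
            sum_wy += w * y[i];
        }
        mu = sum_wy / sum_w;                 /* update mu given s2 */
        for (i = 0; i < n; i++) {
            double w = 1.0 / (s2 + v[i]);
            double d = y[i] - mu;
            sum_w2 += w * w;
            sum_w2d += w * w * (d * d - v[i]);
        }
        s2_new = sum_w2d / sum_w2;           /* update s2 given mu */
        if (s2_new < 0.0)
            s2_new = 0.0;                    /* a variance cannot be negative */
        if (fabs(s2_new - s2) < 1e-12) {
            s2 = s2_new;
            break;
        }
        s2 = s2_new;
    }
    *s2_out = s2;
    *var_mu_out = 1.0 / sum_w;
    return mu;
}
```

    With equal sampling variances the weights are all equal, so the estimate of the mean is the simple average and the variance component is the mean squared deviation minus the common sampling variance.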

    Supplementary Table 1: Simulated parameter values

    Parameter family                                Value     Interpretation
    ----------------------------------------------  --------  ---------------------------------------
    Model for log rate of polyp development         -16.5     intercept
                                                    3.0       Weibull shape
                                                    0.4       regression on X1
                                                    1.0       regression on Z1
    Model for log time to next screen               3.9       intercept for age at first screen
                                                    2.3       intercept for subsequent intervals
                                                    0.1       SD of log age at first screen
                                                    0.1       SD of log interval time
                                                    -0.1      regression on X2
                                                    -0.1      regression on Z2
                                                    -0.1      regression on N previous polyps
                                                    -0.02     regression on sibs' N screens
                                                    -0.04     regression on sibs' positive screens
                                                    -0.06     regression on sibs with cancer
    Model for logit probability of compliance       -0.8      intercept for first screen
                                                    -1.0      intercept for subsequent screens
                                                    0.5       regression on X2
                                                    0.5       regression on Z2
                                                    1.0       regression on N previous polyps
                                                    0.02      regression on sibs' N screens
                                                    0.04      regression on sibs' positive screens
                                                    0.06      regression on sibs with cancer
    Model for polyp growth rate                     6.5       intercept
                                                    0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log growth rate
    Model for log cancer growth rate                1.5       intercept
                                                    0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log growth rate
    Model for logit probability of polyp detection  -13       intercept
                                                    3.0       regression on cube root of tumor mass
    Model for log malignant conversion rate         -6.875    intercept
    (n = 3 mutations)                               0.1       regression on X1
                                                    0.15      regression on Z1
                                                    0.1       SD of log mutation rate
    Model for clinically detectable tumor size      10        log mean number of cells
                                                    0.5       SD log number of cells

    Supplementary Table 2: Expected number of clinically diagnosed or screen detected cancers per 100,000 under various counterfactual policies (analysis window 1-10 years prior to reference date, simulated data)

    Rows: outcome of observed history. Columns: outcome of untargeted screening program.

                     Clinically diagnosed cancers     Screen-detected cancers
                     No cancer       Cancer           No cancer       Cancer
    No cancer           92,832          916              99,860          112
    Cancer               3,544        2,708                  28            0

    Rows: outcome of observed history. Columns: outcome of targeted screening program.

                     Clinically diagnosed cancers     Screen-detected cancers
                     No cancer       Cancer           No cancer       Cancer
    No cancer           92,768          980              99,852          120
    Cancer               3,844        2,408                  28            0

    Rows: outcome of untargeted screening program. Columns: outcome of targeted screening program.

                     Clinically diagnosed cancers     Screen-detected cancers
                     No cancer       Cancer           No cancer       Cancer
    No cancer           94,944        1,432              99,768          120
    Cancer               1,668        1,956                 112            0

    Supplementary Table 3: Summary of log relative risk estimates from the fitted models (DACHS data)

    Column headings: Risk factor; Screening propensity; Polyp detection; Disease risk without PS weights; Disease risk with PS weights. Entries are lnRR (S.E.).

    sex: 0.487 (0.062); 0.541 (0.140); -0.309 (0.066); -0.306 (0.067)
    schooling: 0.082 (0.026); -0.240 (0.034); -0.234 (0.034)
    ever regular smoker: 0.072 (0.033); 0.168 (0.069); 0.220 (0.037); 0.198 (0.038)
    ever health check-up: 0.268 (0.083); 0.266 (0.196); -0.483 (0.080); -0.479 (0.081)
    Body mass index (BMI) 5-14 yr earlier (x 10): 0.018 (0.012); 0.558 (0.067); 0.054 (0.007)
    Physical activity (Metabolic Equivalents of Task, METs) (x 1000): -0.507 (0.186); -0.702 (0.226); -0.001 (0.000)
    alcohol (x 1000): 0.149 (0.108); 0.269 (0.143); 0.003 (0.001)
    nonsteroidal anti-inflammatory drugs (NSAIDS): -0.247 (0.043); -0.335 (0.058); -0.345 (0.059)
    hormone replacement treatment (HRT): 0.503 (0.071); 0.210 (0.168); -0.615 (0.083); -0.605 (0.084)
    statins: 0.054 (0.047); -0.157 (0.068); -0.134 (0.069)
    family history: 0.374 (0.042); 0.220 (0.089); 0.405 (0.065); -0.855 (0.061)
    total screens: 0.425 (0.050); -0.132 (0.075)
    first screen positive: 0.021; -0.110; 0.736 (0.247)
    last screen positive: 0.512 (0.117); 0.639 (0.251)
    number of other positive screens: -0.512 (0.137); 0.309 (0.240)
    interval screens (1-10 years prior to reference date): -0.697 (0.054); -0.855 (0.061)
    Time since last exam (TSLE): 0.130 (0.054)
    TSLE²: -0.023 (0.008)

    Supplementary Figure 1: Distribution of stabilized propensity scores by screening history subgroups (DACHS data); the inverse of these scores is used as the weights. The distributions have mean ± standard deviation as follows: no screening, 0.85 ± 0.10; always negative, 0.93 ± 0.12; ever positive, 1.17 ± 0.37.

    [Figure: horizontal axis, propensity score (tick labels from 0.21 to 4.81); vertical axis tick labels from 0 to 0.7; series: No screening, Negative, Positive.]

    Supplementary Figure 2: Log RR estimates per screen for the effect of colonoscopies on colorectal cancer risk over various windows of time prior to the reference date (DACHS data); IPSW = inverse propensity score weighting. A: Including subjects with missing covariate values replaced by means; B: Same, plus including a missingness indicator variable. These plots can be compared with those in Figure 3 of the main text.

    Supplementary references

    1. Dowty JG, Byrnes GB, Gertig DM. The time-evolution of DCIS size distributions with applications to breast cancer growth and progression. Math Med Biol 2014; 31: 353-64.

    2. Pickhardt PJ, Kim DH, Pooler BD, et al. Assessment of volumetric growth rates of small colorectal polyps with CT colonography: a longitudinal study of natural history. Lancet Oncol 2013; 14: 711-20.

    3. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550-60.

    4. Hernan MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008; 19: 766-79.

    5. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000; 11: 561-70.

    6. Brenner H, Chang-Claude J, Jansen L, et al. Reduced risk of colorectal cancer up to 10 years after screening, surveillance, or diagnostic colonoscopy. Gastroenterology 2014; 146: 709-17.

    7. Brenner H, Chang-Claude J, Jansen L, et al. Colorectal cancers occurring after colonoscopy with polyp detection: sites of polyps and sites of cancers. Int J Cancer 2013; 133: 1672-9.

    8. Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 2010; 172: 1339-48.

    9. Stram DO. Meta-analysis of published data using a linear mixed-effects model. Biometrics 1996; 52: 536-44.

    Appendix 7: C code for simulation and analysis of simulated data

    What follows is the computer code that generated the simulation and analysis results

    provided in Table 1, Figure 2, and Supplementary Table 2. The parameters provided in

    Supplementary Table 1 are entered into the program in the constants paragraphs at the

    beginning of the program. The “d” arrays that follow each refer to

    random deviations around these values that were used when multiple replicates were

    run to investigate the sensitivity of results to changes in these parameters. This code

    was adapted to generate the results from the DACHS study data, replacing the

    simulation routines by procedures to read in the real data and perform various

    imputations of missing values as described in Appendix 5, along with some further

    modifications to replace the unobserved disease process (which is known in the

    simulations but not observable in real data) by randomly generated data based on the

    fitted disease incidence and polyp detection models as explained in Appendix 6. The

    cases and controls for the real data were reweighted to account for their different

    sampling probabilities, as explained in Appendix 5. (The real data analysis program is

    tailor-made for that dataset, but can be provided to interested users by request to the

    author, as some guidance will doubtless be needed to adapt it to other datasets.)

    The program was compiled using Microsoft Visual Studio C++ version 6.0 and run on a

    MacBook Pro under the VMware Fusion emulator of the Windows XP operating

    system. The code should require very little modification to run under other operating

    systems and compilers, however.

    The core simulation routines are contained in Simulate() and the analysis routines in

    Analyze(), both called from main() within a loop over multiple replicates. Both

    routines use GetScreeningCovariates() and GetDiseaseCovariates() to get the

    relevant covariate values at each point in time for each subject. The routine

    NextScreen() is used to simulate the intervals between screens. A similar routine

    ProgramNextScreen() is called by Analyze() as part of the counterfactual

    comparisons, which in turn relies on fitted logistic regression models to determine the

    TargetAgeAtFirstScreen() and TimeToAverageRisk(). These are called from

    PredictCancer() for each of the counterfactual screening programs and the results are

    tabulated in Counterfactuals(). Analyze() begins by drawing a nested case-control

    sample in CaseControlSampling(), then for each “analysis window” calls

    EstimatePropensityScores() and calls ConditionalLogisticRegression() to fit

    the odds ratio models, with and without using the inverse probability weights. These

    estimates are what are used by Counterfactuals() to compute the various summary

    statistics (clinically diagnosed and screen-detected cancers, false negative and false

    positive screens, number needed to screen to prevent one cancer, etc.) under each

    counterfactual screening program. That routine also tabulates the pairwise comparisons

    of outcomes under different screening programs shown in Supplementary Table 2.

    #include

    #include

    #include

    #include

    #include

    #include

    #include "gamma.h"

    const int F = 5000, // number of families

    M = 5, // number of sibs per family

    P = 100, // maximum number of polyps in unobserved history

    S = 40, // maximum number of screening times

    MaxNN = 1350000, // maximum possible size of propensity score dataset

  • {{0,0,0,0, 0,0,0,0, 0,0,0,0},

    {0,0,0,0, 0,0,0,0, 0,0,0,0},

    {1,1,0,0, 1,1,0,0, 1,1,0,0},

    {1,0,1,0, 1,0,1,0, 1,0,0,0},

    {1,1,1,1, 1,1,0,0, 1,1,0,0}},

    {{0,0,0,0, 0,0,0,0, 0,0,0,0},

    {0,0,0,0, 0,0,0,0, 0,0,0,0},

    {1,0,0,0, 1,0,0,0, 0,0,0,0},

    {1,0,0,0, 1,0,0,0, 0,0,0,0},

    {1,0,0,0, 1,0,0,0, 0,0,0,0}},

    {{0,0,0,0, 0,0,0,0, 0,0,0,0},

    {0,0,0,0, 0,0,0,0, 0,0,0,0},

    {1,1,0,0, 1,0,0,0, 0,0,0,0},

    {1,0,0,0, 1,0,0,0, 0,0,0,0},

    {1,1,0,0, 1,1,0,0, 0,0,0,0}}},

    NumRepl = 1;

    const double SDFshared = 1,

    SDFcorr = 1,

    SDFindep = 1,

    SDXshared = 1,

    SDXcorr = 1,

    SDXindep = 1,

    lambda0[4] = // model for log rate of polyp development

    { -16.5, // intercept

    3, // Weibull shape

    0.4, // regression on TotalXforPolyps

    1}, // regression on TotalFrailtyForPolyps

    dlambda[4] = {0.02, 0.00, 0.005, 0.005},

    sigma0[11] = // model for ln time to next screen

    { 3.912, // intercept for first screen

    2.3, // intercept for subsequent screens

    0.1, // LSD for first screen

    0.1, // LSD for subsequent screens,

    -0.1, // regression on TotalXforScreening

    -0.1, // regression on TotalFrailtyForScreening

    -0.1, // regression on number of previous polyps

    -0.02, // regression on sib screened

    -0.04, // regression on sib with polyps

    -0.06, // regression on sib with cancer

    +0.1}, // patch so very old people don't get screened too often

    dsigma[11] = {0.02, 0.02, 0.002, 0.002, 0.002, 0.002, 0.002, 0.0004, 0.0012, 0.002, 0.02},

    pi0[10] = // model for logit Pr(compliance with next recommended screen)

    { -0.8, // intercept for first screen

    -1, // intercept for subsequent screens

    0, // (not used)

    0, // (not used)

    0.5, // regression on TotalXforScreening

    0.5, // regression on TotalFrailtyForScreening

    1.0, // regression on number of previous polyps

    0.02, // regression on sib screened

    0.04, // regression on sib with polyps

    0.06}, // regression on sib with cancer

    dpi[10] = { 0.02, 0.02, 0, 0, 0.01, 0.01, 0.02, 0.0004, 0.0012, 0.002},

    mu0[4] = // model for log polyp growth rate

    { 6.5, // intercept

    0.1, // regression on TotalXforPolyps

    0.15, // regression on TotalFrailtyForPolyps

    0.1}, // SD of log polyp growth rates

    dmu[4] = {0.02, 0.001, 0.001, 0.002},

    nu0[4] = // model for log cancer growth rate

    { 1.5, // intercept

    0.1, // regression on TotalXforPolyps

    0.15, // regression on TotalFrailtyForPolyps

    0.1}, // SD log cancer growth rate

    dnu[4] = {0.05, 0.001, 0.001, 0.002},

    psi0[2] = // model for logit probability of polyp detection

    { -13, // intercept

    3.0}, // regression on polyp size

    dpsi[2] = {0.2, 0.1},

    omega0[4] = // polyp to cancer mutation rate (per polyp cell per year)

    { -6.875, // intercept

    0.1, // regression on TotalXforPolyps

    0.15, // regression on TotalFrailtyForPolyps

    0.1}, // SD of log mutation rate

    domega[4] = {0.05, 0.01, 0.01, 0.02},

    lnAvgSizeAtCaDx0= 10, // ln average size of polyp at cancer diagnosis

    dlnAvgSizeAtCaDx= 0.01,

    SDlnSize0 = 0.5, // SD ln size of polyp at cancer diagnosis

    dSDlnSize = 0.01,

    FNwindow = 10; // years following screen used to define a "false negative"

    double lambda[4],sigma[11],pi[10],mu[4],nu[4],psi[2],omega[4],lnAvgSizeAtCaDx,SDlnSize,coeff[4][5];

    int a, // index for age = 1,…,100

    f, // index for families = 1,…,F

    m, // index for sibs within families = 1,…,M

    p, // index for polyps for each case = 1,…,P

    s, // index for screening times = 1,…,S

    v,v1,v2, // index for covariates (v1,v2 for pairs of covariates in information matrix)

    repl; // index for replicates = 1,…,NumRepl

    int Ndetected[F][M][S];

    double SharedFrailty[F][V],CorrelatedFrailty[F][M],IndependentFrailty[F][M][V],TotalFrailtyForPolyps[F][M],TotalFrailtyForScreening[F][M];

    double SharedX[F][V],CorrelatedX[F][M],IndependentX[F][M][V],TotalXforPolyps[F][M],TotalXforScreening[F][M];

    double AgeOfPolyp[F][M][P],PolypGrowthRate[F][M][P],AgeOfMutation[F][M][P];

    double AgeAtScreen[F][M][S];

    double AgeAtCancer[F][M],AgeAtCancerWithoutScreening[F][M],AgeOfTumor[F][M][P],ProgramAgeAtCancer[F][M];

    double AgeAtCensoring[F][M];

    double CancerGrowthRate[F][M][P];

    int ScreenCompliant[F][M][S];

    int NumScreens[F][M],NumScreensCompleted[F][M],NumUncensoredScreens[F][M],Cancer[F][M];

    int ProgramNumScreens[F][M];

    int PolypDetected[F][M][P],ProgramPolypDetected[F][M][P],ScreenPositive[F][M][S];

    double AgePolypDetected[F][M][P],ProgramAgePolypDetected[F][M][P];

    double MeanAgeOfPolyps,VarAgeOfPolyps,MeanUncensoredNPolyps,

    MeanTimeToMutation,VarTimeToMutation,MeanUncensoredNMutations,TotalNumMutations,

    MeanTimeToCancer,VarTimeToCancer,

    NumScreened[S],NPolypsScreened[S],MeanAgeAtScreen[S],MeanNCompliant[S];

    int NumPolypsByAge[100],NumMutationsByAge[100];

    int IncidentCancers[100],NumUncensored[100],UncensoredCancers[100];

    int iter,output,rep,pp;

    double MeanGrowthRate[S],MeanPdetectPolyp[S],MeanPolypSizeAtScreening[S],MeanTimeFromPolypToDetection[S],MeanPolypSizeDetected[S],TotalPolypsDetected[S];

    int pair,Npair,CaseFamily[MaxPair],CaseMember[MaxPair],ControlFamily[MaxPair],ControlMember[MaxPair];

    double PairAge[MaxPair],DiseaseCovar[MaxPair][2][MaxDiseaseCovar],beta[NumDiseaseCovar],SEbeta[NumDiseaseCovar];

    double ScreeningCovar[MaxScreeningCovar];

    double ScreeningBaselineHazard[100],CumulativeBaselineHazard[100],num[100],den[100];

    double PropensityScore[F][M][100],MeanPropensityScore[100];

    double MeanNdetected[100],MeanSibsScreened[100],MeanSibsPolyps[100],MeanSibsCancer[100],Nmeans[100];

    double PredNdetected[100],PredSibsScreened[100],PredSibsPolyps[100],PredSibsCancer[100];

    int nn,NNtot,NN[F][M][100],ScreenedAtAge[MaxNN];

    double ZZ[MaxNN][NumVar],Alpha1[NumVar],Alpha2[NumVar],SEalpha1[NumVar],SEalpha2[NumVar];

    int MinTimePrediagnosis,MaxTimePrediagnosis,minTime,maxTime,policy,PolypsFound,model;

    double ProgramAgeAtScreen[F][M][S];

    double SummaryPolicyMeasure[5][4][NumPolicy][NumMeasure];

    double RootMeanPolypSize;

    double CurrentAge;

    double PreclinicallyDetectableAge[F][M][P],

    NumPreclinicalTumors,NumPreclinicallyDetectableTumors,

    MeanPreclinicallyDetectableTime,VarPreclinicallyDetectableTime;

    int AA;

    double Y10[MaxAA],Y5[MaxAA],Z10[MaxAA][12],Weight[MaxAA],

    gamma[NumModel][NumPolicy][12],SEgamma[NumModel][NumPolicy][12],PopulationAverageP10;

    int NumFirstScreenTargetAge[100],NumTimeBetweenScreens[2][200];

    int CounterfactualClinicalCancer[F][M][NumPolicy],

    CounterfactualScreenedCancer[F][M][NumPolicy],

    CounterfactualPolyps[F][M][NumPolicy],

    CounterfactualPositiveScreens[F][M][NumPolicy];

    double AverageRisk[100][2];

    char FileName[14];

    FILE *sum,*sim,*chk;

    // * * * * * * * * * * * * * * * * * * * *

    // S I M U L A T I O N R O U T I N E S

    // * * * * * * * * * * * * * * * * * * * *

    void GetScreeningCovariates (int ff, int mm, double age)

    { ScreeningCovar[0] = TotalXforPolyps[ff][mm];

    ScreeningCovar[1] = TotalXforScreening[ff][mm];

    ScreeningCovar[4] = 0; // number of previous screens

    ScreeningCovar[5] = 0; // number of positive screens

    double AgeAtLastScreen=0;

    for (int ss=0; ss

    void GetDiseaseCovariates(int pp, int cc, int ff, int mm, double age)

    {

    DiseaseCovar[pp][cc][0] = TotalXforPolyps[ff][mm];

    DiseaseCovar[pp][cc][1] = TotalXforScreening[ff][mm];

    DiseaseCovar[pp][cc][2] = 0;

    DiseaseCovar[pp][cc][3] = 0;

    for (int ss=0; ss PairAge[pp]-MaxTimePrediagnosis)

    { if (ScreenCompliant[ff][mm][ss]) DiseaseCovar[pp][cc][2] ++;

    if (Ndetected[ff][mm][ss]) DiseaseCovar[pp][cc][3] ++;

    }

    if (ScreeningVariable==1 && DiseaseCovar[pp][cc][2])

    DiseaseCovar[pp][cc][2]=1; // binary version of ever screened during interval before reference date

    if (NumDiseaseCovar>4)

    { GetScreeningCovariates(ff,mm,age);

    for (v=0; v

  • if (a>97) CenterMean[v] = LastCenterMean[v];

    LastCenterMean[v] = CenterMean[v];

    }

    MeanNdetected[a] = CenterMean[0];

    MeanSibsScreened[a] = CenterMean[1];

    MeanSibsPolyps[a] = CenterMean[2];

    MeanSibsCancer[a] = CenterMean[3];

    if (MeanSibsCancer[a]

  • VarAgeOfPolyps += pow(AgeOfPolyp[f][m][p],2);

    if (AgeOfPolyp[f][m][p]+0.746) XdiseaseGroup=2;

    MeanTimeToMutation += TimeToMutation;

    VarTimeToMutation += pow(TimeToMutation,2);

    MeanTimeToCancer += TimeToCancer;

    VarTimeToCancer += pow(TimeToCancer,2);

    if (AgeOfTumor[f][m][p]

  • CurrentAge=999; NextSib=-9;

    for (m=0; m0.01 && error==0)

    { memset(U,0,sizeof(U));

    memset(Info,0,sizeof(Info));

    for (int aa=0; aa

  • + gamma[4][policy][ 2]*gFH

    + gamma[4][policy][ 3]*gX*gFH

    + gamma[4][policy][ 4]*logage50

    + gamma[4][policy][ 5]*logage50*gX

    + gamma[4][policy][ 6]*logage50*gFH

    + gamma[4][policy][ 7]*logage50*gX*gFH

    + gamma[4][policy][ 8]*logage50*logage50

    + gamma[4][policy][ 9]*logage50*logage50*gX

    + gamma[4][policy][10]*logage50*logage50*gFH

    + gamma[4][policy][11]*logage50*logage50*gX*gFH;

    double P5 = exp(logitP5); P5 /= 1+P5;

    TTAR = 5*AverageRisk/P5;

    }

    else

    { double logitP10 = gamma[2][policy][ 0]

    + gamma[2][policy][ 1]*gX

    + gamma[2][policy][ 2]*gFH

    + gamma[2][policy][ 3]*gX*gFH

    + gamma[2][policy][ 4]*logage50

    + gamma[2][policy][ 5]*logage50*gX

    + gamma[2][policy][ 6]*logage50*gFH

    + gamma[2][policy][ 7]*logage50*gX*gFH

    + gamma[2][policy][ 8]*logage50*logage50

    + gamma[2][policy][ 9]*logage50*logage50*gX

    + gamma[2][policy][10]*logage50*logage50*gFH

    + gamma[2][policy][11]*logage50*logage50*gX*gFH;

    double P10 = exp(logitP10); P10 /= 1+P10;

    TTAR = 10*AverageRisk/P10;

    }

    if (TTAR

  • fprintf (sum,"\n\nSUMMARY STATISTICS BY SCREEN NUMBER");

    fprintf (sum,"\n s Num subj Mean prop N compliant Polyp N Polyps Prob polyps Mean polyp Num polyps Time from Mean polyp ");

    fprintf (sum,"\n screened age compliant (cumulative) growth screened detected size at detected polyp to size at ");

    fprintf (sum,"\n rate screening per compliant detection detection");

    int CumNCompliant=0;

    for (s=0; s-0.746) XscreeningGroup=1;

    if (TotalXforScreening[f][m]>+0.746) XscreeningGroup=2;

    NXdisease[XdiseaseGroup] ++;

    NXscreening[XscreeningGroup] ++;

    for (a=0; a

  • }

    fprintf (sum, "\n");

    for (a=0; a

  • if (TotalXforPolyps[f][m]>+0.746) gX=2;

    GetScreeningCovariates(f,m,double(a));

    int gFH=0; if (ScreeningCovar[8]) gFH=1;

    double py = AgeAtCensoring[f][m]-a;

    double pyFUpos = AgeAtCensoring[f][m]-a;

    if (AgeAtCancer[f][m] < AgeAtCensoring[f][m]) py = AgeAtCancer[f][m]-a;

    if (py>10) py=10;

    totalPY[gX][gFH][a] += py;

    if (AgeAtCancer[f][m] < a+10 &&

    AgeAtCancer[f][m] < AgeAtCensoring[f][m])

    { totalCases[gX][gFH][a] ++;

    if (a>=20 && a5) pyFUpos=5;

    totalFUposPY[gX][gFH][a] += pyFUpos;

    if (AgeAtCancer[f][m] < a+5 &&

    AgeAtCancer[f][m] < AgeAtCensoring[f][m])

    totalFUposCases[gX][gFH][a] ++;

    }

    s=0;

    int LastScreen=-1;

    while (AgeAtScreen[f][m][s]

  • NcasesByNumScreens[gX]/PYbyNumScreens[gX],NcasesByNumScreens[gX],PYbyNumScreens[gX]);

    fprintf (sum, "\nCancer SIR (O/E) ");

    for (gX=0; gX

  • if (covar==1) fprintf (sum,"\n\nNumber of sibs' positive screens ");

    if (covar==2) fprintf (sum,"\n\nNumber of sibs with cancer ");

    if (covar==3) fprintf (sum,"\n\nSummary sib risk index ");

    for (gX=0; gX

  • }

    printf ( "\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);

    fprintf (sum,"\n\nLogistic regression predicted probabilities of 10-yr cancer risks under policy %d",policy);

    fprintf (sum,"\nage X0 FH=0 FH>0 FH=0 FH>0 average");

    for (a=20; a

  • for (a=0; a

  • lnL += log(RRcase / (RRcase/PScase + RRctl/PSctl) );

    for (int v1=0; v1

    void LogisticRegression2()

    { double U[NumVar],Info[NumVar][NumVar],InvInfo[NumVar][NumVar];

    // fprintf (sum,"\n\n\nIterations for conditional propensity model for screening");

    double chisq2=999,Wald2;

    memset (Alpha2,0,sizeof(Alpha2));

    for (v=0; v0.01 && error==0)

    { memset(U,0,sizeof(U));

    memset(Info,0,sizeof(Info));

    for (nn=0; nn

  • if (v==0) fprintf (sum," intercept");

    if (v==11 || v==12) fprintf (chk," %10.6f %8.6f ",Alpha2[v],SEalpha2[v]);

    if (v==1) fprintf (sum," (age-70)");

    if (v==2) fprintf (sum," (age-70)^2");

    if (v==3) fprintf (sum," (age-70)^3");

    if (v==4) fprintf (sum," X for disease");

    if (v==5) fprintf (sum," X for screening");

    if (v==6) fprintf (sum," First screen indicator");

    if (v==7) fprintf (sum," Time since last screen");

    if (v==8) fprintf (sum," Num previous screens");

    if (v==9) fprintf (sum," Num positive screens");

    if (v==10) fprintf (sum," Sibs previously screened");

    if (v==11) fprintf (sum," Sibs previously positive");

    if (v==12) fprintf (sum," Sibs with cancer");

    if (v==13) fprintf (sum," Summary sib risk index");

    }

    fprintf (sum,"\n\nMean (SD) log Propensity Scores = %6.3f (%5.3f)\n",

    MeanLnPropensityScore/(F*M),

    sqrt((VarLnPropensityScore-pow(MeanLnPropensityScore,2)/(F*M))/(F*M-1)));

    }

    // * * * * * * * * * * * * * * * * * * * * * * * * * * * *

    // P R O J E C T E D O U T C O M E S O F S C R E E N I N G P O L I C I E S

    // * * * * * * * * * * * * * * * * * * * * * * * * * * * *

    double TargetAgeAtFirstScreen(int policy, double age)

    { double gX = TotalXforPolyps[f][m];

    GetScreeningCovariates(f,m,age);

    int gFH=0; if (ScreeningCovar[8]) gFH=1;

    double logitAvgP10 = log(AverageRisk[50][0] / (1 - AverageRisk[50][0]));

    double A = gamma[2][policy][8] + gamma[2][policy][9]*gX + gamma[2][policy][10]*gFH + gamma[2][policy][11]*gX*gFH;

    double B = gamma[2][policy][4] + gamma[2][policy][5]*gX + gamma[2][policy][ 6]*gFH + gamma[2][policy][ 7]*gX*gFH;

    double C = gamma[2][policy][0] + gamma[2][policy][1]*gX + gamma[2][policy][ 2]*gFH + gamma[2][policy][ 3]*gX*gFH

    - gamma[1][4][0];

    double dt = (-B + sqrt(B*B-4*A*C))/(2*A);

    double t = 50*exp(dt);

    double logage50 = log(t/50);

    if (t80) t=81;

    if (policy==4) NumFirstScreenTargetAge[int(t)] ++;

    // check that predicted risk at this age equals population average

    double logitP10 = gamma[2][policy][ 0]

    + gamma[2][policy][ 1]*gX

    + gamma[2][policy][ 2]*gFH

    + gamma[2][policy][ 3]*gX*gFH

    + gamma[2][policy][ 4]*logage50

    + gamma[2][policy][ 5]*logage50*gX

    + gamma[2][policy][ 6]*logage50*gFH

    + gamma[2][policy][ 7]*logage50*gX*gFH

    + gamma[2][policy][ 8]*logage50*logage50

    + gamma[2][policy][ 9]*logage50*logage50*gX

    + gamma[2][policy][10]*logage50*logage50*gFH

    + gamma[2][policy][11]*logage50*logage50*gX*gFH;

    double P10 = exp(logitP10); P10 /= 1+P10;

    return(t);

    }

    double TimeToAverageRisk(int LastPositive, int policy, double AverageRisk)

    { double TTAR=5;

    double gX = TotalXforPolyps[f][m];

    double t=ProgramAgeAtScreen[f][m][s-1];

    double logage50 = log(t/50);

    GetScreeningCovariates(f,m,t);

    int gFH=0; if (ScreeningCovar[8]) gFH=1;

    if (LastPositive)

    { double logitP5 = gamma[4][policy][ 0]

    + gamma[4][policy][ 1]*gX

    + gamma[4][policy][ 2]*gFH

    + gamma[4][policy][ 3]*gX*gFH

    + gamma[4][policy][ 4]*logage50

    + gamma[4][policy][ 5]*logage50*gX

    + gamma[4][policy][ 6]*logage50*gFH

    + gamma[4][policy][ 7]*logage50*gX*gFH

    + gamma[4][policy][ 8]*logage50*logage50

    + gamma[4][policy][ 9]*logage50*logage50*gX

    + gamma[4][policy][10]*logage50*logage50*gFH

    + gamma[4][policy][11]*logage50*logage50*gX*gFH;

    double P5 = exp(logitP5); P5 /= 1+P5;

    TTAR = 5*AverageRisk/P5;

    if (TTAR>4.9 && TTAR

  • TTAR = 10*AverageRisk/P10;

    }

    if (TTAR=20 && ProgramAgeAtScreen[f][m][s-1]

  • for (s=0; s

  • // number of tumors prevented (i.e., polyps screen detected)

    // out of all possibly preventable (i.e.,

    if (AgeOfTumor[f][m][p] < AgeAtCensoring[f][m])

    { NumTumorsPreventable ++;

    if (ProgramAgePolypDetected[f][m][p] < AgeOfTumor[f][m][p])

    NumTumorsPrevented ++;

    }

    }

    if (TotalPolypsThisScreen)

    { PositiveScreens ++;

    CounterfactualPositiveScreens[f][m][policy] ++;

    }

    TotalPolyps += TotalPolypsThisScreen;

    CounterfactualPolyps[f][m][policy] += TotalPolypsThisScreen;

    }

    }

    SummaryPolicyMeasure[minTime][maxTime][policy][ 0] = TotalScreens;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 1] = PositiveScreens;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 2] = TotalPolyps;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 3] = TotalCancers;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 4] = NumFN;

    // SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumCancersWithin10Years;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 5] = NumScreenNeg;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 6] = NumFP;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 7] = NumPolypsNeverCancer;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 8] = NumTumorsPrevented;

    SummaryPolicyMeasure[minTime][maxTime][policy][ 9] = NumTumorsPreventable;

    SummaryPolicyMeasure[minTime][maxTime][policy][10] = NumCancersScreenDetected;

    SummaryPolicyMeasure[minTime][maxTime][policy][11] = NumPolypsScreened;

    }

    void Counterfactuals()

    { int CounterfactualNumClinicalCancers[NumPolicy][NumPolicy][4],

    CounterfactualNumScreenedCancers[NumPolicy][NumPolicy][4],

    CounterfactualNumPolyps[NumPolicy][NumPolicy][4],

    CounterfactualNumPositiveScreens[NumPolicy][NumPolicy][4];

    memset (CounterfactualNumClinicalCancers,0,sizeof(CounterfactualNumClinicalCancers));

    memset (CounterfactualNumScreenedCancers,0,sizeof(CounterfactualNumScreenedCancers));

    memset (CounterfactualNumPolyps,0,sizeof(CounterfactualNumPolyps));

    memset (CounterfactualNumPositiveScreens,0,sizeof(CounterfactualNumPositiveScreens));

    int policy1,policy2;

    for (policy1=1; policy1 CounterfactualPositiveScreens[f][m][policy2])

    CounterfactualNumPositiveScreens[policy1][policy2][0] ++;

    else if (CounterfactualPositiveScreens[f][m][policy1] < CounterfactualPositiveScreens[f][m][policy2])

    CounterfactualNumPositiveScreens[policy1][policy2][2] ++;

    else if (!CounterfactualPositiveScreens[f][m][policy1] && !CounterfactualPositiveScreens[f][m][policy2])

    CounterfactualNumPositiveScreens[policy1][policy2][1] ++;

    else CounterfactualNumPositiveScreens[policy1][policy2][3] ++;

    }

    }

    fprintf (sum,"\n\nSummary of counterfactual outcomes for time window %d-%d",MinTimePrediagnosis,MaxTimePrediagnosis);

    fprintf (sum,"\n\nCounterfactual clinical cancers:");

    for (policy1=1; policy1

  • fprintf (sum,"\n\nCounterfactual total number of polyps detected:");

    for (policy1=1; policy1

  • if (measure== 9) fprintf (sum," tumors preventable");

    if (measure==10) fprintf (sum," total cancers screen detected");

    if (measure==11) fprintf (sum," total polyps screened (detected or not)");

    if (measure==12) fprintf (sum," polyps detected");

    if (measure==1)

    { fprintf (sum,"\n%2d/%2d ",1,0);

    for (policy=0; policy

  • fprintf (sum,"\n\nF=%d, M=%d, P=%d",F,M,P);

    fprintf (sum,"\nlambda "); for (v=0; v