bloodcancerdiscov.aacrjournals.org€¦ · Web viewThree variants were found in genes associated with Noonan syndrome (OMIM #616564, #610733, and #616559) or Noonan-like syndrome

Supplementary Note

Rise in mutation burden with disease progression

All somatic sequence variants detected in the exomes of diagnosis and relapse samples

(N=16,615) were subjected to confirmatory capture-based sequencing to provide a more

accurate estimate of mutant allele frequency (MAF) and facilitate detection of subclonal

variants. Overall, there was excellent concordance in mutation calling between discovery and

verification sequencing (89.7% of mutations), with discordance in defining subclonal (MAF

<30%), heterozygous (MAF, 30%~75%) and homozygous (MAF >75%) status in 2% of the

mutations (Supplementary Figure 1a-c). Importantly, 3.4% of the mutations initially

considered relapse-specific from analysis of WES/WGS data were shown to be present at

diagnosis, including NOTCH1, NRAS, KRAS, TP53, and CREBBP (Supplementary Figure1d). In line, our deep sequencing efforts revealed tumor specific mutations in the complete

remission samples (Supplementary Table 3). A total of 936 tumor specific mutations (6.7%

of the mutations) were detected in the complete remission samples of 72 patients (median of

4 mutations per sample, range 1-195 mutations). The median MAF of tumor specific

mutations in the complete remission samples was 1.43% (range 0.1% - 29.8%), which

increased to 43.2% (range 1.2% - 99.5%) in the corresponding tumor samples. The median

increase in MAF from complete remission to tumor sample was 95.4% (range 19.5% -

99.9%). This indicates various degrees of discernable tumor contamination in the complete

remission samples. To test whether the residual of somatic mutations in complete remission

is an indicator of relapse, we identified 12 diagnosis-relapse cases (drawn from this study)

and 20 ALL cases that did not relapse with remission samples obtained between 27-49 days

(median 44 days) after diagnosis. High depth exome sequencing was performed for all

germline samples (depth 377-515X, median 423X) and the somatic mutations identified in

tumor samples were examined in the remission samples (Supplementary Table 2). In

contrast to prior data in AML, there was no association between the presence of residual

somatic mutations in the germline samples and the risk of relapse (5 out of 12 relapses ALL

cases have residual somatic mutations in remission samples compared to 12 out of 20 non-

relapse ALL cases; Fisher’s exact test, P=0.467).

The most frequent mutation types identified in the cohort were missense mutations

(coding, silent or located in the UTRs) and single copy number deletions (SupplementaryFigure 2). The burden of mutations increased with disease progress from a median of 23

(range 1-164) single nucleotide variants (SNVs) and short insertions/deletions (indels) at

diagnosis to 41 (range 4-1,139) at relapse and 117 (range 26-1,699) at second relapse

(Supplementary Figure 3). This increase was observed for coding and non-coding variants,

and for clonal and subclonal variants, suggesting ongoing mutational processes. No

1

differences in mutation burden were observed according to disease lineage, sex, outcome

(cure or death due to disease), or by age at diagnosis (with the exception of cases of

infant/KMT2A-rearranged leukemia that have a low mutation burden; Supplementary Figure3 and Supplementary Tables Supplementary Table 3-Supplementary Table 5).

A modest increase in the number of structural genetic alterations was also observed,

with a median of 9, 10 and 23 DNA copy number alterations (CNAs) at diagnosis, first and

second relapse respectively (Supplementary Figure 3, and Supplementary Tables Supplementary Table 6-Supplementary Table 7). In exceptional cases, mutations

compromising DNA repair were associated with genomic instability. For example, case

SJBALL022431 had somatic bi-allelic alterations of ATM from deletion of 11q and a

nonsense mutation, and over 100 small losses and gains involving chromosomes 11, 15, and

21 reminiscent of a chromothripsis event (D: N=109 affecting 368.5 Mb and R: N=113

affecting 467.8 Mb; Supplementary Table 7). This is in contrast to other cases with relatively

high CNA burden but without DNA repair defects (e.g., SJETV043 and SJTALL023 with over

50 CNAs per sample), which contained predominantly small deletions but no amplifications

(Supplementary Figure 2b).

Early relapsesNext to CREBBP mutations, we found that mutations in ERG (N=3) and ARID2 (N=4)

associate with relatively late relapses (mean 7.0 vs 2.9 yrs, P=0.011; and mean 6.6 vs 2.9

yrs, P=0.013, respectively). Mutations in ERG and ARID2 occurred exclusively in leukemias

of the DUX4 subtype. Several genes were correlated to early relapse, but the number of

mutations was low (Supplementary Table 9). We identified 8 cases who relapsed

disproportionately early (>1.5 inter-quartile range) within their ALL subtype. Aberrations that

were present exclusively in this group were i(9)(q10) with homozygous PAX5 P80R mutation

(SJBALL192) and NR3C1 frameshift mutations (SJETV010 and SJBALL199). Other relapse

associated aberrations in this group were (combinations of) hypermutation (SJBALL022428

and SJETV010), SNV/indels in NT5C2 (SJBALL192 and SJHYPER098), SNV/indels in TP53

(SJBALL022428 and SJETV010) and SNV/indels and CNAs in IKZF1 (SJBALL0212924

(p.G158S and Del7-8), SJHYPER098 (del1-4), SJPHALL020 (del1-8), and SJTALL093

(p.E76*)), CNAs in TBL1XR1 (SJETV010), CNAs in ETV6 (SJBALL199 and SJETV010) and

homozygous loss of CDKN2A/B (SJBALL192, SJETV010, SJHYPER098, and SJTALL093).

These data indicate that time to relapse is influenced by the combined action of unique

aberrations within ALL subtypes.

2

Frequently mutated genes and pathways

A total of 4,509 genes harbored non-silent SNV/indels mutations, of which 1,063 were

mutated in at least two cases (Supplementary Table 3). Two-hundred and twenty-one were

known targets of mutation in cancer according to the COSMIC Cancer Gene Census databse

(1) or leukemia studies (Supplementary Figure 7a and Supplementary Table 16). Among

the 293 genes recurrently (n≥2) and exclusively targeted in the major clones at relapse (i.e.

not targeted at diagnosis) by acquired SNV/indels or CNAs (Supplementary Table 16), 82

were targeted by non-silent variants, of which 6 were CGC genes (reported in the COSMIC

Cancer Gene Census or in recent leukemia studies; NCOR2, NT5C2, FOXA1, MED12,

PABC1, and PPMID)(2). Twelve genes harbored recurrent clonal and acquired non-silent

mutations uniquely at second relapse, including AASDH, AGBL3, ARGHAP29, ATP11C,

CSF3R, DCK, LRP6, SCRIB, SDK2, TENM3, TNIP1, and ZNF436 (Supplementary Table17), suggesting a possible role in treatment resistance or evasion.

In general, B and T-lineage ALL show distinct genetic alteration profiles, e.g. genes

involved in B cell development are enriched in B-ALL (Fisher’s exact test, P=0.002), while

mutations in transcription factor NOTCH1 involved in maturation of both CD4+ and CD8+

cells in the thymus were exclusively observed in T-ALL (Supplementary Figure 6). Fifteen

genes were associated with disease lineage, e.g. VPREB1 and CREBBP with B-ALL, and

NOTCH1, WT1, PTEN, STIL, and NT5C2 with T-ALL (Fisher’ exact p<0.05). The most

common alteration perturbing B cell development involved the lymphoid transcription factor

gene IKZF1, which is commonly mutated in BCR-ABL1(3), Ph-like(4) and DUX4-rearranged

ALL (5). Consistent with the known association of IKZF1 alterations with high-risk B-ALL (3),

(4) IKZF1 alterations were strongly enriched at relapse, with preservation (N=21) or

acquisition (N=5) in 26 cases (Supplementary Table 15). Subtype-associated gene

mutations were also observed such as CREBBP in hyperdiploid (9 of 14) and low hypodiploid

(2 of 3) ALL; ARID2 mutations in DUX4-rearranged (4 of 7) and ARID5B in hyperdiploid (4 of

14) ALL.

Overall 76% B-ALL and 80% T-ALL harbored at least one signaling pathway mutation

(median 3, range 1-8). The most frequently were the Ras (B-ALL 47.8%, T-ALL 36%), JAK-

STAT (B-ALL 19.4%, T-ALL 28%) and PI3K-AKT (B-ALL 13.4%, T-ALL 28%; Figure 1 and Supplementary Figure 5a). Ras signaling pathway mutations were observed in 41 cases,

and 26 (63.4%) of them were preserved from diagnosis to relapse, with 24.4% (N=10)

acquired and 12.2% (N=5) lost in disease progression. The data also provide new insight by

showing common multiclonality of Ras pathway mutations at diagnosis and frequent

convergence to a dominant mutation at relapse. Of 61 cases with signaling pathway

mutations, 31 harbored at least one Ras pathway mutation at diagnosis, with 11 cases

3

having multiple, commonly subclonal Ras mutations (Supplementary Table 14). Seven

cases showed convergence to a single or two clonal Ras pathway mutations. For the JAK-

STAT pathway, similar enrichment in preserved mutations was observed (N=10, 50.0%

preserved; N=5, 25.0% acquired; and N=5, 25.0% lost). However, most of the PI3K-AKT

pathway mutations in B-ALL were acquired in relapse (8 out of 10), whereas in T-ALL, most

of the mutations identified in diagnosis were lost at relapse (11 out of 13; Supplementary Tables Supplementary Table 13-Supplementary Table 14).

Genes encoding regulators of cell cycle and tumor suppression were commonly

mutated in both B-ALL (N=42, 62.7%) and T-ALL (N=21, 84%), with gene-specific patterns of

enrichment. Ten TP53 alterations (8 cases) were preserved from diagnosis or acquired as

new alterations at relapse. In contrast, CDKN2A mutations were lost (N=6), preserved

(N=34) or acquired (N=15) in disease progression. Epigenetic modifier/regulator mutations

were observed in 67.2% of B-ALL (N=45) and 72% of T-ALL (N=18) (Supplementary Figure5b), most of which were acquired (N=85, 56.7%) or preserved (N=48, 32.0%) from diagnosis

to relapse. CREBBP mutations (22 mutations in 17 B-ALL and 1 T-ALL case) showed

preservation or acquisition in all but one case.

Second primary leukemia

To verify clonal relatedness in a subset of patients, we examined IGH (11 B-ALL patients) or

TCRB (3 T-ALL patients) gene rearrangements by targeted next generation sequencing,

which revealed that the relapses were clonally related to major (N=5) or minor (down to

0.005%; N=7) clones at diagnosis (Supplementary Table 19). Three cases (SJBALL006,

SJTALL049 and SJTALL142) revealed fully discordant tumors for all genetic alterations (SV,

CNA, SNV, Indel and IgH/TCRβ rearrangements), raising the possibility of a distinct second

primary leukemia. SJBALL006 is described in the main text. This patient exhibited intellectual

impairment, short stature, and thrombocytopenia. SJTALL049 is a boy and described in the

main text. He suffered a grand mal seizure at age 4 and has a mild nystagmus. Analysis of

the germline WES data revealed two indels on one allele in TSC2, leading to an in-frame

deletion of 3 amino acids. Mutations in TSC2 are associated with tuberous sclerosis (OMIM

#613254) and can cause brain and kidney tumors, but leukemia has not been reported.

SJTALL142 developed T-ALL at 15 years of age and KMT2A-MLLT10 rearranged MDS at

18. Additional WGS analysis revealed no shared SV or SNVs (Supplementary Table 20).

The boy had neck surgery at 6 months of age due to cysts, and he has a history of renal

insufficiency. The family history was inconclusive for cancer occurrence and the boy did not

harbor any constitutional variants in a gene on our gene list.

Our cohort contained five patients who developed a myeloid tumor after T-ALL. Two

of these tumors were identified as clonally unrelated second primary leukemias. However, 4

three AMLs were clonally related relapses originating from a minor clone (SJTALL124: T-

ALL; 20 mutations shared, e.g. TP53 R273H and CDKN2A deletion), a major clone

(SJTALL164: ETP-ALL; 20 mutations shared, e.g. NF1 deletion), or polyclonal (SJTALL008:

ETP-ALL; 20 mutations shared, e.g. NRAS Q61H), indicating that lymphoid and myeloid

tumors can share a common origin. This is in line with lineage plasticity and multipotent

common ancestor cells identified in mixed phenotype acute leukemia (6) and histiocytic

tumors (7).

Constitutional variants and ALL susceptibility

We analyzed our complete cohort for constitutional variants in genes known to be associated

with leukemia prone syndromes or with leukemia development (Supplementary Tables Supplementary Table 31 and Supplementary Table 32). We found two truncating variants

in TP53 (Y163Tfs*7 and R196*), which are causative for Li Fraumeni syndrome in patients

SJHYPO126 and SJTALL164, respectively. The tumors of both patients showed loss of the

wild type allele through somatic deletion of the short arm of chromosome 17. SJHYPO126

developed hypodiploid leukemia and has an extensive family history for cancer (breast

cancer, liposarcoma, lung cancer, colon cancer and leukemia). SJTALL164 developed ETP-

ALL with a clonally related but phenotypically myeloid relapse. In addition to the truncating

TP53 variant, SJTALL164 harbored constitutional truncating variants in NOTCH1 (Q2440*)

and in DNM2 (T78Nfs*18), both genes that are involved in leukemia development (8,9).

Clinical data was not available for this patient.

In patient SJBALL022480 we found a constitutional truncating mutation in PRDM1

(R25Sfs*13), which is a DNA-binding transcriptional repressor involved in B-cell

differentiation. The probability of intolerance for this gene is very high (pLI=0.98), suggesting

a severe effect of functional loss. Finally, SJBALL013 harbored a truncating mutation

(I1085Hfs*60) in PTCH1, which is associated with basal cell nevus syndrome or Gorlin

syndrome (OMIM #109400). The patient presented with low set ears, macrocephaly and

intellectual disability. In addition, the boy developed a basal cell carcinoma at age 5,

concurring the diagnosis of Gorlin syndrome.

We identified 13 heterozygous variants (13 patients) in genes associated with

autosomal recessively inherited syndromes (e.g. Fanconi Anemia, CMMRD). Likely, these

latter variants are not directly causative for the leukemia in these patients. None of the

variants, except for PMS (G108R) in SJHYPER127, obtained a second hit in the tumors of

the patients. We did not identify homozygous or compound heterozygous variants

(Supplementary Table 32). In SJHYPER021925, we found a missense variant in PHF6

(R163H), a gene associated to the X-linked recessive Borjeson-Forssman-Lehman

syndrome (OMIM #301900) and somatically affected in ALL (10,11). The patient was female 5

with no phenotypical features of Borjeson-Forssman-Lehman syndrome reported. A large

cohort study is needed to reveal whether heterozygous PHF6 variants in females lead to a

susceptibility to leukemia development.

We identified 12 variants of unknown significance (in 11 patients) in genes associated

with autosomal dominantly inherited cancer prone disorders. Three variants were found in

genes associated with Noonan syndrome (OMIM #616564, #610733, and #616559) or

Noonan-like syndrome (OMIM #613563). LZTR1 G711R (SJHYPO126) is located in the

BTB/POZ domain; SOS1 N622S (SJHYPER009) is located in the RasGEFN domain, and

SOS2 E422K (SJTALL098) and CBL P684S (SJERG023) were not located in a defined

domain. None of the genes harbored second hits in the tumors of the patients. The

phenotype of SJHYPO126 and SJHYPER009 did not resemble Noonan syndrome.

SJHYPER009 was diagnosed with Duane syndrome (OMIM #126800) and developmental

delay with speech deficits. SJERG023 presented with skin moles on the back, but no

Noonan-like syndrome phenotype. Phenotypic data was not available for SJTALL098.

We included SAMD9 and SAMD9L in our gene list because of recent reports

associating missense mutations in these genes to MIRAGE syndrome (OMIM #617053) (12),

myelodysplasia and leukemia syndrome with monosomy 7 (MLSM7; OMIM #252270) (13)

and ataxia-pancytopenia syndrome (ATXPC; OMIM #159550) (14). We identified variants in

SAMD9 in the germlines of SJBALL022479 (I1549Sfs*10) and SJPHALL029 (C283Y). The

frameshift variant is not likely to be associated with leukemia development in this patient as

SAMD9 is completely tolerant to loss of function mutations (ExAC pLI=0). For both these

patients’ phenotypic data was not available.

Finally, we found variants of unknown significance in the genes PAX5 (associated

with leukemia susceptibility, OMIM #615545), TYK2 (associated with leukemia susceptibility

(15)), NF1 (associated with neurofibromatosis, OMIM #16220), and CDH1 (associated with

familial gastric cancer, OMIM #137215). We identified a 7 amino acid in-frame deletion

P51del7aa in PAX5 in SJHYPER021999. The variant G183S in the octapeptide domain of

PAX5 was associated with susceptibility to ALL development in three families (16,17), but

further studies should reveal whether a deletion of 7 amino acids in the paired box domain of

this gene can drive leukemia development. No second hits in this gene were found in the

tumor of the patient. We identified a P120L variant in TYK2 in patient SJTALL055. Variants in

the DPG motif of TYK2 have been associated with leukemia susceptibility (15), but no data is

available on whether TYK2 is functionally affected by the P120L variant in the FERM domain.

There was no second hit in the tumor, and phenotypical data for this patient was not

available. The NF1 variant K1385E is located in the RasGAP Neurofibromin domain and was

found in SJTALL092. For this patient no phenotype was reported, but the maternal branch of

the family did harbor some cases of cancer (lymphoma in a maternal uncle, breast cancer in 6

a maternal aunt and colorectal cancer in the maternal grandfather). A second hit in this gene

was not present in the tumors of this patient, which suggests that this variant is not causative

for the leukemia in SJTALL092. SJPHALL018 carried a P373L variant in CDH1, a gene

associated with familial gastric cancer (18), though no cancer was reported in the family. A

link with leukemia predisposition has not yet been reported for CDH1.

Hypermutation and MMR mutations

Patients with hypermutated relapses with the MMR associated mutational profile (group 3) all

harbored biallelic mutations in one of the MMR genes. These included a germline PMS2

G108R mutation and a somatic deletion of PMS2 with retention of the mutant allele in

SJHYPER127-R1; somatic homozygous deletion of MSH2 and MSH6 in SJTALL023-R1;

somatic deletion and homozygous R680* mutation of MSH2 plus a hemizygous deletion of

MSH6 in SJTALL023-R2; and somatic homozygous mutation in MSH2 (R340Nfs*17) in

SJTALL057-R1. In group 1, SJETV10 harbored biallelic copy number deletions in MLH1 in

both relapses, while an acquired subclonal missense mutation in MSH6 was identified in the

relapse of SJHYPER022. SJHYPO177 had losses of all chromosomes where MMR genes

are located (chromosomes 2, 3, and 7), but no additional mutations in these genes.

7

SUPPLEMENTARY TABLES

All tables are provided in the Excel workbook

Supplementary Table 1. Clinical characteristics of the cohort.

The table lists the clinical characteristics of all patients in the cohort.

D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor, yrs:

years, BMT: bone marrow transplant

Column D - Protocol

BCM: best clinical practice

Columns H-N indicate which techniques were used on what samples. SNP6: Affymetrix

SNP6 copy number array, WES: whole exome sequencing, WGS: whole genome

sequencing, CapVal: NimbleGen SeqCap Target enrichment validation, RNAseq: RNA

sequencing, Xenograft: xenografting in NSG mice, ddPCR: digital droplet PCR, na: not

applicable

Subtype abbreviations (columns R, S, X, Y, AC, AD)

BCL2/MYC: BCL2, MYC, or BCL6 rearranged with IGH region; B-other: absence of the

known B-ALL subtypes; CRLF2 high: high CRLF2 expression; DUX4: DUX4 rearranged;

ETP: early T-cell precursor; ETV6-RUNX1: ETV6-RUNX1 translocated t(12;21); ETV6-

RUNX1-like: ETV6-RUNX1 like as determined by expression profiling; HLF: TCF3-HLF

translocated; Hyperdiploid: Hyperdiploid >50 chromosomes; Hypodiploid: Hypodiploid <44

chromosomes; iAMP21: intrachromosomal amplification of chromosome 21; Infant: Infant

ALL <1 years of age at diagnosis; KMT2A: KMT2A (MLL) rearranged; KMT2A-like: KMT2A

(MLL) like as determined by expression profiling; Low hypodiploid: containing 32–39

chromosomes; MEF2D: MEF2D rearranged; Near haploid: containing 24–31

chromosomes; PAX5 P80R: containing a biallelic PAX5 alterations including PAX5 P80R

mutation and also PAX5 P80R gene expression profile; PAX5alt: PAX5 intragenic

amplification and PAX5alt gene expression profile; Ph: BCR-ABL translocated t(9;22); Ph-

like: Ph-like as determined by expression array or RNA-seq, TAL1: TAL1 alteration;

TAL1/TLX3: TAL1 and TLX3 alterations, T-ALL: T-lineage ALL, TLX3: TLX3 alteration,

ZNF384: ZNF384 rearranged; na: not applicable, nd: no data

Column T – R1 site

HEM: hematological, CNS: central nervous system, HEM+T: hematological and testicular,

EXTR: extramedullary, TESTES: testicular, MEDMASS: mediastinal mass, na: not

applicable, nd: no data

8

Supplementary Table 2. Deep sequencing of remission samples from relapsed and

non-relapse ALL cases.

Twelve diagnosis-relapse cases (drawn from this study) and 20 ALL cases that did not

relapse with remission samples obtained between 27-49 days (median 44 days) after

diagnosis were enrolled in this analysis. High depth exome sequencing was performed for all

germline samples (average coding region depth 377-515X, median 423X, with >30X

coverage >97% for all samples).

tumorMAF: mutant allele frequency in tumor sample; tumorVarDepth: mutant allele depth in

tumor sample; tumorDepth: total sequence depth in tumor sample; germMAF: mutant allele

frequency in match germline (remission) sample; germVarDepth: mutant allele depth in

germline sample; germDepth: total sequence depth in germline sample; Class: mutations

class; AA Change: amino acid change of the mutation.

Supplementary Table 3. SNV/indel results per patient.

This table describes all SNV/indel variants detected per patient with values for each sample

analyzed in one row. NA: not applicable, nd no data. Column definition is listed below:

A. Patient: patient ID

B. Patient Type: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:

two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-

R1-S: second tumor of different etiology after ALL relapse

C. Evolution: Major: major clone in tumor 2 evolved from major clone in tumor 1, Minor: major

clone in tumor 2 evolved from one minor subclone in tumor 1, Precursor: tumors share

the founder fusion and a limited number of SNVs, Polyclonal: multiple (sub)clones are

preserved, Second Primary Tumor: tumors are clonally unrelated

D. Variant Type: the type of variant. SNV: single nucleotide variant, indel: insertion or

deletion

E. Variant (hg19): genomic location of the variant in human genome build hg19

F. Overall Call: annotation of the variant to be somatic if it becomes clonal in one of the

tumor samples, or somatic subclonal if the variants stays at a subclonal level in all tumor

samples

G. Call_G: variant call in the complete remission sample. Wild type: 0-2 mutant reads, MT:

variant supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant

reads and both WES/WGS and Capture validation techniques identified the variant

and/or the variant was called with higher coverage in other (tumor) samples of the same

9

patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,

Homozygous MAF ≥75%.

H. %var_G: variant allele frequency in % in the complete remission sample

I. MTinN: number of mutant reads in the complete remission sample

J. TOTALinN: total number of reads in the complete remission sample

K. Call_D: variant call in the diagnosis sample. Wild type: 0-2 mutant reads, MT: variant

supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant reads and

both WES/WGS and Capture validation techniques identified the variant and/or the

variant was called with higher coverage in other (tumor) samples of the same patient.

Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,


L. %var_D: variant allele frequency in % in the diagnosis sample

M. MTinD: number of mutant reads in the diagnosis sample

N. TOTALinD: total number of reads in the diagnosis sample

O. Status_D: describes the evolution of the variant in the diagnosis sample. At this stage a

variant can only be wild type (not present) or acquired. NA: not applicable – variant was

excluded based on the lack of reads.

P. Call_R: variant call in the relapse sample. Wild type: 0-2 mutant reads, MT: variant






Q. %var_R: variant allele frequency in % in the relapse sample

R. MTinR: number of mutant reads in the relapse sample

S. TOTALinR: total number of reads in the relapse sample

T. Status_DR: describes the evolution of the variant in the relapse samples as compared to

the diagnosis sample. NA: not applicable – variant was wild type in relapse or no relapse

sample was analyzed.

U. Call_R2: variant call in the second relapse sample. Wild type: 0-2 mutant reads, MT:

variant supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant

reads and both WES/WGS and Capture validation techniques identified the variant

and/or the variant was called with higher coverage in other (tumor) samples of the same

patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,


V. %var_R2: variant allele frequency in % in the second relapse sample

W. MTinR2: number of mutant reads in the second relapse sample10

X. TOTALinR2: total number of reads in the second relapse sample

Y. Status_R1R2: describes the evolution of the variant in the second relapse samples as

compared to the first relapse sample. NA: not applicable – variant was wild type in

second relapse or no second relapse sample was analyzed.

Z. Call_S: variant call in the second tumor sample. Wild type: 0-2 mutant reads, MT: variant






AA. %var_S: variant allele frequency in % in the second tumor sample

AB. MTinS: number of mutant reads in the second tumor sample

AC. TOTALinS: total number of reads in the second tumor sample

AD. Status_DRS: describes the evolution of the variant in the second tumor samples as

compared to the previous tumor sample. NA: not applicable – variant was wild type in

the second tumor or no second tumor sample was analyzed.

AE. Gene: HUGO gene symbol

AF. RefSeq: RefSeq accession number

AG. CLASS: classification based on amino acid change pattern. ‘exon’, mutation in non-

coding RNA genes; ‘splice_region’ mutation not directly affecting the canonical splice

sites but located within 10bp of the canonical splice sites

AH. AA Change: predicted amino acid change for the variant

AI. mRNA Change: variant change at cDNA level with the A of the translation initiation codon

designated as +1

AJ. Coding: classification of the variant as coding or non-coding

AK. PhyloP: PhyloP evolutionary conservation score

AL. CADD_PHRED: Combined Annotation Dependent Depletion scaled score

AM. Grantham Score: score to predict the distance between two amino acids

AN. COSMIC Cancer Gene Census Aug2018: describes whether the gene is recognized as

a cancer gene by the COSMIC database. yes: tier 1 gene, Candidate: tier 2 gene, no:

not present in the Cancer Gene Census in December 2017

AO. Zhang 2015: gene present in the supplementary table 2 of Zhang et al, N Engl J Med

2015; 373(24): 2336-2346

AP. Pathway: gene present in a gene list constructed for this study

AQ. KEGG Name: Kyoto Encyclopedia of Gene and Genomes pathway name.

AR. KEGG Class: Class to which the corresponding KEGG pathway belongs to.

11

Supplementary Table 4. SNV/indel results per sample.

This table describes all SNV/indel variants detected per sample. NA: not applicable, nd no

data. Column definition is listed below:

A. Patient: patient ID


two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-

R1-S: second tumor of different etiology after ALL relapse.

C. Sample: Type of sample. D: diagnosis, G: complete remission, R1: relapse 1, R2: relapse

2, S: second tumor

D. Variant Type: the type of variant. SNV: single nucleotide variant, indel: insertion or

deletion

E. Variant (hg19): genomic location of the variant in human genome build hg19

F. Overall Call: annotation of the variant to be somatic if it becomes clonal in one of the

tumor samples, or somatic subclonal if the variants stays at a subclonal level in all tumor

samples

G. Zygosity: annotation of the variant to be heterozygous, homozygous or subclonal

H. Call: variant call. MT: variant supported with ≥8 mutant reads, Low MT: variant supported

with 3-8 mutant reads and both WES/WGS and Capture validation techniques identified

the variant and/or the variant was called with higher coverage in other (tumor) samples

of the same patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous:

MAF 30-75%, Homozygous MAF ≥75%.

I. %var: variant allele frequency in %

J. MTinT: number of mutant reads

K. TOTALinT: total number of reads

L. Status0-1: describes the evolution of the variant from healthy to tumor. At this stage, a

variant can only be acquired

M. Status1-2: describes the evolution of the variant in the tumor sample as compared to the

previous tumor sample. NA: not applicable – variant called in the first tumor sample

(diagnosis).

N. Gene: HUGO gene symbol

O. RefSeq: RefSeq accession number

P. CLASS: classification based on amino acid change pattern. ‘exon’, mutation in non-coding

RNA genes; ‘splice_region’ mutation not directly affecting the canonical splice sites but

located within 10bp of the canonical splice sites

Q. AA Change: predicted amino acid change for the variant

12

R. mRNA Change: variant change at cDNA level with the A of the translation initiation codon

designated as +1

S. coding: classification of the variant as coding or non-coding

T. PhyloP: PhyloP evolutionary conservation score

U. CADD_PHRED: Combined Annotation Dependent Depletion scaled score

V. Grantham Score: score to predict the distance between two amino acids

W. COSMIC Cancer Gene Census Aug2018: describes whether the gene is recognized as a

cancer gene by the COSMIC database. yes: tier 1 gene, Candidate: tier 2 gene, no: not

present in the Cancer Gene Census in December 2017

X. Zhang 2015: gene present in the supplementary table 2 of Zhang et al, N Engl J Med

2015; 373(24): 2336-2346

Y. Pathway: gene present in a gene list constructed for this study

Z. KEGG Name: Kyoto Encyclopedia of Gene and Genomes pathway name.

AA. KEGG Class: Class to which the corresponding KEGG pathway belongs to.

Supplementary Table 5. Summary of mutation clonality and type by case.

This table summarizes the number, clonality and type of somatic single nucleotide variants

(SNV) and small insertions and deletions (Indel) in each patient (gray shading) and each

sample. D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor.

Clonal: variant with an allele frequency (MAF)>30%. Non-silent: variants of the type

missense, nonsense, frameshift and canonical splice site. Subclonal only variants: variants

that were present at a MAF<30% in all tumor sample(s) of a patient, nd: no data. FS:

frameshift, MS: missense, NS: nonsense, proteinDel: in-frame deletion, proteinIns: in-frame

insertion, CSS: canonical splice site, Genes targeted: number of genes affected. Because of

overlap in genes targeted (recurrent hits in genes in multiple samples), values do not add up,

nd: no data.

Supplementary Table 6. SNP6 DNA copy number alteration results.

The table shows the somatic copy number segments called in all samples studied. Column

definition is listed below:

A. Patient: Patient ID

B. Sample: Sample. D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S:

second tumor

C. Status: describes the evolution of the CNA from one sample to the other. No R: no data

for R sample available13

D. Type: describes the type of CNA based on size. Inter indicates a CNA not containing any

genes, Focal indicates a CNA encompassing 1-3 genes, Large describes a CNA with >3

genes not being an entire chromosome arm, Arm denotes a CNA involving a chromosome

arm and Chr specifies a CNA of an entire chromosome.

E. Comment: n indicates the number of chromosomes as determined by the allele frequency

plots. LOH: loss of heterozygosity

F. CN: copy number calling. Amplification: gain of >2 copies, del: deletion, LOH: loss of

heterozygosity

G. Segment_hg19: affected segment in genome build hg19

H. Segment_hg18: affected segment in genome build hg18

I. Hyperlink_hg18: hyperlink to the segment location in the UCSC browser

J. chr_hg18: chromosome in genome build hg18

K. Cytoband_hg18: sublocation on the chromosome in genome build hg18

L. loc.start_hg18: chromosomal start position of the lesion in genome build hg18

M. loc.end_hg18: chromosomal end position of the lesion in genome build hg18

N. num.mark: number of probes in the segment

O. seg.mean: log2 ([normalized tumor signal]/[normalized normal signal])

P. seg.observedCN: absolute copy number

Q. seg.size_(bp): size of the segment in base pairs

R. Genes.segment: number of genes included in the segment

S. first_10_ genes: lists the names of the first 10 genes in the segment

T. miRNA.segment: number of miRNAs included in the segment

U. first_10_miRNAs: lists the names of the first 10 miRNAs in the segment

Supplementary Table 7. Summary of SNP6 DNA copy number alteration results.

This table summarizes the number and size of somatic CNAs in each patient (gray shading)

and each sample by type of aberration and divided into five categories by size of the

aberration.

CNA: copy number aberration, D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete

remission, S: second tumor, LOH: loss of heterozygosity, N: number, Mb: mega bases, seg.:

segment. Inter indicates a CNA not containing any genes, Focal indicates a CNA

encompassing 1-3 genes, Large describes a CNA with >3 genes not being an entire

chromosome arm, Chromosome arm denotes a CNA involving a chromosome arm and Full

chromosome specifies a CNA of an entire chromosome.

14

Supplementary Table 8. Patterns of clonal evolution.

This table summarizes the number of acquired, lost, and preserved copy number and

sequence variants for each patient.

CNA: copy number alteration, Indel: insertion or deletion, SNV: single nucleotide variant, nd:

no data, NA: not applicable, R1: relapse 1, R2: relapse 2, S: second tumor.

Supplementary Table 9. Genes correlated with time to first relapse

This table shows the correlation of remission time to genes clonally targeted by SNV/indels

and focal CNAs at first relapse. The p value is calculated using a Student’s t-test. The

numbers do not add up to 92 patients because patients with a second primary tumor and

patients with a missing first relapse sample were excluded for this analysis. The mean,

median, minimal and range of time to relapse is indicated in years.

Supplementary Table 10. Clonal mutations and focal CNAs acquired in first relapse or

rise from minor clone of diagnosis.

The table shows the SNV/indels and focal acquired or rise in the major clone of first relapse

(R1) samples compared to diagnosis stage(D). The SNV/indels were selected following two

criteria: (i) the mutations acquired in the major clone (MAF≥0.3) of R1 compared to D; or (ii)

rise from a minor clone (MAF<0.3) in D to the major clone (MAF≥0.3) in R1 stage, with the

increment of MAF from D to R1 greater than 0.10. For focal CNA, only the ones acquired in

R1 is maintained.

MTinT: number of mutant reads; TOTALinT: total number of reads; MAF_D: mutant allele

frequency in diagnosis sample.

Supplementary Table 11. Significantly mutated genes in the major clone of both

diagnosis and first relapse samples.

The table shows the significantly mutated genes in the major clone (MAF ≥0.3) of diagnosis

(D) and first relapse (R1) samples identified by GRIN and/or MutSigCV. start_hg19: start

position of the gene in human genome reference version hg19; end_hg19: end position of

the gene; Nsubj: number of subjects (patients); CN.gain: copy number gain; CN.loss: copy

number loss; LOH: loss of heterozygosity; pval: p value calculated by GRIN; qval: q value

calculated by GRIN; p/qval.[1-3]lsn: p/q value calculated by GRIN based on 1-3 lesion type

models; p/qmin: the minimum value of p/qval.[1-3]lsn; MutSigCV-p/qval: p/q value calculated 15

by MutSigCV based on SNV and indels; CancerGene: Is the gene reported as a cancer gene

(according to COSMIC Cancer Gene Census V86) or leukemia relevant in recent studies.

Genes in red font are identified as driver genes by both MutSigCV and GRIN (q-value<0.1),

Genes in green font are known cancer/leukemia associated genes identified as driver gene

by GRIN (q-value<0.1). Genes in blue are newly identified driver genes by GRIN (q-

value<0.1) and targeted in >5 cases.

Supplementary Table 12. Significantly mutated genes acquired in the major clone of

first relapse versus the matched diagnosis sample.

The table shows the significantly mutated genes acquired in the major clone (MAF ≥0.3) of

the first relapse (R1) versus the matched diagnosis sample. The Abbreviations are the same

as Supplementary Table 11. Genes in red font are identified as driver genes by both

MutSigCV and GRIN (q-value<0.1), Genes in green font are known cancer/leukemia

associated genes identified as driver gene by GRIN (q-value<0.1). Genes in blue are newly

identified driver genes by GRIN (q-value<0.1) and targeted in >5 cases.

Supplementary Table 13. Signaling mutations.

This table summarizes the mutations in genes involved in signaling. Column definition is

listed below:


B. Sample: Type of sample. D: diagnosis, R1: relapse 1

C. MTinT: number of mutant reads. na: not applicable

D. TOTALinT: total number of reads. na: not applicable

E. %var: variant allele frequency in %

F. Gene: HUGO gene symbol

G. Pathway: Designated pathway the gene is involved in

H. Variant (hg19): genomic location of the variant in human genome build hg19. Na: not

applicable when either variant was based on expression profiling, or on SNP6

I. Class: classification based on amino acid change pattern. FS: frameshift, MS: missense,

NS: nonsense, proteinDel: in-frame deletion, proteinIns: in-frame insertion, Fusion gene:

fusion as detected by RNASeq, hemiLoss: 1 copy loss, homoLoss: 2 copy loss, na: not

applicable.

J. aaChange: predicted amino acid change for the variant. Na: not applicable.

16

K. CNA: copy number status of the gene. A copy number of 2 indicates normal copy number,

numbers <2 indicate loss, numbers >2 indicate gain of copy number. LOH: copy neutral

loss of heterozygosity.

Supplementary Table 14. Preservation and shifts in mutated signaling pathways by

case.

This table lists the affected signaling pathways and their evolution from diagnosis to relapse

in each patient. Column definition is listed below:


B. Diagnosis: variants present in diagnosis. Variants in red were found at MAF<30%

C. 1st relapse (R1): variants present in Relapse. Variants in red were found at MAF<30%

D: Note for R1: overview of the evolution of the pathways and mutations

Supplementary Table 15. Mutation enrichment at relapse.

This table summarizes the number of non-silent alterations (CNA+SNV/indel) in key genes

identified at diagnosis, relapse or both disease stages. The data is connected to Figure 1.

Column definition is listed below:

A. Gene: HUGO gene symbol

B. D: number of variants present at diagnosis

C. R: number of variants present in relapse

D. DR: number of variants present in both diagnosis and relapse

E. total: total number of variants present in the gene

F. R rate: (R+DR) / total

Supplementary Table 16. Recurrence per gene for clonal SNV/indels and CNAs parsed

by mutation type.

This table summarizes the number and type of somatic clonal mutations (MAF>30%) in a

gene.

Gene: HUGO gene symbol, Cancer Gene Census: ‘Yes’: tier 1 gene, ‘Candidate’: tier 2

gene, ‘No’: not present in the COSMIC Cancer Gene Census in August 2018. Leukemia

gene: PubMed IDs of papers describing the gene in association with leukemia.

Mutation overview: Truncating: presence of frameshift, nonsense or canonical splice site

mutations; Coding MS: presence of non-synonymous missense mutations or in-frame

17

deletions and insertions; Non-coding: presence of silent mutations or mutations in UTR, in

non-coding RNA genes or a mutation not directly affecting the canonical splice sites but

located within 10bp of the canonical splice sites predicted non transcribed exon or in the

splice region; CNA: presence of a focal CNA (affecting 1-3 genes). Y: yes, N: No.

SNV/indel: overview of the number of somatic clonal (MAF>30%) SNV/indels parsed by the

various mutation types in the genes indicated. The numbers are determined per patient.

CNA: overview of the number of clonal CNAs parsed by the type of CNA in the genes

indicated. The numbers are determined per patient.

Total non-silent: The total number of non-silent aberrations (CNA and SNV/indel) in a gene.

Total total: Total number of hits in a gene.

Tolerance scores ExAC: overview of the tolerance scores for each gene as determined using

the ExAC database. syn_z - corrected synonymous Z score, mis_z - corrected missense Z

score, lof_z - corrected loss-of-function Z score, pLI - the probability of being loss-of-

function intolerant (intolerant of both heterozygous and homozygous lof variants), pRec -

the probability of being intolerant of homozygous, but not heterozygous lof variants, pNull -

the probability of being tolerant of both heterozygous and homozygous lof variants,

del.sing.score: Winsorised single-gene deletion intolerance z-score, dup.sing.score:

Winsorised single-gene duplication intolerance z-score, del.score: Winsorised deletion

intolerance z-score, dup.score: Winsorised duplication intolerance z-score, cnv.score:

Winsorised cnv intolerance z-score, flag: Gene is in a known region of recurrent CNVs

mediated by tandem segmental duplications and intolerance scores are more likely to be

biased or noisy. nd: no data.

Genes highlighted in purple font are the most frequently targeted by SNV/indels and CNA

combined. Genes highlighted in orange font are the most frequently targeted by (focal) CNAs

only. Genes highlighted in red font are the most frequently targeted by truncating SNV/indels

only. Genes highlighted in blue font are the most frequently targeted by coding missense

SNV/indels only. Genes highlighted in green font are the most frequently targeted by non-

coding SNV/indels only. Genes highlighted in gray shading are the most frequently targeted

by clonal SNV/indels only and not found mutated at subclonal level in our cohort.

Supplementary Table 17. SNV/indel and CNA recurrence per gene.

This table summarizes the number of hits in a gene per disease state. Genes in red font

encompass the genes uniquely and clonally affected in relapse 1. Genes in orange font

represent the genes uniquely clonally affected in relapse 2.

Gene: HUGO gene symbol, Clonal SNV/indels: SNV/indel with a MAF>30%, Cancer Gene

Census: ‘Yes’: tier 1 gene, ‘Candidate’: tier 2 gene, ‘No’: not present in the COSMIC Cancer 18

Gene Census in August 2018, Leukemia gene: PubMed IDs of papers describing the gene in

association with leukemia, # events: number of patients with a mutation, D: diagnosis, G:

complete remission, R1: relapse 1, R2: relapse 2, S: second tumor.

Supplementary Table 18. Main aspects of each patient in the cohort.

This table lists the main mutations and clinical aspects of each patient in the cohort. Column

definition is listed below:



two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-R1-

S: second tumor of different etiology after ALL relapse.

C. Lineage: hematological lineage of the diagnosis

D. R1 RemTime: time in years between diagnosis and relapse 1

E. R2 RemTime: time in years between relapse 1 and relapse 2. na: not applicable

F. Subtype D: Leukemia subtype at diagnosis. BCL2/MYC: BCL2 or MYC rearranged, B-

other: absence of the known B-ALL subtypes, DUX4: DUX4 or ERG rearranged, ETP:

early T-cell precursor, ETV6-RUNX1: ETV6-RUNX1 translocated t(12;21), ETV6-RUNX1-

like: ETV6-RUNX1 like as determined by expression profiling, HLF: TCF3-HLF

translocated, Hyperdiploid: Hyperdiploid >50 chromosomes, Hypodiploid: Hypodiploid <44

chromosomes, iAMP21: intrachromosomal amplification of chromosome 21, KMT2A:

KMT2A (MLL) rearranged, MEF2D: MEF2D rearranged, PAX5 P80R: containing a

homozygous PAX5 P80R mutation, PAX5alt: PAX5 amplificated, Ph: BCR-ABL

translocated t(9;22), Ph-like: Ph-like as determined by expression profiling or RNASeq, T-

ALL: T-lineage ALL, ZNF384: ZNF384 rearranged, nd: no data

G. Subtype 2 D: Leukemia subtype 2 at diagnosis. CRLF2 high: high CRLF2 expression,

Hyperdiploid: Hyperdiploid >50 chromosomes, iAMP21: intrachromosomal amplification of

chromosome 21, Infant: Infant ALL <1 years of age at diagnosis, KMT2A-like: KMT2A

(MLL) like as determined by expression profiling, Low hypodiploid: containing 32–39

chromosomes, Near haploid: containing 24–31 chromosomes, Ph-like: Ph-like as

determined by expression profiling or RNASeq, TAL1: TAL1 subtype, TAL1/TLX3:

TAL1/TLX3 subtype, TLX3: TLX3 subtype, na: not applicable

H. Evolution: Major: major clone in tumor 2 evolved from major clone in tumor 1, Minor: major

clone in tumor 2 evolved from one minor subclone in tumor 1, Precursor: tumors share the

founder fusion and a limited number of SNVs, Polyclonal: multiple (sub)clones are

preserved, Second Primary Tumor: tumors are clonally unrelated

I. Evolution note: mMAF: median MAF19

J. D unique: main genes affected at diagnosis

K. Common D-R1: main genes affected in both diagnosis and relapse

L. R1/S unique: main genes affected in relapse1 or the second tumor sample

M. Common R1-R2: main genes affected in both the first and second relapse

N. R2/S unique: main genes affected in relapse 2 or the second tumor sample

O. Lin Switch: Samples which show a lineage switch are indicated with Y

P. Result IgTCR: Conclusions from Supplementary Table 19 for those samples where we

determined the IgH or TCRβ rearrangements. na: not applicable

Q. HyperMT group: indicates in which subgroup the samples that were hyper mutated (>85

SNV/indels) were classified. Group 1-4 refer to Figure 6. Group Diagnosis: patients with a

hyper mutated diagnosis sample, Preserved: Patients with a hypermutated relapse

sample, with <85 acquired mutations. N: not hypermutated

R. HyperCNA: indicates the patients with copy number instable tumors (>50 CNAs per

sample)

S. Tumor after BMT: indicates whether the tumor occurred after bone marrow

transplantation. no: tumor occurred before bone marrow transplantation, na: not applicable

(no bone marrow transplantation was performed.

Supplementary Table 19. Antigen receptor sequencing results.

This table summarizes the results of the IgH and TCRβ rearrangement sequencing in

selected samples. Column definition is listed below:


B. Number of shared CNA/SNV/indels after Capture Validation: Capture validation detected

SNV/indels with higher sensitivity. SNV/indel: single nucleotide variant, insertion or

deletion, CNA: copy number aberration, D1: diagnosis 1, D2: diagnosis 2, R1: relapse 1,

R2: relapse 2

C. Rearrangement: genes involved in IgH and TCRβ rearrangements. Some rearrangements

encompass the same genes (column C), but they are different on sequence level (column

D).

D. Sequence: identified sequence of the IgH and TCRβ rearrangements

E. MAF in D (%): allele frequency in diagnosis

F. MAF in R1/S (%): allele frequency in relapse 1 or second tumor sample

G. MAF in R2/S (%): allele frequency in relapse 2 or second tumor sample

H. Conclusion: conclusion from this experiment regarding clonal relation of the tumors for

each patient

20

Supplementary Table 20. Genetic alterations detected from WGS.

The fusions and deletions were called manually or by CREST based on the soft-clip reads in

IGV browser to verify if the exact genomic lesions are shared between diagnosis and

relapse. SNVs were called by GATK UnifiedGenotyper and MuTect2 and all the shared ones

were manually reviewed. na: not applicable

Supplementary Table 21. Gene rearrangements detected from RNA-seq.

The table displays the gene rearrangements detected from RNA-seq by FusionCatcher and

CICERO. The columns are named as FusionCatcher reported.

D: diagnosis; R1: relapse 1; R2: relapse 2; G: complete remission; S: second tumor;

NumBinders: number of putative epitopes with binding affinity <= 500nM; UPAS: Unweighted

Putative Antigenicity Score.

Supplementary Table 22. Constitutional copy number aberrations in the patients with a

suspected second primary malignancy.

The table shows the copy number segments called in the germline sample of the cases with

clonally unrelated second primary leukemias. Column definition is listed below:


B. Sample: Sample. G: complete remission

C. CN: copy number calling. gain: gain of 1 copy, del: deletion of 1 copy

D. Segment_hg19: affected segment in genome build hg19

E. Segment_hg18: affected segment in genome build hg18

F. Hyperlink_hg18: hyperlink to the segment location in the UCSC browser

G. chr_hg18: chromosome in genome build hg18

H. Cytoband_hg18: sublocation on the chromosome in genome build hg18

I. loc.start_hg18: chromosomal start position of the lesion in genome build hg18

J. loc.end_hg18: chromosomal end position of the lesion in genome build hg18

K. num.mark: number of probes in the segment

L. seg.mean: log2 ([normalized tumor signal]/[normalized normal signal])

M. seg.observedCN: absolute copy number

N. seg.size_(bp): size of the segment in base pairs

O. Genes.segment: number of genes included in the segment

P. first_10_ genes: lists the names of the first 10 genes in the segment

Q. miRNA.segment: number of miRNAs included in the segment

21

R. first_10_miRNAs: lists the names of the first 10 miRNAs in the segment

S. Putative candidate genes: HUGO gene symbol of the candidate genes in the segment

Supplementary Table 23. Droplet digital PCR results.

This table shows the results from our ddPCR experiments.


B. PatientType: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:

two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-R1-

S: second tumor of different etiology after ALL relapse.

C. Sample: Type of sample. D: diagnosis, G: complete remission, R1: relapse 1, R2: relapse

2, S: second tumor

D. Source: Source of the sample. BM: bone marrow, PB: peripheral blood

E. Days since D: the number of days between diagnosis date and the sample date.

F. Days since R1: the number of days between relapse date and sample date. Negative

numbers indicate a sample taken before relapse development.

G. Blast%: the percentage of blasts in the sample. Nd: no data

H. Assay: ddPCR assay performed.

I. Total Droplets: total number of droplets measured

J. WT Droplets: number of droplets containing wild type PCR product detected

K. MUT Droplets: number of droplets containing mutant PCR product detected

L. MAF ddPCR: mutant allele frequency as determined by ddPCR. The MAFs in gray font are

below the detection limit.

M. MAF WES: mutant allele frequency as detected by whole exome sequencing

N. MAF CAPVAL: mutant allele frequency as detected by capture validation

Supplementary Table 24. Digital droplet PCR dilution curve results.

This table shows the results from the dilution curves of three assays developed for ddPCR.

WT: wild type, MT: mutant, NTC: non-template control, REH: REH cell line.

Supplementary Table 25. Neoepitope analysis of somatic missense SNVs.

This table shows the results of neoepitope analysis result from somatic missense SNVs.

AllBinders: total number of putative epitopes with binding affinity <= 500nM; PercMutExps:

percentage of overlapping reads that expressed the mutant allele; UPAS: Unweighted

Putative Antigenicity Score; WPAS: Weighted Putative Antigenicity Score.22

Supplementary Table 26. Whole exome sequencing metrics.

The table displays the duplication rate, sequencing depth and coverage for the samples that

were subjected to whole exome sequencing in our cohort.

D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor

Supplementary Table 27. Whole genome sequencing metrics.

The table displays the duplication rate, sequencing depth and coverage for the samples that

were subjected to whole genome sequencing in our cohort.


Supplementary Table 28. Capture Validation metrics.

The table displays the number of targets and coverage metrics for the samples that were

subjected to capture validation in our cohort.


Supplementary Table 29. RNA sequencing metrics.

The table displays the type of RNAseq, duplication rate, sequencing depth and coverage for

the samples that were subjected to RNA sequencing in our cohort.


Supplementary Table 30. Droplet digital PCR primers.

This table lists the primers and probes used in our ddPCR assays. The red nucleotide in the

probes indicate the location of the mutant variant.

Supplementary Table 31. Shortlist of pediatric leukemia associated genes and

syndromes.

This table shows all genes we considered as known to be associated with leukemia

susceptibility and pediatric cancer syndromes.

23

AR: autosomal recessive inheritance, AD: autosomal dominant inheritance, AD Missense:

autosomal dominant inheritance of missense variants, X-linked: inherited in an X-linked

fashion, mosaic: only found in mosaic state

Supplementary Table 32. Constitutional variants identified in the cohort.

This table lists the variants found in known cancer predisposing genes (as defined in

Supplementary Table 20) in our cohort. Column definition is listed below:


B. Gene: HUGO gene symbol

C. RefSeq: RefSeq accession number

D. AA Change: predicted amino acid change for the variant

E. mRNA change: variant change at cDNA level with the A of the translation initiation codon

designated as +1

F. MT Type: type of mutation. FS: frameshift, MS: missense, NS: nonsense, proteinDel: in-

frame deletion, CSS: canonical splice site

G. Protein Domain: protein domain in which the variant is located as determined with

www.PecCan.stjude.org. No: no domain annotated.

H. phyloP: PhyloP score for evolutionary conservation on nucleotide level

I. CADD PHRED: Combined Annotation Dependent Depletion ranked score

J. Polyphen: amino acid substitution impact according to Polymorphism Phenotyping v2 as

available through the Alamut package

K. SIFT: amino acid substitution impact according to the degree of conservation of amino

acid residues as available through the Alamut package

L. Mutation taster: estimation of the impact of the variant on the gene product / protein as

available through the Alamut package

M. Alamut splicing: effect of the variant on splicing as available through the Alamut package

N. 2nd hit in tumor: indicates the presence of an additional aberration in the gene in the

tumors.

O. EXAC AF: allele frequency of the variant in the ExAc database

P. ClinVar: presence of the variant in the ClinVar database. VUS: variant of unknown

significance, no: not present.

Q. HGMD: presence of the variant in the HGMD database. no: not present.

R. Syndrome: syndrome associated with the mutated gene

S. Inheritance: mode of inheritance for the associated syndrome. AR: autosomal recessive

inheritance, AD: autosomal dominant inheritance, X-linked: inherited in an X-linked fashion

T. MAF (%): variant allele frequency in %24

U. MT rds: number of mutant reads

V. TOT rds: total number of reads

W. Quality: GATK quality score

25

SUPPLEMENTARY FIGURES

Supplementary Figure 1. Capture Validation and WES/WGS calling are highly

concordant.

a, For most of the variants (90%), capture validation (CapVal) and WES/WGS agree on the

variant calling. D, diagnosis; R1, first relapse; R2, second relapse; S, second tumor. b,

Sequencing metrics between the two techniques for the agreed variants calls (dark blue in

a). The total reads number and the mutant reads per variant are higher for the CapVal, but

the mutant allele frequency (MAF) for both techniques is comparable for the variants. The

box and whiskers plots depict the median (black line), 25th and 75th percentiles (box

borders) and the highest/lowest value that is within 1.5*IQR (inter-quartile range; whiskers).

Data plotted as individual points beyond the end of the whiskers are outliers. c, For a total of

1,466 variants CapVal and WES/WGS techniques differed on the calling category based on

MAF. Most somatic mutations were called with MAF around 30% (indicated in red dashed

line), which was used as a cut off for subclonal vs heterozygous (clonal) state; a few hover

around the hetero-homozygous cut off of 75% (indicated in blue dashed line). d, Somatic

mutations in diagnosis samples missed in WES/WGS and rescued by CapVal. The analysis

was done in cases with both diagnosis and first relapse samples sequenced by CapVal. With 26

a minimum number (≥3) of mutant (mut.) reads support for a mutation, 3.4% of the somatic

mutations were recovered by CapVal in the diagnosis samples.

27

Supplementary Figure 2. SNV/indel and CNA mutation burden.

a, Histogram of the number of SNV/indels (top) and the relative contribution of the mutation

types (bottom) in each sample sorted from left to right by the total number of SNV/indels. The

coding variants are indicated by shades of red, whereas the non-coding variants are

indicated in shades of blue. b, Histogram of the number of CNAs (top) and the relative

28

contribution of the mutation types (bottom) in each sample sorted from left to right by the total

number of CNAs. c, Missense, silent and 3-prime UTR mutations are most frequent

SNV/indels found in this study. d, Deletions are the most common CNA in this study. Amp:

amplification – gain of >2 copies, hgain: 2 copy gain, gain: 1 copy gain, aUPD: acquired

uniparental disomy – copy number neutral loss of heterozygosity, del: 1 copy deletion, hdel:

2 copy deletion. The box and whiskers plots depict the median (black line), 25th and 75th

percentiles (box borders) and the highest/lowest value that is within 1.5*IQR (inter-quartile

range; whiskers). Data plotted as individual points beyond the end of the whiskers are

outliers.

29

30

Supplementary Figure 3. Mutation burden increases with disease progression.

a, The number of all SNV/indels (left), of clonal (MAF>30%; middle), and clonal non-silent

SNV/indels (right) increase from diagnosis to the subsequent relapses. This effect remains

when the outliers are removed (data not shown). ** p<0.0001, * p<0.001. b, The SNV/indel

mutation burden is not influenced by the type of mutation, tumor lineage, gender, age at

diagnosis, remission time and vital status of the patient. c, The number of CNAs increase in

second relapse (* p<0.05). This effect is not influenced by tumor lineage, gender, age at

diagnosis, remission time and vital status of the patient. The box and whiskers plots depict

the median (black line), 25th and 75th percentiles (box borders) and the highest/lowest value

that is within 1.5*IQR (inter-quartile range; whiskers). Data plotted as individual points

beyond the end of the whiskers are outliers. D, diagnosis; R1, first relapse; R2, second

relapse; S, second tumor.

31

Supplementary Figure 4. Cut-off determination for hypermutated samples.

a, Histogram of the number of SNV/indels in all samples. b, Histogram of the number of

CNAs in all samples. Red arrows indicate the cut-off number of SNV/indels (N=85) and CNA

(N=50) used for defining hypermutators. c, Hypermutator distribution in different disease 32

stages. The ratio of hypermutated samples was relatively high in the second relapses. d, The

number of SNV/indels detected in each sample categorized by subtype of ALL at diagnosis

and colored by disease stage, showing variable number of mutations across subtypes. The

horizontal dotted line indicates the cut-off of >85 SNV/indels. Box and Whisker plots in the

background indicate the median (thick line), 25th and 75th percentiles (box borders) and the

highest/lowest value that is within 1.5*IQR (inter-quartile range; whiskers) number of

SNV/indels. The ALL cases are grouped into well-defined disease subtypes, which include

DUX4-rearranged (DUX4), ETV6-RUNX1, hyperdiploid, hypodiploid, KMT2A (MLL) -

rearranged, BCR-ABL1 (Ph), Ph-like, other B-ALL subtypes, early T-cell precursor ALL (ETP)

and T-lineage ALL non-ETP (T-ALL). e, Histogram of the number of SNV/indels (top) and the

relative contribution of the mutation types (bottom) in each hypermutator (>85 SNV/indels)

sorted from left to right by the total number of SNV/indels. The coding variants are indicated

by shades of red, whereas the non-coding variants are indicated in shades of blue. f, Histogram of the number of CNAs (top) and the relative contribution of the mutation types

(bottom) in each CNA hypermutator (>50 CNAs) sorted from left to right by the total number

of CNAs.

33

34

Supplementary Figure 5. Signaling pathway and Epigenomic mutations in ALL at

diagnosis and relapse.

a, Genes involved in signaling pathways with somatic mutations identified in diagnosis (D)

and/or first available relapse (R) sample per case. b, Epigenetic regulators with somatic

mutations identified in diagnosis (D) and/or first available relapse (R) sample per case are

grouped into 6 epigenetic categories(19). Same mutation annotation schema is applied as

Figure 1a.

35

Supplementary Figure 6. Transcription factor/regulator mutations in ALL at diagnosis

and relapse.

Genes with somatic mutations identified in diagnosis (D) and/or first available relapse (R)

sample per case are grouped into transcription factors and transcriptional regulators. Same

mutation annotation schema is applied as Figure 1a.

36

Supplementary Figure 7. Frequently mutated genes in relapsed ALL.

a, ProteinPaint visualizations of the most frequently clonally mutated genes by non-silent

SNV/indels. b, ProteinPaint visualizations of three genes that were exclusively affected by

clonal non-silent SNV/indels in the tumors of more than 5 patients in our cohort.

37

Supplementary Figure 8. Mutational landscape of xenografts resolves clonal structure.

a, Elevated somatic mutation detection sensitivity in xenografted leukemia samples. To

indicate the sensitivity of mutation calling, the highest MAF in the xenografts generated from

the diagnosis sample (XDHM) was picked for each somatic mutation. Mutations showing

increased MAF from D to XDHM are linked by lines. The mutations showing significant

increment (MAF<0.05 in D and ≥0.05 in XDHM) are highlighted in blue and the mutations

missed in D (<3 mutant reads) but rescued (≥3 mutant reads) in XDHM sample are 38

highlighted in red. D: diagnosis patient sample, R1: first relapse patient sample. b, Clonal

evolution model of SJETV010 inferred from MAF of somatic mutations detected from primary

patient samples and xenografted samples. In this case, only diagnosis and second relapse

(R2) samples were available for transplantation. Non-silent mutations in cancer genes

(COSMIC Cancer Gene Census) in each clone are listed. The colors indicate whether

mutations detected in bulk samples could be denoted as a parental clone (shared by D, R1

and R2), a D unique clone, an R2 unique clone and two relapse shared clones. c,

Distribution of somatic mutations’ MAF in D, R1, R2 and xenografted leukemia samples from

SJETV010. The color scheme for different clones is the same as in b. Without xenografts’

information, clones 3 and 4 and the clones 5-13 would be indistinguishable. D.*.#, samples

originating from D; R.*.#, originating from R2, *.BM.#, samples collected from bone marrow,

*.CNS.#, collected from central nervous system; *.SP.#, collected from spleen.

39

40

Supplementary Figure 9. Validation of digital droplet PCR (ddPCR) assay.

a, Test sensitivity for the NT5C2 p.R39Q, CREBBP p.R1446C and NRAS p.G12R assays

was determined using serial dilutions of patient derived leukemic cells containing the

mutation with REH cells that were wild type for all mutations tested. The R-squared

coefficient was 0.9994, 0.9987 and 0.9993 respectively. b, A total of 9 samples were

measured twice with high correlation (r = 0.9997). The four different assays are indicated by

colors. c, Representative images of the ddPCR output. Depicted are the results of the

CREBBP p.R1446C assay for 2 tumor samples and 2 remission samples from patient

SJBALL013, plus the REH wild type control and the no template control (NTC). Wild type

droplets are labeled with the VIC fluorescent reporter dye and droplets containing mutant

PCR products are labeled with FAM fluorescent dye. The gates used for counting the wild

type (WT) and mutant (MUT) droplets are indicated, as well as the number of droplets within

these gates. T indicates the time point of the sample in number of days after diagnosis. d,

The MAF as measured by WES (left) or capture validation (CapVal, right) correlates well with

the values measured by ddPCR. A total of 11 samples were found negative (MAF=0%) on

the listed mutations in WES and/or CapVal, but mutant droplets were detected with ddPCR.

One sample was positive in WES (SJTALL001 KRAS p.G12C MAF=3.2%), but measured

0% in both CapVal and ddPCR (Supplementary Table 23).

41

42

Supplementary Figure 10. Mutational signatures.

a, Comparison between the de novo extracted signatures and known COSMIC signatures.

Signature A resembles signature SBS1, characterized by spontaneous deamination of

methylated cytosines in CpG context (red bar). Signature B closely resembles signatures

SBS2 and SBS13, both AID/APOBEC associated signatures (orange bar). Signature C

clusters close to signatures SBS6, SBS15, and SBS20, all signatures associated with

mismatch repair (MMR; blue bar). Signature D shows highest resemblance to a group of

signatures with less distinct characteristics. b, Cosine similarity heatmap of the complete set

of acquired mutations per sample relative to involved COSMIC and de novo-extracted

signatures (based on clusters identified in panel a) shows clustering of samples in the four

groups presented in Figure 6. Clusters are indicated by colored bars as in panel a. Samples

marked by a grey bar all have a major contribution of the less distinctive mutational profile D.

c, De novo extracted mutational signatures using WGS data. Signature profiles are highly

concordant with signatures extracted from WES data. d, De novo extracted somatic

mutational signature A using WGS data (top panel) and signature SBS1 (bottom panel)

according to the 96-substitution classification. Signature A is hallmarked by a higher relative

number of ApCpG, CpCpG and GpCpG compared to TpCpG. e, Total mutation profiles

(containing all substitutions) of SJETV10-R2 and SJHYPER022-R1 based on WGS data. f, Absolute contribution of each of the four signatures (WGS data) to each of 26 diagnosis and

relapse samples. Composition of the four signatures in individual samples was highly

concordant with the composition obtained from WES data. g, Relative contribution of C>T

transitions at CpGs in transcribed and untranscribed strand of four samples with a prominent

signature A contribution. h, Absolute contribution of C>T transitions at CpGs in the leading

and lagging strands (top panels) and CpG>TpG mutation density in early, intermediate and

late replicated regions of two hypermutated relapses and three healthy colon organoids.

43

Supplementary Figure 11. Neoepitope analysis of somatic missense SNVs.

a, Number of predicted HLA-binding mutant peptides (<=500nM) per tumor correlates

significantly with missense mutation burden (p < 0.001). Correlation efficiency (r2) of the fitted

functions are shown in hypermutators (HPM) and non-HPM samples, respectively. Different

disease stages, including diagnosis (D), first relapse (R1), and second relapse (R2), are

shown in different colors. The number of predicted binders also varies significantly as an

effect of HPM status (p<0.001), with significant interactions between HPM status and

mutation burden (p<0.001) and between disease progression and mutation burden

(p=0.0452). b, the number of predicted HLA-binding mutant peptides (<=500nM) per tumor

varies as a function of disease stage (D–R1–R2; p<0.001) and HPM status (p<0.001), with

more putative binders evident in hypermutated tumors and as disease progresses.

Differences in the number of binders per tumor owed to effects of disease (B- and T-ALL),

though visually apparent, did not reach the threshold for statistical significance (p=0.0511).

No significant interaction between disease progression and HPM status was identified. c,

Distribution of WPAS (Weighted Putative Antigenicity Score) scores across disease

progression using a subset of SNVs with available expression data. WPAS varies as an

effect of disease progression (p=0.015), with median WPAS highest at the R2 stage

(suggesting more antigenic variants in tumors from prolonged disease progression). d, 44

Distribution of WPAS scores categorized by COSMIC cancer gene status in different disease

stages. WPAS again varies as an effect of disease progression (as in c) but also according

to Cosmic cancer gene status categories (p=0.005), with Cosmic genes showing overall

higher putative antigenicity. All P values were calculated by the ANOVA test.

45

FIGURES

Figure 1. Somatic mutation spectrum in ALL at diagnosis and relapse.Figure 2. Patterns of relapse in ALL.Figure 3. CNAs and MEF2D-BLC9 rearrangements in patient SJBALL006.Figure 4. Integration of mutational landscape and xenografts resolves clonal structure in ALL.Figure 5. Digital droplet PCR reveals mutations at low levels in intermediate complete remission samples.Figure 6. Mutational signature analysis of hypermutated relapses identifies multiple distinct mutational processes in hypermutation.

46

REFERENCES

1. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Res 2015;43:D805-11

2. Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic acids research 2017;45:D777-D83

3. Mullighan CG, Miller CB, Radtke I, Phillips LA, Dalton J, Ma J, et al. BCR-ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 2008;453:110-4

4. Mullighan CG, Su X, Zhang J, Radtke I, Phillips LA, Miller CB, et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. The New England journal of medicine 2009;360:470-80

5. Zhang J, McCastlain K, Yoshihara H, Xu B, Chang Y, Churchman ML, et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nature genetics 2016;48:1481-9

6. Alexander TB, Gu Z, Iacobucci I, Dickerson K, Choi JK, Xu B, et al. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 2018

7. Waanders E, Hebeda KM, Kamping EJ, Groenen PJ, Simons A, Hoischen A, et al. Independent development of lymphoid and histiocytic malignancies from a shared early precursor. Leukemia 2016;30:955-8

8. Weng AP, Ferrando AA, Lee W, Morris JPt, Silverman LB, Sanchez-Irizarry C, et al. Activating mutations of NOTCH1 in human T cell acute lymphoblastic leukemia. Science 2004;306:269-71

9. Ge Z, Li M, Zhao G, Xiao L, Gu Y, Zhou X, et al. Novel dynamin 2 mutations in adult T-cell acute lymphoblastic leukemia. Oncology letters 2016;12:2746-51

10. Chao MM, Todd MA, Kontny U, Neas K, Sullivan MJ, Hunter AG, et al. T-cell acute lymphoblastic leukemia in association with Borjeson-Forssman-Lehmann syndrome due to a mutation in PHF6. Pediatric blood & cancer 2010;55:722-4

11. Van Vlierberghe P, Palomero T, Khiabanian H, Van der Meulen J, Castillo M, Van Roy N, et al. PHF6 mutations in T-cell acute lymphoblastic leukemia. Nature genetics 2010;42:338-42

12. Narumi S, Amano N, Ishii T, Katsumata N, Muroya K, Adachi M, et al. SAMD9 mutations cause a novel multisystem disorder, MIRAGE syndrome, and are associated with loss of chromosome 7. Nat Genet 2016;48:792-7

13. Schwartz JR, Wang S, Ma J, Lamprecht T, Walsh M, Song G, et al. Germline SAMD9 mutation in siblings with monosomy 7 and myelodysplastic syndrome. Leukemia 2017;31:1827-30

14. Chen DH, Below JE, Shimamura A, Keel SB, Matsushita M, Wolff J, et al. Ataxia-Pancytopenia Syndrome Is Caused by Missense Mutations in SAMD9L. Am J Hum Genet 2016;98:1146-58

15. Waanders E, Scheijen B, Jongmans MC, Venselaar H, van Reijmersdal SV, van Dijk AH, et al. Germline activating TYK2 mutations in pediatric patients with two primary acute lymphoblastic leukemia occurrences. Leukemia 2017;31:821-8

16. Shah S, Schrader KA, Waanders E, Timms AE, Vijai J, Miething C, et al. A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia. Nature genetics 2013;45:1226-31

17. Auer F, Ruschendorf F, Gombert M, Husemann P, Ginzel S, Izraeli S, et al. Inherited susceptibility to pre B-ALL caused by germline transmission of PAX5 c.547G>A. Leukemia 2014;28:1136-8

18. Russell LJ, Capasso M, Vater I, Akasaka T, Bernard OA, Calasanz MJ, et al. Deregulated expression of cytokine receptor gene, CRLF2, is involved in lymphoid

47

Esme Waanders, 11/01/19,

Is this needed here?

transformation in B-cell precursor acute lymphoblastic leukemia. Blood 2009;114:2688-98

19. Huether R, Dong L, Chen X, Wu G, Parker M, Wei L, et al. The landscape of somatic mutations in epigenetic regulators across 1,000 paediatric cancer genomes. Nature communications 2014;5:3630

48

Documents

bloodcancerdiscov.aacrjournals.org€¦ · Web viewThree variants were found in genes associated with Noonan syndrome (OMIM #616564, #610733, and #616559) or Noonan-like syndrome