Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Supplementary Note
Rise in mutation burden with disease progression
All somatic sequence variants detected in the exomes of diagnosis and relapse samples
(N=16,615) were subjected to confirmatory capture-based sequencing to provide a more
accurate estimate of mutant allele frequency (MAF) and facilitate detection of subclonal
variants. Overall, there was excellent concordance in mutation calling between discovery and
verification sequencing (89.7% of mutations), with discordance in defining subclonal (MAF
<30%), heterozygous (MAF, 30%~75%) and homozygous (MAF >75%) status in 2% of the
mutations (Supplementary Figure 1a-c). Importantly, 3.4% of the mutations initially
considered relapse-specific from analysis of WES/WGS data were shown to be present at
diagnosis, including NOTCH1, NRAS, KRAS, TP53, and CREBBP (Supplementary Figure1d). In line, our deep sequencing efforts revealed tumor specific mutations in the complete
remission samples (Supplementary Table 3). A total of 936 tumor specific mutations (6.7%
of the mutations) were detected in the complete remission samples of 72 patients (median of
4 mutations per sample, range 1-195 mutations). The median MAF of tumor specific
mutations in the complete remission samples was 1.43% (range 0.1% - 29.8%), which
increased to 43.2% (range 1.2% - 99.5%) in the corresponding tumor samples. The median
increase in MAF from complete remission to tumor sample was 95.4% (range 19.5% -
99.9%). This indicates various degrees of discernable tumor contamination in the complete
remission samples. To test whether the residual of somatic mutations in complete remission
is an indicator of relapse, we identified 12 diagnosis-relapse cases (drawn from this study)
and 20 ALL cases that did not relapse with remission samples obtained between 27-49 days
(median 44 days) after diagnosis. High depth exome sequencing was performed for all
germline samples (depth 377-515X, median 423X) and the somatic mutations identified in
tumor samples were examined in the remission samples (Supplementary Table 2). In
contrast to prior data in AML, there was no association between the presence of residual
somatic mutations in the germline samples and the risk of relapse (5 out of 12 relapses ALL
cases have residual somatic mutations in remission samples compared to 12 out of 20 non-
relapse ALL cases; Fisher’s exact test, P=0.467).
The most frequent mutation types identified in the cohort were missense mutations
(coding, silent or located in the UTRs) and single copy number deletions (SupplementaryFigure 2). The burden of mutations increased with disease progress from a median of 23
(range 1-164) single nucleotide variants (SNVs) and short insertions/deletions (indels) at
diagnosis to 41 (range 4-1,139) at relapse and 117 (range 26-1,699) at second relapse
(Supplementary Figure 3). This increase was observed for coding and non-coding variants,
and for clonal and subclonal variants, suggesting ongoing mutational processes. No
1
differences in mutation burden were observed according to disease lineage, sex, outcome
(cure or death due to disease), or by age at diagnosis (with the exception of cases of
infant/KMT2A-rearranged leukemia that have a low mutation burden; Supplementary Figure3 and Supplementary Tables Supplementary Table 3-Supplementary Table 5).
A modest increase in the number of structural genetic alterations was also observed,
with a median of 9, 10 and 23 DNA copy number alterations (CNAs) at diagnosis, first and
second relapse respectively (Supplementary Figure 3, and Supplementary Tables Supplementary Table 6-Supplementary Table 7). In exceptional cases, mutations
compromising DNA repair were associated with genomic instability. For example, case
SJBALL022431 had somatic bi-allelic alterations of ATM from deletion of 11q and a
nonsense mutation, and over 100 small losses and gains involving chromosomes 11, 15, and
21 reminiscent of a chromothripsis event (D: N=109 affecting 368.5 Mb and R: N=113
affecting 467.8 Mb; Supplementary Table 7). This is in contrast to other cases with relatively
high CNA burden but without DNA repair defects (e.g., SJETV043 and SJTALL023 with over
50 CNAs per sample), which contained predominantly small deletions but no amplifications
(Supplementary Figure 2b).
Early relapsesNext to CREBBP mutations, we found that mutations in ERG (N=3) and ARID2 (N=4)
associate with relatively late relapses (mean 7.0 vs 2.9 yrs, P=0.011; and mean 6.6 vs 2.9
yrs, P=0.013, respectively). Mutations in ERG and ARID2 occurred exclusively in leukemias
of the DUX4 subtype. Several genes were correlated to early relapse, but the number of
mutations was low (Supplementary Table 9). We identified 8 cases who relapsed
disproportionately early (>1.5 inter-quartile range) within their ALL subtype. Aberrations that
were present exclusively in this group were i(9)(q10) with homozygous PAX5 P80R mutation
(SJBALL192) and NR3C1 frameshift mutations (SJETV010 and SJBALL199). Other relapse
associated aberrations in this group were (combinations of) hypermutation (SJBALL022428
and SJETV010), SNV/indels in NT5C2 (SJBALL192 and SJHYPER098), SNV/indels in TP53
(SJBALL022428 and SJETV010) and SNV/indels and CNAs in IKZF1 (SJBALL0212924
(p.G158S and Del7-8), SJHYPER098 (del1-4), SJPHALL020 (del1-8), and SJTALL093
(p.E76*)), CNAs in TBL1XR1 (SJETV010), CNAs in ETV6 (SJBALL199 and SJETV010) and
homozygous loss of CDKN2A/B (SJBALL192, SJETV010, SJHYPER098, and SJTALL093).
These data indicate that time to relapse is influenced by the combined action of unique
aberrations within ALL subtypes.
2
Frequently mutated genes and pathways
A total of 4,509 genes harbored non-silent SNV/indels mutations, of which 1,063 were
mutated in at least two cases (Supplementary Table 3). Two-hundred and twenty-one were
known targets of mutation in cancer according to the COSMIC Cancer Gene Census databse
(1) or leukemia studies (Supplementary Figure 7a and Supplementary Table 16). Among
the 293 genes recurrently (n≥2) and exclusively targeted in the major clones at relapse (i.e.
not targeted at diagnosis) by acquired SNV/indels or CNAs (Supplementary Table 16), 82
were targeted by non-silent variants, of which 6 were CGC genes (reported in the COSMIC
Cancer Gene Census or in recent leukemia studies; NCOR2, NT5C2, FOXA1, MED12,
PABC1, and PPMID)(2). Twelve genes harbored recurrent clonal and acquired non-silent
mutations uniquely at second relapse, including AASDH, AGBL3, ARGHAP29, ATP11C,
CSF3R, DCK, LRP6, SCRIB, SDK2, TENM3, TNIP1, and ZNF436 (Supplementary Table17), suggesting a possible role in treatment resistance or evasion.
In general, B and T-lineage ALL show distinct genetic alteration profiles, e.g. genes
involved in B cell development are enriched in B-ALL (Fisher’s exact test, P=0.002), while
mutations in transcription factor NOTCH1 involved in maturation of both CD4+ and CD8+
cells in the thymus were exclusively observed in T-ALL (Supplementary Figure 6). Fifteen
genes were associated with disease lineage, e.g. VPREB1 and CREBBP with B-ALL, and
NOTCH1, WT1, PTEN, STIL, and NT5C2 with T-ALL (Fisher’ exact p<0.05). The most
common alteration perturbing B cell development involved the lymphoid transcription factor
gene IKZF1, which is commonly mutated in BCR-ABL1(3), Ph-like(4) and DUX4-rearranged
ALL (5). Consistent with the known association of IKZF1 alterations with high-risk B-ALL (3),
(4) IKZF1 alterations were strongly enriched at relapse, with preservation (N=21) or
acquisition (N=5) in 26 cases (Supplementary Table 15). Subtype-associated gene
mutations were also observed such as CREBBP in hyperdiploid (9 of 14) and low hypodiploid
(2 of 3) ALL; ARID2 mutations in DUX4-rearranged (4 of 7) and ARID5B in hyperdiploid (4 of
14) ALL.
Overall 76% B-ALL and 80% T-ALL harbored at least one signaling pathway mutation
(median 3, range 1-8). The most frequently were the Ras (B-ALL 47.8%, T-ALL 36%), JAK-
STAT (B-ALL 19.4%, T-ALL 28%) and PI3K-AKT (B-ALL 13.4%, T-ALL 28%; Figure 1 and Supplementary Figure 5a). Ras signaling pathway mutations were observed in 41 cases,
and 26 (63.4%) of them were preserved from diagnosis to relapse, with 24.4% (N=10)
acquired and 12.2% (N=5) lost in disease progression. The data also provide new insight by
showing common multiclonality of Ras pathway mutations at diagnosis and frequent
convergence to a dominant mutation at relapse. Of 61 cases with signaling pathway
mutations, 31 harbored at least one Ras pathway mutation at diagnosis, with 11 cases
3
having multiple, commonly subclonal Ras mutations (Supplementary Table 14). Seven
cases showed convergence to a single or two clonal Ras pathway mutations. For the JAK-
STAT pathway, similar enrichment in preserved mutations was observed (N=10, 50.0%
preserved; N=5, 25.0% acquired; and N=5, 25.0% lost). However, most of the PI3K-AKT
pathway mutations in B-ALL were acquired in relapse (8 out of 10), whereas in T-ALL, most
of the mutations identified in diagnosis were lost at relapse (11 out of 13; Supplementary Tables Supplementary Table 13-Supplementary Table 14).
Genes encoding regulators of cell cycle and tumor suppression were commonly
mutated in both B-ALL (N=42, 62.7%) and T-ALL (N=21, 84%), with gene-specific patterns of
enrichment. Ten TP53 alterations (8 cases) were preserved from diagnosis or acquired as
new alterations at relapse. In contrast, CDKN2A mutations were lost (N=6), preserved
(N=34) or acquired (N=15) in disease progression. Epigenetic modifier/regulator mutations
were observed in 67.2% of B-ALL (N=45) and 72% of T-ALL (N=18) (Supplementary Figure5b), most of which were acquired (N=85, 56.7%) or preserved (N=48, 32.0%) from diagnosis
to relapse. CREBBP mutations (22 mutations in 17 B-ALL and 1 T-ALL case) showed
preservation or acquisition in all but one case.
Second primary leukemia
To verify clonal relatedness in a subset of patients, we examined IGH (11 B-ALL patients) or
TCRB (3 T-ALL patients) gene rearrangements by targeted next generation sequencing,
which revealed that the relapses were clonally related to major (N=5) or minor (down to
0.005%; N=7) clones at diagnosis (Supplementary Table 19). Three cases (SJBALL006,
SJTALL049 and SJTALL142) revealed fully discordant tumors for all genetic alterations (SV,
CNA, SNV, Indel and IgH/TCRβ rearrangements), raising the possibility of a distinct second
primary leukemia. SJBALL006 is described in the main text. This patient exhibited intellectual
impairment, short stature, and thrombocytopenia. SJTALL049 is a boy and described in the
main text. He suffered a grand mal seizure at age 4 and has a mild nystagmus. Analysis of
the germline WES data revealed two indels on one allele in TSC2, leading to an in-frame
deletion of 3 amino acids. Mutations in TSC2 are associated with tuberous sclerosis (OMIM
#613254) and can cause brain and kidney tumors, but leukemia has not been reported.
SJTALL142 developed T-ALL at 15 years of age and KMT2A-MLLT10 rearranged MDS at
18. Additional WGS analysis revealed no shared SV or SNVs (Supplementary Table 20).
The boy had neck surgery at 6 months of age due to cysts, and he has a history of renal
insufficiency. The family history was inconclusive for cancer occurrence and the boy did not
harbor any constitutional variants in a gene on our gene list.
Our cohort contained five patients who developed a myeloid tumor after T-ALL. Two
of these tumors were identified as clonally unrelated second primary leukemias. However, 4
three AMLs were clonally related relapses originating from a minor clone (SJTALL124: T-
ALL; 20 mutations shared, e.g. TP53 R273H and CDKN2A deletion), a major clone
(SJTALL164: ETP-ALL; 20 mutations shared, e.g. NF1 deletion), or polyclonal (SJTALL008:
ETP-ALL; 20 mutations shared, e.g. NRAS Q61H), indicating that lymphoid and myeloid
tumors can share a common origin. This is in line with lineage plasticity and multipotent
common ancestor cells identified in mixed phenotype acute leukemia (6) and histiocytic
tumors (7).
Constitutional variants and ALL susceptibility
We analyzed our complete cohort for constitutional variants in genes known to be associated
with leukemia prone syndromes or with leukemia development (Supplementary Tables Supplementary Table 31 and Supplementary Table 32). We found two truncating variants
in TP53 (Y163Tfs*7 and R196*), which are causative for Li Fraumeni syndrome in patients
SJHYPO126 and SJTALL164, respectively. The tumors of both patients showed loss of the
wild type allele through somatic deletion of the short arm of chromosome 17. SJHYPO126
developed hypodiploid leukemia and has an extensive family history for cancer (breast
cancer, liposarcoma, lung cancer, colon cancer and leukemia). SJTALL164 developed ETP-
ALL with a clonally related but phenotypically myeloid relapse. In addition to the truncating
TP53 variant, SJTALL164 harbored constitutional truncating variants in NOTCH1 (Q2440*)
and in DNM2 (T78Nfs*18), both genes that are involved in leukemia development (8,9).
Clinical data was not available for this patient.
In patient SJBALL022480 we found a constitutional truncating mutation in PRDM1
(R25Sfs*13), which is a DNA-binding transcriptional repressor involved in B-cell
differentiation. The probability of intolerance for this gene is very high (pLI=0.98), suggesting
a severe effect of functional loss. Finally, SJBALL013 harbored a truncating mutation
(I1085Hfs*60) in PTCH1, which is associated with basal cell nevus syndrome or Gorlin
syndrome (OMIM #109400). The patient presented with low set ears, macrocephaly and
intellectual disability. In addition, the boy developed a basal cell carcinoma at age 5,
concurring the diagnosis of Gorlin syndrome.
We identified 13 heterozygous variants (13 patients) in genes associated with
autosomal recessively inherited syndromes (e.g. Fanconi Anemia, CMMRD). Likely, these
latter variants are not directly causative for the leukemia in these patients. None of the
variants, except for PMS (G108R) in SJHYPER127, obtained a second hit in the tumors of
the patients. We did not identify homozygous or compound heterozygous variants
(Supplementary Table 32). In SJHYPER021925, we found a missense variant in PHF6
(R163H), a gene associated to the X-linked recessive Borjeson-Forssman-Lehman
syndrome (OMIM #301900) and somatically affected in ALL (10,11). The patient was female 5
with no phenotypical features of Borjeson-Forssman-Lehman syndrome reported. A large
cohort study is needed to reveal whether heterozygous PHF6 variants in females lead to a
susceptibility to leukemia development.
We identified 12 variants of unknown significance (in 11 patients) in genes associated
with autosomal dominantly inherited cancer prone disorders. Three variants were found in
genes associated with Noonan syndrome (OMIM #616564, #610733, and #616559) or
Noonan-like syndrome (OMIM #613563). LZTR1 G711R (SJHYPO126) is located in the
BTB/POZ domain; SOS1 N622S (SJHYPER009) is located in the RasGEFN domain, and
SOS2 E422K (SJTALL098) and CBL P684S (SJERG023) were not located in a defined
domain. None of the genes harbored second hits in the tumors of the patients. The
phenotype of SJHYPO126 and SJHYPER009 did not resemble Noonan syndrome.
SJHYPER009 was diagnosed with Duane syndrome (OMIM #126800) and developmental
delay with speech deficits. SJERG023 presented with skin moles on the back, but no
Noonan-like syndrome phenotype. Phenotypic data was not available for SJTALL098.
We included SAMD9 and SAMD9L in our gene list because of recent reports
associating missense mutations in these genes to MIRAGE syndrome (OMIM #617053) (12),
myelodysplasia and leukemia syndrome with monosomy 7 (MLSM7; OMIM #252270) (13)
and ataxia-pancytopenia syndrome (ATXPC; OMIM #159550) (14). We identified variants in
SAMD9 in the germlines of SJBALL022479 (I1549Sfs*10) and SJPHALL029 (C283Y). The
frameshift variant is not likely to be associated with leukemia development in this patient as
SAMD9 is completely tolerant to loss of function mutations (ExAC pLI=0). For both these
patients’ phenotypic data was not available.
Finally, we found variants of unknown significance in the genes PAX5 (associated
with leukemia susceptibility, OMIM #615545), TYK2 (associated with leukemia susceptibility
(15)), NF1 (associated with neurofibromatosis, OMIM #16220), and CDH1 (associated with
familial gastric cancer, OMIM #137215). We identified a 7 amino acid in-frame deletion
P51del7aa in PAX5 in SJHYPER021999. The variant G183S in the octapeptide domain of
PAX5 was associated with susceptibility to ALL development in three families (16,17), but
further studies should reveal whether a deletion of 7 amino acids in the paired box domain of
this gene can drive leukemia development. No second hits in this gene were found in the
tumor of the patient. We identified a P120L variant in TYK2 in patient SJTALL055. Variants in
the DPG motif of TYK2 have been associated with leukemia susceptibility (15), but no data is
available on whether TYK2 is functionally affected by the P120L variant in the FERM domain.
There was no second hit in the tumor, and phenotypical data for this patient was not
available. The NF1 variant K1385E is located in the RasGAP Neurofibromin domain and was
found in SJTALL092. For this patient no phenotype was reported, but the maternal branch of
the family did harbor some cases of cancer (lymphoma in a maternal uncle, breast cancer in 6
a maternal aunt and colorectal cancer in the maternal grandfather). A second hit in this gene
was not present in the tumors of this patient, which suggests that this variant is not causative
for the leukemia in SJTALL092. SJPHALL018 carried a P373L variant in CDH1, a gene
associated with familial gastric cancer (18), though no cancer was reported in the family. A
link with leukemia predisposition has not yet been reported for CDH1.
Hypermutation and MMR mutations
Patients with hypermutated relapses with the MMR associated mutational profile (group 3) all
harbored biallelic mutations in one of the MMR genes. These included a germline PMS2
G108R mutation and a somatic deletion of PMS2 with retention of the mutant allele in
SJHYPER127-R1; somatic homozygous deletion of MSH2 and MSH6 in SJTALL023-R1;
somatic deletion and homozygous R680* mutation of MSH2 plus a hemizygous deletion of
MSH6 in SJTALL023-R2; and somatic homozygous mutation in MSH2 (R340Nfs*17) in
SJTALL057-R1. In group 1, SJETV10 harbored biallelic copy number deletions in MLH1 in
both relapses, while an acquired subclonal missense mutation in MSH6 was identified in the
relapse of SJHYPER022. SJHYPO177 had losses of all chromosomes where MMR genes
are located (chromosomes 2, 3, and 7), but no additional mutations in these genes.
7
SUPPLEMENTARY TABLES
All tables are provided in the Excel workbook
Supplementary Table 1. Clinical characteristics of the cohort.
The table lists the clinical characteristics of all patients in the cohort.
D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor, yrs:
years, BMT: bone marrow transplant
Column D - Protocol
BCM: best clinical practice
Columns H-N indicate which techniques were used on what samples. SNP6: Affymetrix
SNP6 copy number array, WES: whole exome sequencing, WGS: whole genome
sequencing, CapVal: NimbleGen SeqCap Target enrichment validation, RNAseq: RNA
sequencing, Xenograft: xenografting in NSG mice, ddPCR: digital droplet PCR, na: not
applicable
Subtype abbreviations (columns R, S, X, Y, AC, AD)
BCL2/MYC: BCL2, MYC, or BCL6 rearranged with IGH region; B-other: absence of the
known B-ALL subtypes; CRLF2 high: high CRLF2 expression; DUX4: DUX4 rearranged;
ETP: early T-cell precursor; ETV6-RUNX1: ETV6-RUNX1 translocated t(12;21); ETV6-
RUNX1-like: ETV6-RUNX1 like as determined by expression profiling; HLF: TCF3-HLF
translocated; Hyperdiploid: Hyperdiploid >50 chromosomes; Hypodiploid: Hypodiploid <44
chromosomes; iAMP21: intrachromosomal amplification of chromosome 21; Infant: Infant
ALL <1 years of age at diagnosis; KMT2A: KMT2A (MLL) rearranged; KMT2A-like: KMT2A
(MLL) like as determined by expression profiling; Low hypodiploid: containing 32–39
chromosomes; MEF2D: MEF2D rearranged; Near haploid: containing 24–31
chromosomes; PAX5 P80R: containing a biallelic PAX5 alterations including PAX5 P80R
mutation and also PAX5 P80R gene expression profile; PAX5alt: PAX5 intragenic
amplification and PAX5alt gene expression profile; Ph: BCR-ABL translocated t(9;22); Ph-
like: Ph-like as determined by expression array or RNA-seq, TAL1: TAL1 alteration;
TAL1/TLX3: TAL1 and TLX3 alterations, T-ALL: T-lineage ALL, TLX3: TLX3 alteration,
ZNF384: ZNF384 rearranged; na: not applicable, nd: no data
Column T – R1 site
HEM: hematological, CNS: central nervous system, HEM+T: hematological and testicular,
EXTR: extramedullary, TESTES: testicular, MEDMASS: mediastinal mass, na: not
applicable, nd: no data
8
Supplementary Table 2. Deep sequencing of remission samples from relapsed and
non-relapse ALL cases.
Twelve diagnosis-relapse cases (drawn from this study) and 20 ALL cases that did not
relapse with remission samples obtained between 27-49 days (median 44 days) after
diagnosis were enrolled in this analysis. High depth exome sequencing was performed for all
germline samples (average coding region depth 377-515X, median 423X, with >30X
coverage >97% for all samples).
tumorMAF: mutant allele frequency in tumor sample; tumorVarDepth: mutant allele depth in
tumor sample; tumorDepth: total sequence depth in tumor sample; germMAF: mutant allele
frequency in match germline (remission) sample; germVarDepth: mutant allele depth in
germline sample; germDepth: total sequence depth in germline sample; Class: mutations
class; AA Change: amino acid change of the mutation.
Supplementary Table 3. SNV/indel results per patient.
This table describes all SNV/indel variants detected per patient with values for each sample
analyzed in one row. NA: not applicable, nd no data. Column definition is listed below:
A. Patient: patient ID
B. Patient Type: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:
two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-
R1-S: second tumor of different etiology after ALL relapse
C. Evolution: Major: major clone in tumor 2 evolved from major clone in tumor 1, Minor: major
clone in tumor 2 evolved from one minor subclone in tumor 1, Precursor: tumors share
the founder fusion and a limited number of SNVs, Polyclonal: multiple (sub)clones are
preserved, Second Primary Tumor: tumors are clonally unrelated
D. Variant Type: the type of variant. SNV: single nucleotide variant, indel: insertion or
deletion
E. Variant (hg19): genomic location of the variant in human genome build hg19
F. Overall Call: annotation of the variant to be somatic if it becomes clonal in one of the
tumor samples, or somatic subclonal if the variants stays at a subclonal level in all tumor
samples
G. Call_G: variant call in the complete remission sample. Wild type: 0-2 mutant reads, MT:
variant supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant
reads and both WES/WGS and Capture validation techniques identified the variant
and/or the variant was called with higher coverage in other (tumor) samples of the same
9
patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,
Homozygous MAF ≥75%.
H. %var_G: variant allele frequency in % in the complete remission sample
I. MTinN: number of mutant reads in the complete remission sample
J. TOTALinN: total number of reads in the complete remission sample
K. Call_D: variant call in the diagnosis sample. Wild type: 0-2 mutant reads, MT: variant
supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant reads and
both WES/WGS and Capture validation techniques identified the variant and/or the
variant was called with higher coverage in other (tumor) samples of the same patient.
Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,
Homozygous MAF ≥75%.
L. %var_D: variant allele frequency in % in the diagnosis sample
M. MTinD: number of mutant reads in the diagnosis sample
N. TOTALinD: total number of reads in the diagnosis sample
O. Status_D: describes the evolution of the variant in the diagnosis sample. At this stage a
variant can only be wild type (not present) or acquired. NA: not applicable – variant was
excluded based on the lack of reads.
P. Call_R: variant call in the relapse sample. Wild type: 0-2 mutant reads, MT: variant
supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant reads and
both WES/WGS and Capture validation techniques identified the variant and/or the
variant was called with higher coverage in other (tumor) samples of the same patient.
Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,
Homozygous MAF ≥75%.
Q. %var_R: variant allele frequency in % in the relapse sample
R. MTinR: number of mutant reads in the relapse sample
S. TOTALinR: total number of reads in the relapse sample
T. Status_DR: describes the evolution of the variant in the relapse samples as compared to
the diagnosis sample. NA: not applicable – variant was wild type in relapse or no relapse
sample was analyzed.
U. Call_R2: variant call in the second relapse sample. Wild type: 0-2 mutant reads, MT:
variant supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant
reads and both WES/WGS and Capture validation techniques identified the variant
and/or the variant was called with higher coverage in other (tumor) samples of the same
patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,
Homozygous MAF ≥75%.
V. %var_R2: variant allele frequency in % in the second relapse sample
W. MTinR2: number of mutant reads in the second relapse sample10
X. TOTALinR2: total number of reads in the second relapse sample
Y. Status_R1R2: describes the evolution of the variant in the second relapse samples as
compared to the first relapse sample. NA: not applicable – variant was wild type in
second relapse or no second relapse sample was analyzed.
Z. Call_S: variant call in the second tumor sample. Wild type: 0-2 mutant reads, MT: variant
supported with ≥8 mutant reads, Low MT: variant supported with 3-8 mutant reads and
both WES/WGS and Capture validation techniques identified the variant and/or the
variant was called with higher coverage in other (tumor) samples of the same patient.
Subclone: variant allele frequency (MAF) <30%, Heterozygous: MAF 30-75%,
Homozygous MAF ≥75%.
AA. %var_S: variant allele frequency in % in the second tumor sample
AB. MTinS: number of mutant reads in the second tumor sample
AC. TOTALinS: total number of reads in the second tumor sample
AD. Status_DRS: describes the evolution of the variant in the second tumor samples as
compared to the previous tumor sample. NA: not applicable – variant was wild type in
the second tumor or no second tumor sample was analyzed.
AE. Gene: HUGO gene symbol
AF. RefSeq: RefSeq accession number
AG. CLASS: classification based on amino acid change pattern. ‘exon’, mutation in non-
coding RNA genes; ‘splice_region’ mutation not directly affecting the canonical splice
sites but located within 10bp of the canonical splice sites
AH. AA Change: predicted amino acid change for the variant
AI. mRNA Change: variant change at cDNA level with the A of the translation initiation codon
designated as +1
AJ. Coding: classification of the variant as coding or non-coding
AK. PhyloP: PhyloP evolutionary conservation score
AL. CADD_PHRED: Combined Annotation Dependent Depletion scaled score
AM. Grantham Score: score to predict the distance between two amino acids
AN. COSMIC Cancer Gene Census Aug2018: describes whether the gene is recognized as
a cancer gene by the COSMIC database. yes: tier 1 gene, Candidate: tier 2 gene, no:
not present in the Cancer Gene Census in December 2017
AO. Zhang 2015: gene present in the supplementary table 2 of Zhang et al, N Engl J Med
2015; 373(24): 2336-2346
AP. Pathway: gene present in a gene list constructed for this study
AQ. KEGG Name: Kyoto Encyclopedia of Gene and Genomes pathway name.
AR. KEGG Class: Class to which the corresponding KEGG pathway belongs to.
11
Supplementary Table 4. SNV/indel results per sample.
This table describes all SNV/indel variants detected per sample. NA: not applicable, nd no
data. Column definition is listed below:
A. Patient: patient ID
B. Patient Type: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:
two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-
R1-S: second tumor of different etiology after ALL relapse.
C. Sample: Type of sample. D: diagnosis, G: complete remission, R1: relapse 1, R2: relapse
2, S: second tumor
D. Variant Type: the type of variant. SNV: single nucleotide variant, indel: insertion or
deletion
E. Variant (hg19): genomic location of the variant in human genome build hg19
F. Overall Call: annotation of the variant to be somatic if it becomes clonal in one of the
tumor samples, or somatic subclonal if the variants stays at a subclonal level in all tumor
samples
G. Zygosity: annotation of the variant to be heterozygous, homozygous or subclonal
H. Call: variant call. MT: variant supported with ≥8 mutant reads, Low MT: variant supported
with 3-8 mutant reads and both WES/WGS and Capture validation techniques identified
the variant and/or the variant was called with higher coverage in other (tumor) samples
of the same patient. Subclone: variant allele frequency (MAF) <30%, Heterozygous:
MAF 30-75%, Homozygous MAF ≥75%.
I. %var: variant allele frequency in %
J. MTinT: number of mutant reads
K. TOTALinT: total number of reads
L. Status0-1: describes the evolution of the variant from healthy to tumor. At this stage, a
variant can only be acquired
M. Status1-2: describes the evolution of the variant in the tumor sample as compared to the
previous tumor sample. NA: not applicable – variant called in the first tumor sample
(diagnosis).
N. Gene: HUGO gene symbol
O. RefSeq: RefSeq accession number
P. CLASS: classification based on amino acid change pattern. ‘exon’, mutation in non-coding
RNA genes; ‘splice_region’ mutation not directly affecting the canonical splice sites but
located within 10bp of the canonical splice sites
Q. AA Change: predicted amino acid change for the variant
12
R. mRNA Change: variant change at cDNA level with the A of the translation initiation codon
designated as +1
S. coding: classification of the variant as coding or non-coding
T. PhyloP: PhyloP evolutionary conservation score
U. CADD_PHRED: Combined Annotation Dependent Depletion scaled score
V. Grantham Score: score to predict the distance between two amino acids
W. COSMIC Cancer Gene Census Aug2018: describes whether the gene is recognized as a
cancer gene by the COSMIC database. yes: tier 1 gene, Candidate: tier 2 gene, no: not
present in the Cancer Gene Census in December 2017
X. Zhang 2015: gene present in the supplementary table 2 of Zhang et al, N Engl J Med
2015; 373(24): 2336-2346
Y. Pathway: gene present in a gene list constructed for this study
Z. KEGG Name: Kyoto Encyclopedia of Gene and Genomes pathway name.
AA. KEGG Class: Class to which the corresponding KEGG pathway belongs to.
Supplementary Table 5. Summary of mutation clonality and type by case.
This table summarizes the number, clonality and type of somatic single nucleotide variants
(SNV) and small insertions and deletions (Indel) in each patient (gray shading) and each
sample. D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor.
Clonal: variant with an allele frequency (MAF)>30%. Non-silent: variants of the type
missense, nonsense, frameshift and canonical splice site. Subclonal only variants: variants
that were present at a MAF<30% in all tumor sample(s) of a patient, nd: no data. FS:
frameshift, MS: missense, NS: nonsense, proteinDel: in-frame deletion, proteinIns: in-frame
insertion, CSS: canonical splice site, Genes targeted: number of genes affected. Because of
overlap in genes targeted (recurrent hits in genes in multiple samples), values do not add up,
nd: no data.
Supplementary Table 6. SNP6 DNA copy number alteration results.
The table shows the somatic copy number segments called in all samples studied. Column
definition is listed below:
A. Patient: Patient ID
B. Sample: Sample. D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S:
second tumor
C. Status: describes the evolution of the CNA from one sample to the other. No R: no data
for R sample available13
D. Type: describes the type of CNA based on size. Inter indicates a CNA not containing any
genes, Focal indicates a CNA encompassing 1-3 genes, Large describes a CNA with >3
genes not being an entire chromosome arm, Arm denotes a CNA involving a chromosome
arm and Chr specifies a CNA of an entire chromosome.
E. Comment: n indicates the number of chromosomes as determined by the allele frequency
plots. LOH: loss of heterozygosity
F. CN: copy number calling. Amplification: gain of >2 copies, del: deletion, LOH: loss of
heterozygosity
G. Segment_hg19: affected segment in genome build hg19
H. Segment_hg18: affected segment in genome build hg18
I. Hyperlink_hg18: hyperlink to the segment location in the UCSC browser
J. chr_hg18: chromosome in genome build hg18
K. Cytoband_hg18: sublocation on the chromosome in genome build hg18
L. loc.start_hg18: chromosomal start position of the lesion in genome build hg18
M. loc.end_hg18: chromosomal end position of the lesion in genome build hg18
N. num.mark: number of probes in the segment
O. seg.mean: log2 ([normalized tumor signal]/[normalized normal signal])
P. seg.observedCN: absolute copy number
Q. seg.size_(bp): size of the segment in base pairs
R. Genes.segment: number of genes included in the segment
S. first_10_ genes: lists the names of the first 10 genes in the segment
T. miRNA.segment: number of miRNAs included in the segment
U. first_10_miRNAs: lists the names of the first 10 miRNAs in the segment
Supplementary Table 7. Summary of SNP6 DNA copy number alteration results.
This table summarizes the number and size of somatic CNAs in each patient (gray shading)
and each sample by type of aberration and divided into five categories by size of the
aberration.
CNA: copy number aberration, D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete
remission, S: second tumor, LOH: loss of heterozygosity, N: number, Mb: mega bases, seg.:
segment. Inter indicates a CNA not containing any genes, Focal indicates a CNA
encompassing 1-3 genes, Large describes a CNA with >3 genes not being an entire
chromosome arm, Chromosome arm denotes a CNA involving a chromosome arm and Full
chromosome specifies a CNA of an entire chromosome.
14
Supplementary Table 8. Patterns of clonal evolution.
This table summarizes the number of acquired, lost, and preserved copy number and
sequence variants for each patient.
CNA: copy number alteration, Indel: insertion or deletion, SNV: single nucleotide variant, nd:
no data, NA: not applicable, R1: relapse 1, R2: relapse 2, S: second tumor.
Supplementary Table 9. Genes correlated with time to first relapse
This table shows the correlation of remission time to genes clonally targeted by SNV/indels
and focal CNAs at first relapse. The p value is calculated using a Student’s t-test. The
numbers do not add up to 92 patients because patients with a second primary tumor and
patients with a missing first relapse sample were excluded for this analysis. The mean,
median, minimal and range of time to relapse is indicated in years.
Supplementary Table 10. Clonal mutations and focal CNAs acquired in first relapse or
rise from minor clone of diagnosis.
The table shows the SNV/indels and focal acquired or rise in the major clone of first relapse
(R1) samples compared to diagnosis stage(D). The SNV/indels were selected following two
criteria: (i) the mutations acquired in the major clone (MAF≥0.3) of R1 compared to D; or (ii)
rise from a minor clone (MAF<0.3) in D to the major clone (MAF≥0.3) in R1 stage, with the
increment of MAF from D to R1 greater than 0.10. For focal CNA, only the ones acquired in
R1 is maintained.
MTinT: number of mutant reads; TOTALinT: total number of reads; MAF_D: mutant allele
frequency in diagnosis sample.
Supplementary Table 11. Significantly mutated genes in the major clone of both
diagnosis and first relapse samples.
The table shows the significantly mutated genes in the major clone (MAF ≥0.3) of diagnosis
(D) and first relapse (R1) samples identified by GRIN and/or MutSigCV. start_hg19: start
position of the gene in human genome reference version hg19; end_hg19: end position of
the gene; Nsubj: number of subjects (patients); CN.gain: copy number gain; CN.loss: copy
number loss; LOH: loss of heterozygosity; pval: p value calculated by GRIN; qval: q value
calculated by GRIN; p/qval.[1-3]lsn: p/q value calculated by GRIN based on 1-3 lesion type
models; p/qmin: the minimum value of p/qval.[1-3]lsn; MutSigCV-p/qval: p/q value calculated 15
by MutSigCV based on SNV and indels; CancerGene: Is the gene reported as a cancer gene
(according to COSMIC Cancer Gene Census V86) or leukemia relevant in recent studies.
Genes in red font are identified as driver genes by both MutSigCV and GRIN (q-value<0.1),
Genes in green font are known cancer/leukemia associated genes identified as driver gene
by GRIN (q-value<0.1). Genes in blue are newly identified driver genes by GRIN (q-
value<0.1) and targeted in >5 cases.
Supplementary Table 12. Significantly mutated genes acquired in the major clone of
first relapse versus the matched diagnosis sample.
The table shows the significantly mutated genes acquired in the major clone (MAF ≥0.3) of
the first relapse (R1) versus the matched diagnosis sample. The Abbreviations are the same
as Supplementary Table 11. Genes in red font are identified as driver genes by both
MutSigCV and GRIN (q-value<0.1), Genes in green font are known cancer/leukemia
associated genes identified as driver gene by GRIN (q-value<0.1). Genes in blue are newly
identified driver genes by GRIN (q-value<0.1) and targeted in >5 cases.
Supplementary Table 13. Signaling mutations.
This table summarizes the mutations in genes involved in signaling. Column definition is
listed below:
A. Patient: Patient ID
B. Sample: Type of sample. D: diagnosis, R1: relapse 1
C. MTinT: number of mutant reads. na: not applicable
D. TOTALinT: total number of reads. na: not applicable
E. %var: variant allele frequency in %
F. Gene: HUGO gene symbol
G. Pathway: Designated pathway the gene is involved in
H. Variant (hg19): genomic location of the variant in human genome build hg19. Na: not
applicable when either variant was based on expression profiling, or on SNP6
I. Class: classification based on amino acid change pattern. FS: frameshift, MS: missense,
NS: nonsense, proteinDel: in-frame deletion, proteinIns: in-frame insertion, Fusion gene:
fusion as detected by RNASeq, hemiLoss: 1 copy loss, homoLoss: 2 copy loss, na: not
applicable.
J. aaChange: predicted amino acid change for the variant. Na: not applicable.
16
K. CNA: copy number status of the gene. A copy number of 2 indicates normal copy number,
numbers <2 indicate loss, numbers >2 indicate gain of copy number. LOH: copy neutral
loss of heterozygosity.
Supplementary Table 14. Preservation and shifts in mutated signaling pathways by
case.
This table lists the affected signaling pathways and their evolution from diagnosis to relapse
in each patient. Column definition is listed below:
A. Patient: Patient ID
B. Diagnosis: variants present in diagnosis. Variants in red were found at MAF<30%
C. 1st relapse (R1): variants present in Relapse. Variants in red were found at MAF<30%
D: Note for R1: overview of the evolution of the pathways and mutations
Supplementary Table 15. Mutation enrichment at relapse.
This table summarizes the number of non-silent alterations (CNA+SNV/indel) in key genes
identified at diagnosis, relapse or both disease stages. The data is connected to Figure 1.
Column definition is listed below:
A. Gene: HUGO gene symbol
B. D: number of variants present at diagnosis
C. R: number of variants present in relapse
D. DR: number of variants present in both diagnosis and relapse
E. total: total number of variants present in the gene
F. R rate: (R+DR) / total
Supplementary Table 16. Recurrence per gene for clonal SNV/indels and CNAs parsed
by mutation type.
This table summarizes the number and type of somatic clonal mutations (MAF>30%) in a
gene.
Gene: HUGO gene symbol, Cancer Gene Census: ‘Yes’: tier 1 gene, ‘Candidate’: tier 2
gene, ‘No’: not present in the COSMIC Cancer Gene Census in August 2018. Leukemia
gene: PubMed IDs of papers describing the gene in association with leukemia.
Mutation overview: Truncating: presence of frameshift, nonsense or canonical splice site
mutations; Coding MS: presence of non-synonymous missense mutations or in-frame
17
deletions and insertions; Non-coding: presence of silent mutations or mutations in UTR, in
non-coding RNA genes or a mutation not directly affecting the canonical splice sites but
located within 10bp of the canonical splice sites predicted non transcribed exon or in the
splice region; CNA: presence of a focal CNA (affecting 1-3 genes). Y: yes, N: No.
SNV/indel: overview of the number of somatic clonal (MAF>30%) SNV/indels parsed by the
various mutation types in the genes indicated. The numbers are determined per patient.
CNA: overview of the number of clonal CNAs parsed by the type of CNA in the genes
indicated. The numbers are determined per patient.
Total non-silent: The total number of non-silent aberrations (CNA and SNV/indel) in a gene.
Total total: Total number of hits in a gene.
Tolerance scores ExAC: overview of the tolerance scores for each gene as determined using
the ExAC database. syn_z - corrected synonymous Z score, mis_z - corrected missense Z
score, lof_z - corrected loss-of-function Z score, pLI - the probability of being loss-of-
function intolerant (intolerant of both heterozygous and homozygous lof variants), pRec -
the probability of being intolerant of homozygous, but not heterozygous lof variants, pNull -
the probability of being tolerant of both heterozygous and homozygous lof variants,
del.sing.score: Winsorised single-gene deletion intolerance z-score, dup.sing.score:
Winsorised single-gene duplication intolerance z-score, del.score: Winsorised deletion
intolerance z-score, dup.score: Winsorised duplication intolerance z-score, cnv.score:
Winsorised cnv intolerance z-score, flag: Gene is in a known region of recurrent CNVs
mediated by tandem segmental duplications and intolerance scores are more likely to be
biased or noisy. nd: no data.
Genes highlighted in purple font are the most frequently targeted by SNV/indels and CNA
combined. Genes highlighted in orange font are the most frequently targeted by (focal) CNAs
only. Genes highlighted in red font are the most frequently targeted by truncating SNV/indels
only. Genes highlighted in blue font are the most frequently targeted by coding missense
SNV/indels only. Genes highlighted in green font are the most frequently targeted by non-
coding SNV/indels only. Genes highlighted in gray shading are the most frequently targeted
by clonal SNV/indels only and not found mutated at subclonal level in our cohort.
Supplementary Table 17. SNV/indel and CNA recurrence per gene.
This table summarizes the number of hits in a gene per disease state. Genes in red font
encompass the genes uniquely and clonally affected in relapse 1. Genes in orange font
represent the genes uniquely clonally affected in relapse 2.
Gene: HUGO gene symbol, Clonal SNV/indels: SNV/indel with a MAF>30%, Cancer Gene
Census: ‘Yes’: tier 1 gene, ‘Candidate’: tier 2 gene, ‘No’: not present in the COSMIC Cancer 18
Gene Census in August 2018, Leukemia gene: PubMed IDs of papers describing the gene in
association with leukemia, # events: number of patients with a mutation, D: diagnosis, G:
complete remission, R1: relapse 1, R2: relapse 2, S: second tumor.
Supplementary Table 18. Main aspects of each patient in the cohort.
This table lists the main mutations and clinical aspects of each patient in the cohort. Column
definition is listed below:
A. Patient: Patient ID
B. Patient Type: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:
two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-R1-
S: second tumor of different etiology after ALL relapse.
C. Lineage: hematological lineage of the diagnosis
D. R1 RemTime: time in years between diagnosis and relapse 1
E. R2 RemTime: time in years between relapse 1 and relapse 2. na: not applicable
F. Subtype D: Leukemia subtype at diagnosis. BCL2/MYC: BCL2 or MYC rearranged, B-
other: absence of the known B-ALL subtypes, DUX4: DUX4 or ERG rearranged, ETP:
early T-cell precursor, ETV6-RUNX1: ETV6-RUNX1 translocated t(12;21), ETV6-RUNX1-
like: ETV6-RUNX1 like as determined by expression profiling, HLF: TCF3-HLF
translocated, Hyperdiploid: Hyperdiploid >50 chromosomes, Hypodiploid: Hypodiploid <44
chromosomes, iAMP21: intrachromosomal amplification of chromosome 21, KMT2A:
KMT2A (MLL) rearranged, MEF2D: MEF2D rearranged, PAX5 P80R: containing a
homozygous PAX5 P80R mutation, PAX5alt: PAX5 amplificated, Ph: BCR-ABL
translocated t(9;22), Ph-like: Ph-like as determined by expression profiling or RNASeq, T-
ALL: T-lineage ALL, ZNF384: ZNF384 rearranged, nd: no data
G. Subtype 2 D: Leukemia subtype 2 at diagnosis. CRLF2 high: high CRLF2 expression,
Hyperdiploid: Hyperdiploid >50 chromosomes, iAMP21: intrachromosomal amplification of
chromosome 21, Infant: Infant ALL <1 years of age at diagnosis, KMT2A-like: KMT2A
(MLL) like as determined by expression profiling, Low hypodiploid: containing 32–39
chromosomes, Near haploid: containing 24–31 chromosomes, Ph-like: Ph-like as
determined by expression profiling or RNASeq, TAL1: TAL1 subtype, TAL1/TLX3:
TAL1/TLX3 subtype, TLX3: TLX3 subtype, na: not applicable
H. Evolution: Major: major clone in tumor 2 evolved from major clone in tumor 1, Minor: major
clone in tumor 2 evolved from one minor subclone in tumor 1, Precursor: tumors share the
founder fusion and a limited number of SNVs, Polyclonal: multiple (sub)clones are
preserved, Second Primary Tumor: tumors are clonally unrelated
I. Evolution note: mMAF: median MAF19
J. D unique: main genes affected at diagnosis
K. Common D-R1: main genes affected in both diagnosis and relapse
L. R1/S unique: main genes affected in relapse1 or the second tumor sample
M. Common R1-R2: main genes affected in both the first and second relapse
N. R2/S unique: main genes affected in relapse 2 or the second tumor sample
O. Lin Switch: Samples which show a lineage switch are indicated with Y
P. Result IgTCR: Conclusions from Supplementary Table 19 for those samples where we
determined the IgH or TCRβ rearrangements. na: not applicable
Q. HyperMT group: indicates in which subgroup the samples that were hyper mutated (>85
SNV/indels) were classified. Group 1-4 refer to Figure 6. Group Diagnosis: patients with a
hyper mutated diagnosis sample, Preserved: Patients with a hypermutated relapse
sample, with <85 acquired mutations. N: not hypermutated
R. HyperCNA: indicates the patients with copy number instable tumors (>50 CNAs per
sample)
S. Tumor after BMT: indicates whether the tumor occurred after bone marrow
transplantation. no: tumor occurred before bone marrow transplantation, na: not applicable
(no bone marrow transplantation was performed.
Supplementary Table 19. Antigen receptor sequencing results.
This table summarizes the results of the IgH and TCRβ rearrangement sequencing in
selected samples. Column definition is listed below:
A. Patient: Patient ID
B. Number of shared CNA/SNV/indels after Capture Validation: Capture validation detected
SNV/indels with higher sensitivity. SNV/indel: single nucleotide variant, insertion or
deletion, CNA: copy number aberration, D1: diagnosis 1, D2: diagnosis 2, R1: relapse 1,
R2: relapse 2
C. Rearrangement: genes involved in IgH and TCRβ rearrangements. Some rearrangements
encompass the same genes (column C), but they are different on sequence level (column
D).
D. Sequence: identified sequence of the IgH and TCRβ rearrangements
E. MAF in D (%): allele frequency in diagnosis
F. MAF in R1/S (%): allele frequency in relapse 1 or second tumor sample
G. MAF in R2/S (%): allele frequency in relapse 2 or second tumor sample
H. Conclusion: conclusion from this experiment regarding clonal relation of the tumors for
each patient
20
Supplementary Table 20. Genetic alterations detected from WGS.
The fusions and deletions were called manually or by CREST based on the soft-clip reads in
IGV browser to verify if the exact genomic lesions are shared between diagnosis and
relapse. SNVs were called by GATK UnifiedGenotyper and MuTect2 and all the shared ones
were manually reviewed. na: not applicable
Supplementary Table 21. Gene rearrangements detected from RNA-seq.
The table displays the gene rearrangements detected from RNA-seq by FusionCatcher and
CICERO. The columns are named as FusionCatcher reported.
D: diagnosis; R1: relapse 1; R2: relapse 2; G: complete remission; S: second tumor;
NumBinders: number of putative epitopes with binding affinity <= 500nM; UPAS: Unweighted
Putative Antigenicity Score.
Supplementary Table 22. Constitutional copy number aberrations in the patients with a
suspected second primary malignancy.
The table shows the copy number segments called in the germline sample of the cases with
clonally unrelated second primary leukemias. Column definition is listed below:
A. Patient: Patient ID
B. Sample: Sample. G: complete remission
C. CN: copy number calling. gain: gain of 1 copy, del: deletion of 1 copy
D. Segment_hg19: affected segment in genome build hg19
E. Segment_hg18: affected segment in genome build hg18
F. Hyperlink_hg18: hyperlink to the segment location in the UCSC browser
G. chr_hg18: chromosome in genome build hg18
H. Cytoband_hg18: sublocation on the chromosome in genome build hg18
I. loc.start_hg18: chromosomal start position of the lesion in genome build hg18
J. loc.end_hg18: chromosomal end position of the lesion in genome build hg18
K. num.mark: number of probes in the segment
L. seg.mean: log2 ([normalized tumor signal]/[normalized normal signal])
M. seg.observedCN: absolute copy number
N. seg.size_(bp): size of the segment in base pairs
O. Genes.segment: number of genes included in the segment
P. first_10_ genes: lists the names of the first 10 genes in the segment
Q. miRNA.segment: number of miRNAs included in the segment
21
R. first_10_miRNAs: lists the names of the first 10 miRNAs in the segment
S. Putative candidate genes: HUGO gene symbol of the candidate genes in the segment
Supplementary Table 23. Droplet digital PCR results.
This table shows the results from our ddPCR experiments.
A. Patient: Patient ID
B. PatientType: the type of patient in the cohort. D-G-R1: single ALL relapse, D-G-R1-R2:
two ALL relapses, D-G-S: no ALL relapse but second tumor of different etiology, D-G-R1-
S: second tumor of different etiology after ALL relapse.
C. Sample: Type of sample. D: diagnosis, G: complete remission, R1: relapse 1, R2: relapse
2, S: second tumor
D. Source: Source of the sample. BM: bone marrow, PB: peripheral blood
E. Days since D: the number of days between diagnosis date and the sample date.
F. Days since R1: the number of days between relapse date and sample date. Negative
numbers indicate a sample taken before relapse development.
G. Blast%: the percentage of blasts in the sample. Nd: no data
H. Assay: ddPCR assay performed.
I. Total Droplets: total number of droplets measured
J. WT Droplets: number of droplets containing wild type PCR product detected
K. MUT Droplets: number of droplets containing mutant PCR product detected
L. MAF ddPCR: mutant allele frequency as determined by ddPCR. The MAFs in gray font are
below the detection limit.
M. MAF WES: mutant allele frequency as detected by whole exome sequencing
N. MAF CAPVAL: mutant allele frequency as detected by capture validation
Supplementary Table 24. Digital droplet PCR dilution curve results.
This table shows the results from the dilution curves of three assays developed for ddPCR.
WT: wild type, MT: mutant, NTC: non-template control, REH: REH cell line.
Supplementary Table 25. Neoepitope analysis of somatic missense SNVs.
This table shows the results of neoepitope analysis result from somatic missense SNVs.
AllBinders: total number of putative epitopes with binding affinity <= 500nM; PercMutExps:
percentage of overlapping reads that expressed the mutant allele; UPAS: Unweighted
Putative Antigenicity Score; WPAS: Weighted Putative Antigenicity Score.22
Supplementary Table 26. Whole exome sequencing metrics.
The table displays the duplication rate, sequencing depth and coverage for the samples that
were subjected to whole exome sequencing in our cohort.
D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor
Supplementary Table 27. Whole genome sequencing metrics.
The table displays the duplication rate, sequencing depth and coverage for the samples that
were subjected to whole genome sequencing in our cohort.
D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor
Supplementary Table 28. Capture Validation metrics.
The table displays the number of targets and coverage metrics for the samples that were
subjected to capture validation in our cohort.
D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor
Supplementary Table 29. RNA sequencing metrics.
The table displays the type of RNAseq, duplication rate, sequencing depth and coverage for
the samples that were subjected to RNA sequencing in our cohort.
D: diagnosis, R1: relapse 1, R2: relapse 2, G: complete remission, S: second tumor
Supplementary Table 30. Droplet digital PCR primers.
This table lists the primers and probes used in our ddPCR assays. The red nucleotide in the
probes indicate the location of the mutant variant.
Supplementary Table 31. Shortlist of pediatric leukemia associated genes and
syndromes.
This table shows all genes we considered as known to be associated with leukemia
susceptibility and pediatric cancer syndromes.
23
AR: autosomal recessive inheritance, AD: autosomal dominant inheritance, AD Missense:
autosomal dominant inheritance of missense variants, X-linked: inherited in an X-linked
fashion, mosaic: only found in mosaic state
Supplementary Table 32. Constitutional variants identified in the cohort.
This table lists the variants found in known cancer predisposing genes (as defined in
Supplementary Table 20) in our cohort. Column definition is listed below:
A. Patient: Patient ID
B. Gene: HUGO gene symbol
C. RefSeq: RefSeq accession number
D. AA Change: predicted amino acid change for the variant
E. mRNA change: variant change at cDNA level with the A of the translation initiation codon
designated as +1
F. MT Type: type of mutation. FS: frameshift, MS: missense, NS: nonsense, proteinDel: in-
frame deletion, CSS: canonical splice site
G. Protein Domain: protein domain in which the variant is located as determined with
www.PecCan.stjude.org. No: no domain annotated.
H. phyloP: PhyloP score for evolutionary conservation on nucleotide level
I. CADD PHRED: Combined Annotation Dependent Depletion ranked score
J. Polyphen: amino acid substitution impact according to Polymorphism Phenotyping v2 as
available through the Alamut package
K. SIFT: amino acid substitution impact according to the degree of conservation of amino
acid residues as available through the Alamut package
L. Mutation taster: estimation of the impact of the variant on the gene product / protein as
available through the Alamut package
M. Alamut splicing: effect of the variant on splicing as available through the Alamut package
N. 2nd hit in tumor: indicates the presence of an additional aberration in the gene in the
tumors.
O. EXAC AF: allele frequency of the variant in the ExAc database
P. ClinVar: presence of the variant in the ClinVar database. VUS: variant of unknown
significance, no: not present.
Q. HGMD: presence of the variant in the HGMD database. no: not present.
R. Syndrome: syndrome associated with the mutated gene
S. Inheritance: mode of inheritance for the associated syndrome. AR: autosomal recessive
inheritance, AD: autosomal dominant inheritance, X-linked: inherited in an X-linked fashion
T. MAF (%): variant allele frequency in %24
U. MT rds: number of mutant reads
V. TOT rds: total number of reads
W. Quality: GATK quality score
25
SUPPLEMENTARY FIGURES
Supplementary Figure 1. Capture Validation and WES/WGS calling are highly
concordant.
a, For most of the variants (90%), capture validation (CapVal) and WES/WGS agree on the
variant calling. D, diagnosis; R1, first relapse; R2, second relapse; S, second tumor. b,
Sequencing metrics between the two techniques for the agreed variants calls (dark blue in
a). The total reads number and the mutant reads per variant are higher for the CapVal, but
the mutant allele frequency (MAF) for both techniques is comparable for the variants. The
box and whiskers plots depict the median (black line), 25th and 75th percentiles (box
borders) and the highest/lowest value that is within 1.5*IQR (inter-quartile range; whiskers).
Data plotted as individual points beyond the end of the whiskers are outliers. c, For a total of
1,466 variants CapVal and WES/WGS techniques differed on the calling category based on
MAF. Most somatic mutations were called with MAF around 30% (indicated in red dashed
line), which was used as a cut off for subclonal vs heterozygous (clonal) state; a few hover
around the hetero-homozygous cut off of 75% (indicated in blue dashed line). d, Somatic
mutations in diagnosis samples missed in WES/WGS and rescued by CapVal. The analysis
was done in cases with both diagnosis and first relapse samples sequenced by CapVal. With 26
a minimum number (≥3) of mutant (mut.) reads support for a mutation, 3.4% of the somatic
mutations were recovered by CapVal in the diagnosis samples.
27
Supplementary Figure 2. SNV/indel and CNA mutation burden.
a, Histogram of the number of SNV/indels (top) and the relative contribution of the mutation
types (bottom) in each sample sorted from left to right by the total number of SNV/indels. The
coding variants are indicated by shades of red, whereas the non-coding variants are
indicated in shades of blue. b, Histogram of the number of CNAs (top) and the relative
28
contribution of the mutation types (bottom) in each sample sorted from left to right by the total
number of CNAs. c, Missense, silent and 3-prime UTR mutations are most frequent
SNV/indels found in this study. d, Deletions are the most common CNA in this study. Amp:
amplification – gain of >2 copies, hgain: 2 copy gain, gain: 1 copy gain, aUPD: acquired
uniparental disomy – copy number neutral loss of heterozygosity, del: 1 copy deletion, hdel:
2 copy deletion. The box and whiskers plots depict the median (black line), 25th and 75th
percentiles (box borders) and the highest/lowest value that is within 1.5*IQR (inter-quartile
range; whiskers). Data plotted as individual points beyond the end of the whiskers are
outliers.
29
30
Supplementary Figure 3. Mutation burden increases with disease progression.
a, The number of all SNV/indels (left), of clonal (MAF>30%; middle), and clonal non-silent
SNV/indels (right) increase from diagnosis to the subsequent relapses. This effect remains
when the outliers are removed (data not shown). ** p<0.0001, * p<0.001. b, The SNV/indel
mutation burden is not influenced by the type of mutation, tumor lineage, gender, age at
diagnosis, remission time and vital status of the patient. c, The number of CNAs increase in
second relapse (* p<0.05). This effect is not influenced by tumor lineage, gender, age at
diagnosis, remission time and vital status of the patient. The box and whiskers plots depict
the median (black line), 25th and 75th percentiles (box borders) and the highest/lowest value
that is within 1.5*IQR (inter-quartile range; whiskers). Data plotted as individual points
beyond the end of the whiskers are outliers. D, diagnosis; R1, first relapse; R2, second
relapse; S, second tumor.
31
Supplementary Figure 4. Cut-off determination for hypermutated samples.
a, Histogram of the number of SNV/indels in all samples. b, Histogram of the number of
CNAs in all samples. Red arrows indicate the cut-off number of SNV/indels (N=85) and CNA
(N=50) used for defining hypermutators. c, Hypermutator distribution in different disease 32
stages. The ratio of hypermutated samples was relatively high in the second relapses. d, The
number of SNV/indels detected in each sample categorized by subtype of ALL at diagnosis
and colored by disease stage, showing variable number of mutations across subtypes. The
horizontal dotted line indicates the cut-off of >85 SNV/indels. Box and Whisker plots in the
background indicate the median (thick line), 25th and 75th percentiles (box borders) and the
highest/lowest value that is within 1.5*IQR (inter-quartile range; whiskers) number of
SNV/indels. The ALL cases are grouped into well-defined disease subtypes, which include
DUX4-rearranged (DUX4), ETV6-RUNX1, hyperdiploid, hypodiploid, KMT2A (MLL) -
rearranged, BCR-ABL1 (Ph), Ph-like, other B-ALL subtypes, early T-cell precursor ALL (ETP)
and T-lineage ALL non-ETP (T-ALL). e, Histogram of the number of SNV/indels (top) and the
relative contribution of the mutation types (bottom) in each hypermutator (>85 SNV/indels)
sorted from left to right by the total number of SNV/indels. The coding variants are indicated
by shades of red, whereas the non-coding variants are indicated in shades of blue. f, Histogram of the number of CNAs (top) and the relative contribution of the mutation types
(bottom) in each CNA hypermutator (>50 CNAs) sorted from left to right by the total number
of CNAs.
33
34
Supplementary Figure 5. Signaling pathway and Epigenomic mutations in ALL at
diagnosis and relapse.
a, Genes involved in signaling pathways with somatic mutations identified in diagnosis (D)
and/or first available relapse (R) sample per case. b, Epigenetic regulators with somatic
mutations identified in diagnosis (D) and/or first available relapse (R) sample per case are
grouped into 6 epigenetic categories(19). Same mutation annotation schema is applied as
Figure 1a.
35
Supplementary Figure 6. Transcription factor/regulator mutations in ALL at diagnosis
and relapse.
Genes with somatic mutations identified in diagnosis (D) and/or first available relapse (R)
sample per case are grouped into transcription factors and transcriptional regulators. Same
mutation annotation schema is applied as Figure 1a.
36
Supplementary Figure 7. Frequently mutated genes in relapsed ALL.
a, ProteinPaint visualizations of the most frequently clonally mutated genes by non-silent
SNV/indels. b, ProteinPaint visualizations of three genes that were exclusively affected by
clonal non-silent SNV/indels in the tumors of more than 5 patients in our cohort.
37
Supplementary Figure 8. Mutational landscape of xenografts resolves clonal structure.
a, Elevated somatic mutation detection sensitivity in xenografted leukemia samples. To
indicate the sensitivity of mutation calling, the highest MAF in the xenografts generated from
the diagnosis sample (XDHM) was picked for each somatic mutation. Mutations showing
increased MAF from D to XDHM are linked by lines. The mutations showing significant
increment (MAF<0.05 in D and ≥0.05 in XDHM) are highlighted in blue and the mutations
missed in D (<3 mutant reads) but rescued (≥3 mutant reads) in XDHM sample are 38
highlighted in red. D: diagnosis patient sample, R1: first relapse patient sample. b, Clonal
evolution model of SJETV010 inferred from MAF of somatic mutations detected from primary
patient samples and xenografted samples. In this case, only diagnosis and second relapse
(R2) samples were available for transplantation. Non-silent mutations in cancer genes
(COSMIC Cancer Gene Census) in each clone are listed. The colors indicate whether
mutations detected in bulk samples could be denoted as a parental clone (shared by D, R1
and R2), a D unique clone, an R2 unique clone and two relapse shared clones. c,
Distribution of somatic mutations’ MAF in D, R1, R2 and xenografted leukemia samples from
SJETV010. The color scheme for different clones is the same as in b. Without xenografts’
information, clones 3 and 4 and the clones 5-13 would be indistinguishable. D.*.#, samples
originating from D; R.*.#, originating from R2, *.BM.#, samples collected from bone marrow,
*.CNS.#, collected from central nervous system; *.SP.#, collected from spleen.
39
40
Supplementary Figure 9. Validation of digital droplet PCR (ddPCR) assay.
a, Test sensitivity for the NT5C2 p.R39Q, CREBBP p.R1446C and NRAS p.G12R assays
was determined using serial dilutions of patient derived leukemic cells containing the
mutation with REH cells that were wild type for all mutations tested. The R-squared
coefficient was 0.9994, 0.9987 and 0.9993 respectively. b, A total of 9 samples were
measured twice with high correlation (r = 0.9997). The four different assays are indicated by
colors. c, Representative images of the ddPCR output. Depicted are the results of the
CREBBP p.R1446C assay for 2 tumor samples and 2 remission samples from patient
SJBALL013, plus the REH wild type control and the no template control (NTC). Wild type
droplets are labeled with the VIC fluorescent reporter dye and droplets containing mutant
PCR products are labeled with FAM fluorescent dye. The gates used for counting the wild
type (WT) and mutant (MUT) droplets are indicated, as well as the number of droplets within
these gates. T indicates the time point of the sample in number of days after diagnosis. d,
The MAF as measured by WES (left) or capture validation (CapVal, right) correlates well with
the values measured by ddPCR. A total of 11 samples were found negative (MAF=0%) on
the listed mutations in WES and/or CapVal, but mutant droplets were detected with ddPCR.
One sample was positive in WES (SJTALL001 KRAS p.G12C MAF=3.2%), but measured
0% in both CapVal and ddPCR (Supplementary Table 23).
41
42
Supplementary Figure 10. Mutational signatures.
a, Comparison between the de novo extracted signatures and known COSMIC signatures.
Signature A resembles signature SBS1, characterized by spontaneous deamination of
methylated cytosines in CpG context (red bar). Signature B closely resembles signatures
SBS2 and SBS13, both AID/APOBEC associated signatures (orange bar). Signature C
clusters close to signatures SBS6, SBS15, and SBS20, all signatures associated with
mismatch repair (MMR; blue bar). Signature D shows highest resemblance to a group of
signatures with less distinct characteristics. b, Cosine similarity heatmap of the complete set
of acquired mutations per sample relative to involved COSMIC and de novo-extracted
signatures (based on clusters identified in panel a) shows clustering of samples in the four
groups presented in Figure 6. Clusters are indicated by colored bars as in panel a. Samples
marked by a grey bar all have a major contribution of the less distinctive mutational profile D.
c, De novo extracted mutational signatures using WGS data. Signature profiles are highly
concordant with signatures extracted from WES data. d, De novo extracted somatic
mutational signature A using WGS data (top panel) and signature SBS1 (bottom panel)
according to the 96-substitution classification. Signature A is hallmarked by a higher relative
number of ApCpG, CpCpG and GpCpG compared to TpCpG. e, Total mutation profiles
(containing all substitutions) of SJETV10-R2 and SJHYPER022-R1 based on WGS data. f, Absolute contribution of each of the four signatures (WGS data) to each of 26 diagnosis and
relapse samples. Composition of the four signatures in individual samples was highly
concordant with the composition obtained from WES data. g, Relative contribution of C>T
transitions at CpGs in transcribed and untranscribed strand of four samples with a prominent
signature A contribution. h, Absolute contribution of C>T transitions at CpGs in the leading
and lagging strands (top panels) and CpG>TpG mutation density in early, intermediate and
late replicated regions of two hypermutated relapses and three healthy colon organoids.
43
Supplementary Figure 11. Neoepitope analysis of somatic missense SNVs.
a, Number of predicted HLA-binding mutant peptides (<=500nM) per tumor correlates
significantly with missense mutation burden (p < 0.001). Correlation efficiency (r2) of the fitted
functions are shown in hypermutators (HPM) and non-HPM samples, respectively. Different
disease stages, including diagnosis (D), first relapse (R1), and second relapse (R2), are
shown in different colors. The number of predicted binders also varies significantly as an
effect of HPM status (p<0.001), with significant interactions between HPM status and
mutation burden (p<0.001) and between disease progression and mutation burden
(p=0.0452). b, the number of predicted HLA-binding mutant peptides (<=500nM) per tumor
varies as a function of disease stage (D–R1–R2; p<0.001) and HPM status (p<0.001), with
more putative binders evident in hypermutated tumors and as disease progresses.
Differences in the number of binders per tumor owed to effects of disease (B- and T-ALL),
though visually apparent, did not reach the threshold for statistical significance (p=0.0511).
No significant interaction between disease progression and HPM status was identified. c,
Distribution of WPAS (Weighted Putative Antigenicity Score) scores across disease
progression using a subset of SNVs with available expression data. WPAS varies as an
effect of disease progression (p=0.015), with median WPAS highest at the R2 stage
(suggesting more antigenic variants in tumors from prolonged disease progression). d, 44
Distribution of WPAS scores categorized by COSMIC cancer gene status in different disease
stages. WPAS again varies as an effect of disease progression (as in c) but also according
to Cosmic cancer gene status categories (p=0.005), with Cosmic genes showing overall
higher putative antigenicity. All P values were calculated by the ANOVA test.
45
FIGURES
Figure 1. Somatic mutation spectrum in ALL at diagnosis and relapse.Figure 2. Patterns of relapse in ALL.Figure 3. CNAs and MEF2D-BLC9 rearrangements in patient SJBALL006.Figure 4. Integration of mutational landscape and xenografts resolves clonal structure in ALL.Figure 5. Digital droplet PCR reveals mutations at low levels in intermediate complete remission samples.Figure 6. Mutational signature analysis of hypermutated relapses identifies multiple distinct mutational processes in hypermutation.
46
REFERENCES
1. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Res 2015;43:D805-11
2. Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic acids research 2017;45:D777-D83
3. Mullighan CG, Miller CB, Radtke I, Phillips LA, Dalton J, Ma J, et al. BCR-ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 2008;453:110-4
4. Mullighan CG, Su X, Zhang J, Radtke I, Phillips LA, Miller CB, et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. The New England journal of medicine 2009;360:470-80
5. Zhang J, McCastlain K, Yoshihara H, Xu B, Chang Y, Churchman ML, et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nature genetics 2016;48:1481-9
6. Alexander TB, Gu Z, Iacobucci I, Dickerson K, Choi JK, Xu B, et al. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 2018
7. Waanders E, Hebeda KM, Kamping EJ, Groenen PJ, Simons A, Hoischen A, et al. Independent development of lymphoid and histiocytic malignancies from a shared early precursor. Leukemia 2016;30:955-8
8. Weng AP, Ferrando AA, Lee W, Morris JPt, Silverman LB, Sanchez-Irizarry C, et al. Activating mutations of NOTCH1 in human T cell acute lymphoblastic leukemia. Science 2004;306:269-71
9. Ge Z, Li M, Zhao G, Xiao L, Gu Y, Zhou X, et al. Novel dynamin 2 mutations in adult T-cell acute lymphoblastic leukemia. Oncology letters 2016;12:2746-51
10. Chao MM, Todd MA, Kontny U, Neas K, Sullivan MJ, Hunter AG, et al. T-cell acute lymphoblastic leukemia in association with Borjeson-Forssman-Lehmann syndrome due to a mutation in PHF6. Pediatric blood & cancer 2010;55:722-4
11. Van Vlierberghe P, Palomero T, Khiabanian H, Van der Meulen J, Castillo M, Van Roy N, et al. PHF6 mutations in T-cell acute lymphoblastic leukemia. Nature genetics 2010;42:338-42
12. Narumi S, Amano N, Ishii T, Katsumata N, Muroya K, Adachi M, et al. SAMD9 mutations cause a novel multisystem disorder, MIRAGE syndrome, and are associated with loss of chromosome 7. Nat Genet 2016;48:792-7
13. Schwartz JR, Wang S, Ma J, Lamprecht T, Walsh M, Song G, et al. Germline SAMD9 mutation in siblings with monosomy 7 and myelodysplastic syndrome. Leukemia 2017;31:1827-30
14. Chen DH, Below JE, Shimamura A, Keel SB, Matsushita M, Wolff J, et al. Ataxia-Pancytopenia Syndrome Is Caused by Missense Mutations in SAMD9L. Am J Hum Genet 2016;98:1146-58
15. Waanders E, Scheijen B, Jongmans MC, Venselaar H, van Reijmersdal SV, van Dijk AH, et al. Germline activating TYK2 mutations in pediatric patients with two primary acute lymphoblastic leukemia occurrences. Leukemia 2017;31:821-8
16. Shah S, Schrader KA, Waanders E, Timms AE, Vijai J, Miething C, et al. A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia. Nature genetics 2013;45:1226-31
17. Auer F, Ruschendorf F, Gombert M, Husemann P, Ginzel S, Izraeli S, et al. Inherited susceptibility to pre B-ALL caused by germline transmission of PAX5 c.547G>A. Leukemia 2014;28:1136-8
18. Russell LJ, Capasso M, Vater I, Akasaka T, Bernard OA, Calasanz MJ, et al. Deregulated expression of cytokine receptor gene, CRLF2, is involved in lymphoid
47
transformation in B-cell precursor acute lymphoblastic leukemia. Blood 2009;114:2688-98
19. Huether R, Dong L, Chen X, Wu G, Parker M, Wei L, et al. The landscape of somatic mutations in epigenetic regulators across 1,000 paediatric cancer genomes. Nature communications 2014;5:3630
48