Supplementary Information

High-order chromatin architecture determines the landscape of chromosomal alterations in cancer

Geoff Fudenberg (1), Gad Getz (2), Matthew Meyerson (2, 3, 4, 5), Leonid Mirny (2, 6, 7, *)

Author Affiliations:
1 Harvard University, Program in Biophysics, Boston, Massachusetts.
2 The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
3 Harvard Medical School, Boston, MA 02115, USA.
4 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, USA.
5 Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA 02115, USA.
6 Harvard-MIT, Division of Health Sciences and Technology.
7 Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA.
* Corresponding author

Supplemental Figures

B: Correlation between SCNA and HiC heatmaps

| Chromosome | All: r | All: p | Amplifications: r | Amplifications: p | Deletions: r | Deletions: p |
|---|---|---|---|---|---|---|
| chr1 | 0.6105 | <.001 | 0.5293 | <.001 | 0.4136 | <.001 |
| chr2 | 0.4513 | <.001 | 0.3458 | <.001 | 0.3632 | <.001 |
| chr3 | 0.5388 | <.001 | 0.4706 | <.001 | 0.3662 | <.001 |
| chr4 | 0.5758 | <.001 | 0.4908 | <.001 | 0.3867 | <.001 |
| chr5 | 0.5641 | <.001 | 0.5115 | <.001 | 0.3678 | <.001 |
| chr6 | 0.5937 | <.001 | 0.5236 | <.001 | 0.3994 | <.001 |
| chr7 | 0.4676 | <.001 | 0.4079 | <.001 | 0.3095 | <.001 |
| chr8 | 0.6270 | <.001 | 0.5550 | <.001 | 0.4614 | <.001 |
| chr9 | 0.3974 | <.001 | 0.4735 | <.001 | 0.2164 | <.001 |
| chr10 | 0.4914 | <.001 | 0.4200 | <.001 | 0.3567 | <.001 |
| chr11 | 0.5070 | <.001 | 0.4259 | <.001 | 0.4202 | <.001 |
| chr12 | 0.5574 | <.001 | 0.5005 | <.001 | 0.3897 | <.001 |
| chr13 | 0.6546 | <.001 | 0.6231 | <.001 | 0.4173 | <.001 |
| chr14 | 0.4974 | <.001 | 0.3721 | <.001 | 0.4832 | <.001 |
| chr15 | 0.5414 | <.001 | 0.4560 | <.001 | 0.4103 | <.001 |
| chr16 | 0.3905 | <.001 | 0.3113 | <.001 | 0.3222 | <.001 |
| chr17 | 0.5480 | <.001 | 0.4646 | <.001 | 0.4196 | <.001 |
| chr18 | 0.5892 | <.001 | 0.5102 | <.001 | 0.4356 | <.001 |
| chr19 | 0.5690 | <.001 | 0.4749 | <.001 | 0.4185 | <.001 |
| chr20 | 0.5961 | <.001 | 0.5332 | <.001 | 0.3786 | <.001 |
| chr21 | 0.5198 | <.001 | 0.4633 | <.001 | 0.3467 | <.001 |
| chr22 | 0.6771 | <.001 | 0.6248 | <.001 | 0.4439 | <.001 |

Table S1: Pearson’s r and p-values for HiC vs. SCNA data (p-values from unrestricted permutation of heatmaps).
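The correlations in Table S1 compare an SCNA heatmap with a HiC heatmap entry by entry. A minimal sketch of such a comparison follows. It assumes (this is not stated in the table caption) that only the upper triangle of each symmetric heatmap is compared and that an "unrestricted" permutation shuffles SCNA-heatmap entries freely, ignoring genomic distance. The function name `heatmap_correlation` and the toy inputs are hypothetical.

```python
import numpy as np


def heatmap_correlation(hic, scna, n_perm=999, seed=0):
    """Pearson's r between two heatmaps plus a one-sided permutation p-value.

    Assumption: only the upper triangle (excluding the diagonal) is compared,
    and each permutation shuffles the SCNA entries without any restriction.
    """
    iu = np.triu_indices_from(hic, k=1)
    x, y = hic[iu], scna[iu]
    r_obs = np.corrcoef(x, y)[0, 1]

    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        # Unrestricted permutation: any entry may land anywhere.
        if np.corrcoef(x, rng.permutation(y))[0, 1] >= r_obs:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return r_obs, p_value


# Toy example: a 1/distance "HiC" map and a noisy copy standing in for SCNAs.
nb = 30
dist = np.abs(np.subtract.outer(np.arange(nb), np.arange(nb))) + 1.0
hic = 1.0 / dist
scna = hic + np.random.default_rng(2).normal(0.0, 0.02, hic.shape)
r, p = heatmap_correlation(hic, scna, n_perm=199)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), so reporting p<.001 in this scheme needs at least 1,000 permutations.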

Nature Biotechnology: doi:10.1038/nbt.2049

A: Model Fits Chromosome-by-Chromosome

Figure S1: Fits on a chromosome-by-chromosome basis

A: Chromosome-by-chromosome model fitting via Poisson log-likelihood for SCNAs that do not span the highly recurrent SCNA regions listed in ref. 1. Shown is the BIC-penalized log-likelihood ratio (LLR) of each indicated model vs. a model in which the two ends of an alteration are drawn randomly from the same chromosomal arm and experience purifying selection (Uniform+sel). Error bars were obtained via bootstrapping and represent the range from the 5th to the 95th percentile. For long chromosomes, the FG alteration probability (~1/L) clearly fits better than purifying selection alone (Uniform+sel) on a chromosome-by-chromosome basis.
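The three quantities in panel A (a Poisson log-likelihood, a BIC penalty, and bootstrap percentiles) can be sketched as below. This is not the authors' code. It assumes SCNAs are binned by (start bin, end bin), that each model is a fixed probability matrix over those pairs, that the BIC-penalized LLR is ΔlnL − ½·Δk·ln(n) with n the number of SCNAs, and that the bootstrap resamples SCNAs with replacement without refitting parameters (a model with free parameters would normally be refit on every replicate). All function names are hypothetical.

```python
import numpy as np


def poisson_loglik(counts, probs):
    """Poisson log-likelihood of binned SCNA counts given model probabilities.

    The -log(k!) term is omitted: it is the same for every model and cancels
    in a likelihood ratio.
    """
    mu = counts.sum() * probs  # expected number of SCNAs per (start, end) bin
    if np.any(mu[counts > 0] <= 0):
        return -np.inf  # the model forbids a bin that contains data
    pos = mu > 0
    return float(np.sum(counts[pos] * np.log(mu[pos])) - mu.sum())


def bic_llr(counts, probs_a, probs_b, k_a=0, k_b=0):
    """BIC-penalized log-likelihood ratio (natural log) of model A vs. model B."""
    n = counts.sum()
    llr = poisson_loglik(counts, probs_a) - poisson_loglik(counts, probs_b)
    return llr - 0.5 * (k_a - k_b) * np.log(n)


def bootstrap_llr(events, shape, probs_a, probs_b, k_a=0, k_b=0,
                  n_boot=200, seed=0):
    """Resample SCNAs with replacement; return 5th, 50th and 95th percentiles.

    `events` holds one flat (start, end) bin index per SCNA.
    """
    rng = np.random.default_rng(seed)
    n_bins = int(np.prod(shape))
    stats = []
    for _ in range(n_boot):
        sample = rng.choice(events, size=events.size, replace=True)
        counts = np.bincount(sample, minlength=n_bins).reshape(shape)
        stats.append(bic_llr(counts, probs_a, probs_b, k_a, k_b))
    return np.percentile(stats, [5, 50, 95])


# Toy data: SCNAs on a 20-bin arm drawn from an FG-like model (P ~ 1/L).
nb = 20
i, j = np.triu_indices(nb, k=1)
fg = np.zeros((nb, nb)); fg[i, j] = 1.0 / (j - i); fg /= fg.sum()
uniform = np.zeros((nb, nb)); uniform[i, j] = 1.0; uniform /= uniform.sum()
events = np.random.default_rng(1).choice(nb * nb, size=5000, p=fg.ravel())
counts = np.bincount(events, minlength=nb * nb).reshape(nb, nb)

print("FG vs Uniform, BIC-penalized LLR:", bic_llr(counts, fg, uniform))
print("bootstrap 5th/50th/95th:", bootstrap_llr(events, (nb, nb), fg, uniform))
```

Because both models here have no free parameters, the BIC penalty is zero; passing `k_a`/`k_b` shows how extra fitted parameters (for example a selection scale) lower the ratio.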

B: Parameter values (in Mb) for the best-fitting exponential distribution when purifying selection is fit on a chromosome-by-chromosome basis. Top (black squares): simple purifying selection (Uniform+sel). Bottom (magenta squares): purifying selection combined with the mutation rate from the fractal globule (FG+sel). Error bars were obtained via bootstrapping. The purifying-selection parameter values (best-fitting exponential distribution) are proportional to chromosome length for both Uniform+sel and FG+sel; the same holds for HiC+sel. Since smaller parameter values indicate stronger purifying selection, this relationship suggests that shorter, more gene-rich chromosomes may experience stronger purifying selection.
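To make "smaller parameter values indicate stronger purifying selection" concrete, here is a minimal sketch that assumes the selection term multiplies each SCNA's mutation probability by exp(−L/λ), with L the SCNA length in Mb and λ the fitted parameter. That functional form, the single-arm binning, and the grid-search fit are our assumptions; the function names are hypothetical.

```python
import numpy as np


def scna_probs(nb, bin_mb, lam_mb=None, fg=True):
    """Probability of each (start, end) bin pair on one arm.

    Mutation term: ~1/L (FG) or flat (Uniform). Selection term (assumed form):
    exp(-L / lam_mb), so a smaller lam_mb suppresses long SCNAs more strongly.
    """
    i, j = np.triu_indices(nb, k=1)
    length = (j - i) * bin_mb
    w = 1.0 / length if fg else np.ones_like(length, dtype=float)
    if lam_mb is not None:
        w = w * np.exp(-length / lam_mb)
    p = np.zeros((nb, nb))
    p[i, j] = w
    return p / p.sum()


def fit_lambda(counts, nb, bin_mb, grid, fg=True):
    """Grid-search the selection scale that maximizes the log-likelihood."""
    observed = counts > 0
    lls = [np.sum(counts[observed] * np.log(scna_probs(nb, bin_mb, lam, fg)[observed]))
           for lam in grid]
    return grid[int(np.argmax(lls))]


def mean_length(p, bin_mb):
    """Expected SCNA length (Mb) under a probability matrix from scna_probs."""
    i, j = np.nonzero(p)
    return float(np.sum(p[i, j] * (j - i) * bin_mb))


# Simulate SCNAs from FG+sel with lambda = 20 Mb, then recover lambda.
nb, bin_mb = 50, 1.0
true = scna_probs(nb, bin_mb, lam_mb=20.0)
events = np.random.default_rng(0).choice(nb * nb, size=20000, p=true.ravel())
counts = np.bincount(events, minlength=nb * nb).reshape(nb, nb)
print("fitted lambda (Mb):", fit_lambda(counts, nb, bin_mb, [5.0, 20.0, 80.0]))
print("mean length, lambda=5 vs 50:",
      mean_length(scna_probs(nb, bin_mb, 5.0), bin_mb),
      mean_length(scna_probs(nb, bin_mb, 50.0), bin_mb))
```

Fitting this separately per chromosome would give one λ per chromosome, the quantity plotted in panel B.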

B: Gene-Based Purifying Selection & Cancer Associated Genes

Figure S2. BIC-corrected log-likelihood for the set of non-recurrent SCNAs, shown for each model. Gene-based purifying selection (triangles) is compared with length-based purifying selection (squares). Each model is considered with and without purifying selection; thus HiC without (-) purifying selection is identical in the gene-based and length-based comparisons. The y-axis shows log-likelihood ratios for the SCNA data vs. Uniform. Across model types, the best-fit length-dependent purifying selection fits the data better than purifying selection acting on the number of genes affected by an SCNA. Error bars were obtained via bootstrapping: symbols represent the median, bar ends the 5th and 95th percentiles.
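Gene-based purifying selection requires the number of genes each candidate SCNA would affect. A minimal sketch, assuming genes have been pre-counted per genomic bin (the `genes_per_bin` array is hypothetical) and that both selection forms are exponential penalties, exp(−genes/γ) and exp(−length/λ), fills every (start, end) pair at once with prefix sums. The exponential form of the gene-based penalty is our assumption, not a statement of the authors' implementation.

```python
import numpy as np


def genes_covered(genes_per_bin):
    """G[i, j] = number of genes in bins i..j inclusive (valid for j >= i)."""
    c = np.concatenate([[0], np.cumsum(genes_per_bin)])
    return c[None, 1:] - c[:-1, None]  # c[j + 1] - c[i]


def selection_weights(genes_per_bin, bin_mb, scale, gene_based):
    """Upper-triangular purifying-selection weights for every (start, end) pair.

    gene_based=True:  exp(-genes / scale)   (scale in genes)
    gene_based=False: exp(-length / scale)  (scale in Mb)
    """
    nb = len(genes_per_bin)
    i, j = np.triu_indices(nb, k=1)
    if gene_based:
        cost = genes_covered(genes_per_bin)[i, j]
    else:
        cost = (j - i + 1) * bin_mb  # bins i..j inclusive
    w = np.zeros((nb, nb))
    w[i, j] = np.exp(-cost / scale)
    return w


genes = np.array([2, 0, 5, 1])
print(genes_covered(genes)[0, 3], genes_covered(genes)[1, 2])
```

Either weight matrix can multiply a mutation-rate model (Uniform, HiC or FG) before normalization, which is how the "+sel" variants in Figure S2 differ only in what the selection acts on.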

Figure S3: Permutation tests for amplifications and deletions. A: Distribution of log-likelihood ratios for randomly permuted SCNAs given HiC vs. observed SCNAs across all chromosomes, shown separately for the 16,521 amplifications (top) and 7,789 deletions (bottom). Observed amplifications are fit better by HiC contact probability with p<.05; observed deletions are fit better with p<.001. B: Distributions of the same log-likelihood ratios for individual chromosomes (22 autosomes) vs. observed SCNAs (blue line). Squares represent median values; error bars represent the ranges from the 5th to the 25th and from the 75th to the 95th percentile. On average, the probability of the observed deletions given HiC is higher than that of permuted deletions for every chromosome except chromosome 11.
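A permutation test of this kind can be sketched as follows. The permutation scheme is our assumption: each SCNA is moved to a uniformly random position on its arm while keeping its length. Under that scheme, the log-likelihood ratio against any length-only reference model differs from the plain HiC log-likelihood by a constant, so ranking permutations by log-likelihood given HiC gives the same p-value. The toy "HiC" map and all function names are hypothetical.

```python
import numpy as np


def loglik(events, P):
    """Sum of log P[start, end] over SCNAs; events is an (n, 2) array of bins."""
    return float(np.sum(np.log(P[events[:, 0], events[:, 1]])))


def permute_events(events, nb, rng):
    """Move every SCNA to a random position on the arm, preserving its length."""
    length = events[:, 1] - events[:, 0]
    start = rng.integers(0, nb - length)  # exclusive bound keeps end <= nb - 1
    return np.column_stack([start, start + length])


def permutation_test(events, P, nb, n_perm=999, seed=0):
    """One-sided test: are observed SCNAs more likely under P than permuted ones?"""
    rng = np.random.default_rng(seed)
    observed = loglik(events, P)
    permuted = np.array([loglik(permute_events(events, nb, rng), P)
                         for _ in range(n_perm)])
    p_value = (np.sum(permuted >= observed) + 1) / (n_perm + 1)
    return observed, permuted, p_value


# Toy "HiC" map: 1/L decay plus a position-specific enriched block (bins 0-9).
nb = 40
i, j = np.triu_indices(nb, k=1)
P = np.zeros((nb, nb))
P[i, j] = (1.0 / (j - i)) * np.where((i < 10) & (j < 10), 5.0, 1.0)
P /= P.sum()
flat = np.random.default_rng(3).choice(nb * nb, size=2000, p=P.ravel())
events = np.column_stack(np.divmod(flat, nb))
obs, perm, p = permutation_test(events, P, nb, n_perm=199)
print(f"observed LL = {obs:.1f}, median permuted LL = {np.median(perm):.1f}, p = {p:.3f}")
```

Running the same test on amplifications and deletions separately, or chromosome by chromosome, corresponds to panels A and B respectively.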

C: Fitting FG + Domains

Figure S4. Considering domains provides little improvement beyond the FG model for fitting the landscape of SCNAs. A: BIC-corrected log-likelihood ratios for the FG, FG+domains, FG+sel, and FG+sel+domains models vs. the FG model, for SCNAs that do not overlap GISTIC peaks of recurrent SCNA (as in Fig. 3), excluding chromosomes 4, 5 and 15. Models were considered with (+) and without (-) fitting for domains. Fitting a multiplicative model for domains provides little improvement beyond the FG model, and much less improvement than accounting for purifying selection. Domain strengths for the best-fitting FG+domains model are: open-open 1.13, closed-closed 1.20. B: Permutation analysis of the best-fitting FG+domains model. Observed SCNAs are better fit by FG+domains with p<.001 (blue arrow). However, the magnitude of this preference is much smaller than in the permutation test for HiC (~150 vs. ~420). This indicates that much of the position-specific information in HiC relevant for SCNAs is not captured by this multiplicative model of domains.
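The multiplicative FG+domains model can be sketched as the FG term (~1/L) multiplied by a domain factor that depends on whether both ends of an SCNA fall in "open" or "closed" bins. The open-open (1.13) and closed-closed (1.20) strengths below are the best-fit values from panel A; fixing the open-closed factor to 1 is our assumption, since only two strengths are reported. How bins are labeled open or closed is not described here, so `is_open` is a hypothetical input.

```python
import numpy as np


def fg_domains_probs(is_open, d_open=1.13, d_closed=1.20, d_mixed=1.0):
    """FG+domains: P(start, end) ~ (1/L) * domain factor, normalized to sum to 1.

    is_open: boolean array, True where a bin is labeled 'open' (hypothetical input).
    d_mixed: factor for open-closed pairs (assumed to be 1).
    """
    is_open = np.asarray(is_open, dtype=bool)
    nb = is_open.size
    i, j = np.triu_indices(nb, k=1)
    factor = np.full(i.size, d_mixed)
    factor[is_open[i] & is_open[j]] = d_open
    factor[~is_open[i] & ~is_open[j]] = d_closed
    P = np.zeros((nb, nb))
    P[i, j] = factor / (j - i)  # FG mutation term ~ 1/L
    return P / P.sum()


P = fg_domains_probs([True, True, False, False])
print(P[0, 1] / P[1, 2], P[2, 3] / P[1, 2])
```

Because the factors enter multiplicatively, the model can only rescale whole open-open or closed-closed blocks; it cannot reproduce finer position-specific structure, which is consistent with its weaker permutation signal relative to HiC.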

Figure S5. Domains for amplifications and deletions. A: As in Figure S4, but separately for amplifications and deletions: BIC-corrected log-likelihood ratios for the FG, FG+domains, FG+sel, and FG+sel+domains models vs. the FG model, for SCNAs that do not overlap GISTIC peaks of recurrent SCNA (as in Fig. 3), excluding chromosomes 4, 5 and 15. Models were considered with (+) and without (-) fitting for domains. Fitting a multiplicative model for domains provides little improvement beyond the FG model, and much less improvement than accounting for purifying selection. Domain strengths for the best-fitting FG+domains model are: open-open 1.10, closed-closed 1.15. B: Permutation analysis of the best-fitting FG+domains model. Observed SCNAs are not fit better by FG+domains (blue arrow), indicating that the position-specific information in this model is not relevant for observed amplifications. C: As in A, with the same result: fitting a multiplicative model for domains provides little improvement beyond the FG model, and much less than accounting for purifying selection. Domain strengths for the best-fitting FG+domains model are: open-open 1.21, closed-closed 1.31. D: Permutation analysis of the best-fitting FG+domains model. Observed SCNAs are better fit by FG+domains with p<.001 (blue arrow).

D: Permutations by cell type (cancer lineage)

Figure S6: Permutation tests by cancer lineage (cell type of origin). Distribution of log-likelihood ratios for randomly permuted SCNAs given HiC vs. observed SCNAs across all chromosomes, for all SCNAs (top), amplifications (middle) and deletions (bottom). Cancers are separated into epithelial, heme, sarcoma, and neural lineages as indicated in Supplemental Figure 7 of ref. 1. Deletions are significant across all cancer subtypes.

E: Amplifications vs. Deletions

Figure S7. Poisson log-likelihood ratios for SCNA data.

A: Amplifications: model log-likelihood for the 16,521 observed amplification SCNAs in the non-recurrent set vs. Uniform. B: Deletions: model log-likelihood for the 7,789 observed deletion SCNAs in the non-recurrent set vs. Uniform. For both amplifications and deletions, the following six models are considered: Uniform, Uniform+sel, HiC, HiC+sel, FG, FG+sel. The HiC model assumes mutation rates proportional to experimentally measured contact probabilities, while the FG model assumes mutation rates proportional to the contact probability in a fractal globule architecture (~1/L). The left y-axis presents the BIC-corrected log-likelihood ratio for each model vs. the Uniform model. Each model was considered with (+) and without (-) purifying selection. Error bars were obtained via bootstrapping: squares represent the median, bar ends the 5th and 95th percentiles. The FG model significantly outperforms the other mutational models of SCNA formation for both amplifications and deletions. However, Uniform+sel does not outperform HiC for deletions, and FG+sel does not fit significantly better than FG for deletions. We note that since there are more amplifications than deletions, the aggregate likelihood ratios vs. the Uniform model are greater for amplifications. The relatively poorer performance of HiC for amplifications may reflect additional selective or mutational pressures acting on amplifications vs. deletions. Variable mutation rates at the megabase scale could obscure a relationship with megabase-scale details of chromatin structure.
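The Uniform, FG and HiC mutation-rate models differ only in the matrix that sets the relative rate of each (start, end) pair. A minimal sketch of building them is below. The restriction to pairs on the same chromosomal arm is stated in Figure S1 for the uniform baseline; applying it to every model here, and splitting the arms at a single centromere bin, are our assumptions. Function names are hypothetical, and a measured HiC map is not included, so the HiC line is left as a comment.

```python
import numpy as np


def same_arm_pairs(nb, centromere_bin):
    """True for (start, end) with start < end and both ends on the same arm."""
    idx = np.arange(nb)
    on_q_arm = idx >= centromere_bin  # hypothetical p/q split at one bin
    same_arm = on_q_arm[:, None] == on_q_arm[None, :]
    return same_arm & (idx[:, None] < idx[None, :])


def rate_model(contact, centromere_bin):
    """SCNA end-pair probabilities proportional to `contact`, restricted to one arm."""
    P = np.where(same_arm_pairs(contact.shape[0], centromere_bin), contact, 0.0)
    return P / P.sum()


nb, cen = 10, 4
dist = np.abs(np.subtract.outer(np.arange(nb), np.arange(nb)))
uniform = rate_model(np.ones((nb, nb)), cen)
fg = rate_model(1.0 / np.maximum(dist, 1), cen)  # fractal globule: ~1/L
# hic = rate_model(measured_contact_map, cen)    # HiC model (map not included here)
```

Keeping the model as a single normalized matrix lets the same likelihood, selection and permutation code be reused unchanged for all six models.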

F: Limits of SCNA Resolution

Figure S8: Resolution of SCNAs

In ref. 1, only SCNAs covering at least 7 probes were reported. Because SNP array probes are not evenly spaced across the genome, 7 probes can span anywhere from a kilobase to a megabase, depending on the region. The dashed line marks the 1 Mb threshold: only SCNAs larger than 1 Mb were considered.
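The two thresholds above can be illustrated with a short sketch: the physical span of 7 consecutive probes depends on local probe density, which is why a length cutoff is applied on top of the probe-count cutoff. The `SCNA` record, its fields, and the probe positions are hypothetical; only the 7-probe and 1 Mb thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SCNA:
    chrom: str
    start: int     # bp
    end: int       # bp
    n_probes: int  # array probes covered


MIN_PROBES = 7             # reporting threshold in ref. 1
MIN_LENGTH_BP = 1_000_000  # SCNAs considered here: larger than 1 Mb


def span_of_k_probes(positions, k=MIN_PROBES):
    """Genomic span (bp) of every run of k consecutive probes (positions sorted)."""
    return [positions[n + k - 1] - positions[n] for n in range(len(positions) - k + 1)]


def filter_scnas(scnas):
    """Keep SCNAs that meet both the probe-count and the 1 Mb length thresholds."""
    return [s for s in scnas
            if s.n_probes >= MIN_PROBES and s.end - s.start > MIN_LENGTH_BP]


# Dense probes: 7 probes span 6 kb; sparse probes: 7 probes span 1.2 Mb.
dense = [n * 1_000 for n in range(7)]
sparse = [n * 200_000 for n in range(7)]
print(span_of_k_probes(dense), span_of_k_probes(sparse))
```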

G: Results are robust to the choice of SCNAs

Figure S9. Model selection and permutation analysis are robust to the choice of SCNAs. A: Same as Figure 3, but including SCNAs that span significant GISTIC peaks of recurrent SCNA (39,568 SCNAs: 26,022 amplifications and 13,546 deletions). The left y-axis presents the BIC-corrected log-likelihood ratio for each model vs. the Uniform model. Each model was considered with (+) and without (-) purifying selection. The right y-axis shows the same data as a fold difference in likelihood per cancer specimen (sample) vs. Uniform. Error bars were obtained via bootstrapping: squares represent the median, bar ends the 5th and 95th percentiles. The FG model significantly outperforms the other mutational models of SCNA formation, and every model is significantly improved when purifying selection is taken into account. B: (top row) Same as Figure 4, but including SCNAs that span significant GISTIC peaks of recurrent SCNA. (middle and bottom rows) Same as Figure S3. The left column shows the distribution of log-likelihood ratios for randomly permuted SCNAs given HiC vs. observed SCNAs given HiC, aggregated over all 22 autosomes; observed SCNAs are indicated with a blue arrow. The right column shows the distributions for individual chromosomes. Squares represent median values; error bars represent the ranges from the 5th to the 25th and from the 75th to the 95th percentile.
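Panel A's right y-axis re-expresses a total log-likelihood ratio as a fold difference in likelihood per cancer specimen. A minimal sketch of that conversion, assuming the total LLR is a sum of independent per-specimen contributions so that the per-specimen ratio is base^(LLR/N), is below. The logarithm base is not stated in this supplement, so it is a parameter (natural log by default), and the number of specimens N is left to the caller; both function names are hypothetical.

```python
import math


def fold_per_specimen(llr, n_specimens, log_base=math.e):
    """Convert a total log-likelihood ratio into a per-specimen fold difference.

    Assumes the total LLR is a sum of independent per-specimen contributions, so
    the per-specimen likelihood ratio is log_base ** (llr / n_specimens).
    """
    return log_base ** (llr / n_specimens)


def llr_from_fold(fold, n_specimens, log_base=math.e):
    """Inverse conversion: total LLR implied by a per-specimen fold difference."""
    return n_specimens * math.log(fold, log_base)
```

Dividing by the number of specimens puts datasets of different sizes on a comparable scale, which is why the per-specimen axis complements the aggregate BIC-corrected ratio on the left axis.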
