Modern Epigenomics: Histone Code
Ting Wang, Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University
Dragon Star 2012, Changchun, China, July 2, 2012

DNA methylation +

Histone modification

Chromatin

Chromatin: DNA plus protein in cells with nuclei

Nucleosome
- 146 bp of DNA wrapped around a histone octamer
- 2 each of histones: H2A, H2B, H3 and H4

The Nucleosome core particle

Nucleosome

H3

H4

http://www.nature.com/nsmb/journal/v14/n11/images/nsmb1337-F1.gif

Post-translational Histone Modifications

Post-translational Histone Modifications

H3 tail modifications:

Acetylation (added by HATs, removed by HDACs) and methylation (added by KMTases); marks are grouped as active or repressive

Li et al. (2007) Cell 128, 707

Li et al. (2007) Cell 128, 707

Histone Modifications in Relation to Gene Transcription

DNA methylation mediated repression

Repression independent of DNA methylation

H3K9 methylation

“condensed” chromatin

H3K27 methylation mediated repression

1.  H3K27 methylation

2.  DNA methylation

Mechanisms of Epigenetic Crosstalk

“Epigenetic cancer therapy”

DNA-methylation and HDAC inhibitors in clinical trials

Summary

•  Dnmt1, Dnmt3a and Dnmt3b are the mammalian DNA methyltransferases (DNMTs)

•  Chromatin structure is influenced by covalent modification of histone tails

•  Multiple chromatin-modification pathways are involved in gene silencing, and these may show “crosstalk” with DNA methylation

Technologies for Interrogating Chromatin States

ChIP-chip

Antibody specific to one type of histone modification

Histone Modifications

ChIP-seq

Deep sequencing

Chromatin-IP Sequencing: K4me1, K4me2, K4me3 (“active”); K27me3 (“repressive”)

Histone methylation and transcriptional state

•  Transcribed gene: K4me3
•  Silent developmental gene: K27me3
•  ‘Poised’ developmental gene: K4me3 and K27me3
•  Constitutive heterochromatin: K9me3 and K20me3

Example loci in the figure: FoxP1, Olig1
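A minimal sketch of the mark-to-state mapping summarized on this slide, written as a lookup. The function name `classify_state` and the mark strings are hypothetical, and the rules are a deliberate simplification: real chromatin-state calls come from genome-wide profiles and statistical models, not from a few set checks.

```python
def classify_state(marks):
    """Map the histone marks observed at a locus to the states on the slide.

    Hypothetical helper with simplified, illustrative rules.
    """
    marks = set(marks)
    if {"K9me3", "K20me3"} <= marks:
        return "constitutive heterochromatin"
    if {"K4me3", "K27me3"} <= marks:
        return "poised developmental gene"  # both marks present together
    if "K4me3" in marks:
        return "transcribed gene"
    if "K27me3" in marks:
        return "silent developmental gene"
    return "unclassified"


print(classify_state({"K4me3", "K27me3"}))  # poised developmental gene
```

The poised check comes before the single-mark checks so that a locus carrying both K4me3 and K27me3 is not classified by whichever mark is tested first.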

Predicting non-coding RNA?

•  From sequence?
   –  Not clear which properties can be exploited
   –  Sequence features such as promoters are too weak
•  Histone modifications + conservation worked

Nucleosome Positioning from Histone ChIP-seq

•  Barski et al., Cell 2007
   –  Nucleosome-resolution ChIP-seq of 21 histone marks in CD4+ T cells
   –  Total of 185.7 M 25-nt tags sequenced
   –  Analysis not at nucleosome resolution to map nucleosomes at specific regions

Antibody for

MNase digest

Combine Tags From All ChIP-Seq

Extend Tags 3’ to 150 nt

Check Tag Count Across Genome

Take the middle 75 nt
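The processing steps on these slides (pool tags from all histone ChIP-seq libraries, extend each tag 3’ to 150 nt, keep the middle 75 nt) could be sketched as follows. This is an illustrative reconstruction, not the lecture's actual code: the tag tuple format `(chrom, start, strand)`, 0-based half-open coordinates and the function names are assumptions; the 25 nt tag length comes from the Barski et al. data described above.

```python
from collections import Counter

TAG_LEN = 25        # tag length in the Barski et al. data
FRAGMENT_LEN = 150  # extend each tag 3' to 150 nt
CORE_LEN = 75       # keep the middle 75 nt of the extended fragment


def core_interval(start, strand):
    """Half-open [core_start, core_end) for one aligned tag.

    `start` is the tag's 0-based leftmost coordinate (assumed convention).
    """
    if strand == "+":
        frag_start = start  # + strand: 5' end on the left, extend rightward
    else:
        # - strand: 5' end is the tag's rightmost base, extend leftward
        frag_start = start + TAG_LEN - FRAGMENT_LEN
    core_start = frag_start + (FRAGMENT_LEN - CORE_LEN) // 2
    return core_start, core_start + CORE_LEN


def nucleosome_coverage(tags):
    """Pool tags from all marks; count how many cores cover each base."""
    cov = Counter()
    for chrom, start, strand in tags:
        core_start, core_end = core_interval(start, strand)
        for pos in range(max(core_start, 0), core_end):
            cov[(chrom, pos)] += 1
    return cov


cov = nucleosome_coverage([("chr1", 1000, "+"), ("chr1", 1100, "-")])
print(cov[("chr1", 1050)])  # 2: both cores cover this base
```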

Inaccessible Inaccessible

Accessible

Precise delineation of the accessible regulatory DNA compartment

Digital DNaseI profiling

Digital DNaseI profiling: direct access to regulatory sequences

ChromHMM

Figure: a stretch of DNA containing a transcription start site, an enhancer and a transcribed region, divided into 200bp intervals

Observed chromatin marks (e.g., K4me3, K4me1, K27ac, K36me3): called based on a Poisson distribution

Most likely hidden state for each interval (states 1-6)

High-probability chromatin marks in each state (probabilities of about 0.7-0.9 in the example)

All probabilities are learned from the data
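To make the ChromHMM idea concrete, here is a toy two-step sketch: binarize per-200bp counts against a Poisson background, then find the most likely hidden-state path with the Viterbi algorithm. It is not ChromHMM's code. Assumptions: each mark's Poisson rate is its mean count per bin, the 1e-4 threshold is a placeholder, marks are independent given the state, and the transition and emission probabilities are supplied by hand, whereas ChromHMM learns them from the data.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for modest lam."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(counts, pval=1e-4):
    """counts[bin][mark] -> 0/1: 'present' if the count is improbably high
    under a Poisson whose rate is that mark's mean count per bin."""
    n_bins, n_marks = len(counts), len(counts[0])
    lams = [sum(row[m] for row in counts) / n_bins for m in range(n_marks)]
    return [[1 if poisson_sf(row[m], lams[m]) < pval else 0
             for m in range(n_marks)] for row in counts]


def viterbi(obs, start, trans, emit):
    """Most likely state path for binary mark vectors (log space).

    emit[s][m] = P(mark m present | state s); trans[r][s] = P(r -> s).
    """
    def log_emit(s, x):
        return sum(math.log(p if xi else 1.0 - p) for p, xi in zip(emit[s], x))

    n = len(start)
    score = [math.log(start[s]) + log_emit(s, obs[0]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][s]) + log_emit(s, x))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, with three hypothetical marks (K4me3, K4me1, K36me3) and states 0, 1, 2 whose emissions favour each mark in turn, a run of binarized bins `[[1,0,0],[1,0,0],[0,1,0],[0,0,1],[0,0,1]]` decodes to the state path `[0, 0, 1, 2, 2]`.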

ChromHMM

Figure: chromatin states grouped as promoter, transcribed, active intergenic, repetitive and repressed

Chromatin marks from (Barski et al., Cell 2007; Wang et al., Nature Genetics 2008); DNaseI hypersensitivity from (Boyle et al., Cell 2008); expression data from (Su et al., PNAS 2004); lamina data from (Guelen et al., Nature 2008)

Application of ChromHMM to 41 chromatin marks in CD4+ T-cells (Barski ’07, Wang ’08)

Next-gen Sequencing Technology

Forward Genetics

Phenotype

Genotype

Hypothesis

Test Hypothesis By Genetic Manipulation

Forward Genetics

Phenotype

Genotype

Hypothesis

Test Hypothesis By Genetic Manipulation

Two groups: 1. Develop colorectal cancer at a young age; 2. Do not

Mutation in APC Gene

APC is a Tumor Suppressor Gene

Delete APC in Mouse; Control: Isogenic APC+

The Cycle of Forward Genetics

Phenotype

Genotype

Hypothesis

Test Hypothesis By Genetic Manipulation

Observation

?Sequencing?

Thinking

Gene Deletion/Replacement Recombinant Technology

In 2005: $9 million/genome, not feasible

The Problem with Forward Genetics

Phenotype

Genotype

Hypothesis

Test Hypothesis By Genetic Manipulation

Sequencing

Thinking

Gene Deletion/Replacement Recombinant Technology

Currently $40,000*/genome; cost is rapidly dropping

Sequencing

Past and Current Sequencing Technologies

•  Pre-1992, “old fashioned way”: S35 ddNTPs, gels, manual loading, manual base calling
•  1992-1999, ABI 373/377: fluorescent ddNTPs*, gels, manual loading, automated base calling*
•  1999, ABI 3700: fluorescent ddNTPs, capillaries*, robotic loading*, automated base calling; breaks down frequently
•  2003, ABI 3730XL: fluorescent ddNTPs, capillaries, robotic loading, automated base calling; reliable*

0 and 1st generation sequencing

Next (2nd) generation sequencing technology

454/Roche GS-20/FLX (Oct 2005)

Illumina/Solexa 1G Genetic Analyser (Feb 2007)

ABI SOLiD (Oct 2007)

Comparison of Next Generation Sequencing Technologies

•  3730XL (ABI): 96 reads/run; 900-1200 bp average read length; ~100,000 bp per run; 1-2 MB data output
•  454 (Roche): 400,000 reads/run; 250-310 bp average read length; 70 million bp per run; 20 GB data output
•  Illumina 1G (Solexa): 40 million reads/run; 36 bp average read length; 1 billion bp per run; 1.5 TB data output
•  SOLiD (ABI): 88-132 million reads/run (44-66 million per slide); 35 bp average read length; 1 billion bp per run; 1.5-3.0 TB data output

Is Sanger sequencing dead? They can be applied to different areas (future of sequencing centers)

•  Next Gen long read instrument (454), “When length matters”: novel genomes, metagenomics
•  Next Gen short read instrument (Solexa), “When quantity matters but length doesn’t”: expression tags, ChIP-Seq, re-sequencing
•  ABI 3730XL: routine sequencing, verify SNPs from next gen, 1X scaffold for novel genomes

Illumina Genome Analyzer: Introduction to the Technology

Illumina (IGA) sequencing pipeline

1. Sample prep (1-5 days): ligate adapters
2. Cluster generation on flow cell (1.5 days): clonal single-molecule array
3. Sequencing and imaging (2-3 days)
4. Data analysis (days-months)

Cluster Generation

8 channels (lanes)

Attach DNA to flow cell

Bridge amplification: attach DNA to flow cell

Can we amplify an epigenetic mark?

Cluster Generation: clonal single-molecule array

Clonal Single Molecule Array

Random array of clusters (figure scale: 100 µm)

•  ~1000 molecules per ~1 µm cluster
•  ~20,000-30,000 clusters per tile
•  ~40 M clusters per flow cell

Sequencing By Synthesis (SBS)

Cycle 1: Add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal

Cycles 2-n: Add sequencing reagents and repeat

Base Calling From Images

Figure: images from cycles 1-9; two example clusters read T T T T T T T G T … and T G C T A C G A T …

The identity of each base of a cluster is read off from sequential images.

Reversible terminator chemistry solves the homopolymer problem.
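A minimal sketch of reading one base per cycle from a single cluster's image intensities. Assumptions: intensities are already extracted per cluster as `(A, C, G, T)` tuples and the brightest channel wins; production base callers also correct for effects such as channel cross-talk and phasing, which this ignores. Because a reversible terminator allows only one incorporation per cycle, a homopolymer run such as T T T … is simply called one T per cycle.

```python
BASES = "ACGT"  # assumed channel order of each intensity tuple


def call_bases(intensities):
    """Call one base per cycle from per-cycle (A, C, G, T) intensities.

    Returns the read and a crude per-cycle confidence
    (brightest channel / total signal).
    """
    calls, conf = [], []
    for cycle in intensities:
        best = max(range(4), key=lambda i: cycle[i])
        total = sum(cycle)
        calls.append(BASES[best])
        conf.append(cycle[best] / total if total > 0 else 0.0)
    return "".join(calls), conf


# Hypothetical intensities for four cycles of one cluster.
read, conf = call_bases([(1, 2, 1, 90), (2, 1, 1, 85), (3, 2, 1, 80), (5, 3, 70, 9)])
print(read)  # TTTG
```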

IGA (Illumina Genome Analyzer) without cover

Flow cell imaging

A flow cell

•  A flow cell contains eight lanes (Lane 1, Lane 2, …, Lane 8)
•  Each lane/channel contains three columns of tiles (Column 1, Column 2, Column 3)
•  Each column contains 100 tiles; each tile is 350 × 350 µm and holds 20K-30K clusters
•  Each tile is imaged four times per cycle, one image per base
•  345,600 images for a 36-cycle run
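The image count follows directly from the flow cell layout. A quick check of the arithmetic, using the lane, column, tile and cycle counts from this slide (the function name is illustrative):

```python
LANES = 8                  # a flow cell contains eight lanes
COLUMNS_PER_LANE = 3       # three columns of tiles per lane
TILES_PER_COLUMN = 100     # 100 tiles per column
IMAGES_PER_TILE_CYCLE = 4  # one image per base (A, C, G, T)


def images_per_run(cycles=36):
    """Number of tile images produced by a run of `cycles` cycles."""
    tiles = LANES * COLUMNS_PER_LANE * TILES_PER_COLUMN  # 2,400 tiles
    return tiles * IMAGES_PER_TILE_CYCLE * cycles


print(images_per_run())  # 345600
```

That is 2,400 tiles × 4 images × 36 cycles, matching the 345,600 tiff files that enter the analysis pipeline.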

Data Analysis Pipeline

tiff image files (345,600) → Firecrest → intensity files → Bustard → sequence files → Eland (alignment to genome) → additional data analysis

Applications of the Technology

•  Gene expression
•  Whole-genome re-sequencing
•  Targeted re-sequencing
•  ChIP sequencing
•  Other applications: microRNA discovery

Read Length is Not As Important For Resequencing

Applications
•  Genomes
•  Re-sequencing human exons (microarray capture/amplification)
•  Small (including miRNA) and long RNA profiling (including splicing)
•  ChIP-Seq:
   –  Transcription factors
   –  Histone modifications
   –  Effector proteins
•  DNA methylation
•  Polysomal RNA
•  Origins of replication/replicating DNA
•  Whole-genome association (rare, high-impact SNPs)
•  Copy number/structural variation in DNA
•  ChIA-PET: transcription factor looping interactions
•  ???

Functional Genomics Data Analysis
•  Map reads to the genome
   –  Available tools: MAQ, SOAP, MOSAIK, BWA, BOWTIE
   –  Determine the target genome sequence (i.e., repeat classes)
   –  Mapping options: number of allowed mismatches (as a function of position); number of mapped loci (e.g., 1 = unique read sequence)
•  Generate consensus sequence and identify SNPs
•  Generate read enrichment profile (e.g., Wald Lab tool)
•  Develop null model and calculate significantly enriched sites
•  High-level analysis: compare to annotations, other data sets, etc.
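The enrichment-profile and null-model steps can be illustrated with the simplest possible null: reads fall uniformly across the genome, so counts per window are Poisson. This is a sketch under that assumption, not the Wald Lab tool or any specific peak caller; the window size, threshold and function names are hypothetical, and practical tools usually refine the background with input/control libraries and local estimates.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); very small tails round to 0.0."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(read_starts, genome_size, window=200, pval=1e-5):
    """Count reads in fixed windows and return windows whose count is
    unlikely under a uniform Poisson background.

    Returns (window_start, window_end, count, p_value) tuples.
    """
    counts = {}
    for pos in read_starts:
        w = pos // window
        counts[w] = counts.get(w, 0) + 1
    lam = len(read_starts) * window / genome_size  # expected reads per window
    hits = []
    for w, k in sorted(counts.items()):
        p = poisson_sf(k, lam)
        if p < pval:
            hits.append((w * window, (w + 1) * window, k, p))
    return hits


# Hypothetical data: 1,000 evenly spaced background reads plus a pile-up.
reads = list(range(0, 1_000_000, 1000)) + [5050] * 30
print(enriched_windows(reads, genome_size=1_000_000)[0][:3])  # (5000, 5200, 31)
```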

Limitations of short read technology
•  Need a genome
   –  De novo assembly difficult
•  Can’t sequence through repeats
   –  80% of the human genome is “sequenceable”
•  Need high coverage (15-20X) to detect polymorphisms
   –  Missed SNPs are likely due to low coverage
   –  300X for a 1-in-20 event (1 heterozygous in 10 samples)
•  Error rate increases past the first 30-50 bases
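The coverage figures above can be reasoned about with a binomial model. In a pool of 10 diploid samples with one heterozygote, the variant is on 1 of 20 chromosomes, so at 300X about 300/20 = 15 reads are expected to carry it. The sketch below treats depth as fixed and each read as an independent draw; the minimum number of supporting reads is an assumed calling rule, and sequencing errors are ignored.

```python
import math


def prob_detect(depth, allele_fraction, min_reads):
    """P(at least `min_reads` of `depth` reads carry the variant).

    Each read independently carries the variant with probability
    `allele_fraction`; sequencing errors are ignored.
    """
    p_below = sum(
        math.comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# One heterozygote among 10 pooled diploid samples: 1 of 20 chromosomes.
print(300 / 20)                   # 15.0 expected variant reads at 300X
print(prob_detect(300, 0.05, 5))  # pooled case at 300X
print(prob_detect(20, 0.5, 3))    # single heterozygous sample at 20X
```

Under this model the same 5-read threshold is almost always met at 300X but rarely at 30X, which is the intuition behind the 300X figure for a 1-in-20 event.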

Paired End Reads are Important!

•  Single read in repetitive DNA maps to multiple positions
•  Paired read maps uniquely: read 1 and read 2 are a known distance apart, so a mate in unique DNA places the pair
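The paired-end argument can be sketched as a filter: among the candidate placements of a repetitive read, keep the one whose distance to the uniquely mapped mate matches the expected insert size. The function name, the tolerance parameter and the use of start-to-start distance are assumptions; the sketch ignores the chromosome and strand checks a real aligner would apply.

```python
def resolve_with_mate(candidates, mate_pos, insert_size, tolerance=50):
    """Choose the placement of a multi-mapping read using its mate.

    candidates: possible positions of the repetitive read (assumed to be on
    the mate's chromosome); mate_pos: the mate's unique position.
    Returns the one consistent position, or None if zero or several fit.
    """
    fits = [
        pos for pos in candidates
        if abs(abs(pos - mate_pos) - insert_size) <= tolerance
    ]
    return fits[0] if len(fits) == 1 else None


print(resolve_with_mate([10_000, 52_300, 90_000], mate_pos=52_000, insert_size=300))
# 52300
```

Returning `None` when several candidates fit mirrors the slide's point in reverse: if both mates fall in repeats, the pair may still be ambiguous.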

Paired Ends are Important Part 2

Shendure et al 2005

Deletion Insertion Inversion

Paired-end mapping reveals structural variations

High-throughput paired-end mapping (PEM)

… which is composed of two linking signatures where the linked regions are close to each other (Fig. 1e). Unlike the basic insertion, the linked insertion signature can be used to identify the region that has been inserted. However, if the size of the insertion is large, then the confidence that the two linking signatures are associated with the same insertion decreases, and thus this signature becomes weak for very large insertions.

Another type of linking signature is created by a region of the reference that has been tandemly duplicated in the donor. Cooper et al. [7] first observed that a mate pair that has an end in each of the two copies will have an ‘everted’ mapping: the order of the mates is reversed while the orientation stays the same (Fig. 1f). We call this an ‘everted duplication’ signature. This signature can only be used to detect a novel tandem duplication; for example, it cannot detect a tandemly repeated region whose copy count changes from two to three.

All of the methods outlined above, although able to identify approximate locations of breakpoints, cannot indicate the exact locations. The methods below describe signatures that address this shortcoming.

Breakpoint identification: split mapping and hanging insertion. A read sampled across a deletion breakpoint will leave a ‘split mapping’ signature in the reference, with a prefix and suffix of the read mapping to different locations. Whereas this signature is detectable with longer reads [5,35], there are too many such spurious mappings of short read halves, and hence too many spurious signatures, with short read data. Nevertheless, Ye et al. [36] showed that if one uses the fact that the mate of a split read must map nearby, then the search space for the split mapping of the hanging read can be much reduced. Thus we have the ‘anchored split mapping’ signature, in which one of the mates maps to the reference and the other has a split mapping with one of its parts about 1 insert size away (Fig. 1g). A similar situation occurs when there is an insertion of a few base pairs. This will leave behind a similar signature, except that the split read will have a prefix and suffix mapping to adjacent locations, and there will be a middle part of the read (the bases inserted) that will not be part of either the prefix or suffix mapping (Fig. 1h).

The anchored split mapping signature has the advantage that it can pinpoint the breakpoint of the event with base-pair precision. However, if the deletion is too large, then there will be too many spurious hits for the farther part of the split mapping. Similarly, the size of the insertion detectable with this signature is only a few base pairs, as every inserted base reduces the fraction of the read that matches the genome.

To identify insertions that contain a novel genomic segment, it is possible to use mate pairs spanning either of the breakpoints, where one read maps while the other one does not. Such pairs form a ‘hanging insertion’ signature [5] (Fig. 1i). De novo assembly of such hanging reads can be used to reconstruct a small inserted segment, although if it is substantially larger than the insert size, hanging reads will not cover the entire insertion.

Signatures based on depth of coverage. The high coverage of NGS makes it possible to identify a completely different type of signature, based on the depth of coverage (DOC). Assuming the sequencing process is uniform, the number of reads mapping to a region follows a Poisson distribution and is expected to be proportional to the number of times the region appears in the donor. Thus, a region that has been deleted (duplicated) will have fewer (more) reads mapping to it. Although earlier work used DOC to identify recent segmental duplications in the human genome [37] and compare segmental duplications between human and chimp [38], Campbell et al. [34] were the first to use these ‘gain/loss’ signatures to detect CNVs between tumor and healthy samples of the same individuals (Fig. 2). Unlike the PEM insertion signatures, the gain signature does not indicate where an insertion occurred, but rather …

Figure 1 panels (donor vs. reference): (a) basic insertion, (b) basic deletion, (c) basic inversion, (d) linking, (e) linked insertion, (f) everted duplication, (g) anchored split mapping (deletion), (h) anchored split mapping (insertion), (i) hanging insertion

Figure 1 | Illustrations of PEM signatures. Mate pairs are sampled from the donor, where they are ordered with opposite orientation (the blue mate follows the orange), and are mapped to the reference (ref). Basic signatures include (a) insertions and (b) deletions, where the mapped distance is different from the insert size, as well as (c) inversions, where the order of the two mates is preserved but one of them changes orientation. (d) The linking signature has several discordant mate pairs with similar mapped distances identifying adjacency in the donor (dashed orange arrows) of two distal segments of the reference. The orientation and order of the mapped mate pairs depends on the orientation and order of the two segments in the reference; here, these are unchanged. (e) A linked insertion signature is composed of two linking signatures and arises when the inserted sequence (green) is copied from another location in the genome. (f) A tandem duplication will create an everted duplication linking signature, with mates mapping out of order but with proper orientations. These mate pairs link the end of the duplicated region to its beginning. (g,h) In the anchored split mapping signature, one mate has a good mapping, whereas the other has a split mapping. For a deletion (g) the prefix and suffix surround the deletion, whereas for an insertion (h) the split read has the prefix and suffix mapped to adjacent locations, while a middle part does not map. (i) When a novel genomic segment is inserted, a hanging insertion signature is created, in which only one of the mates has a good mapping.
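The basic signatures in Figure 1 (a-c, f, i) amount to rules on one mate pair's mapped order, orientation and distance. A minimal sketch, assuming a concordant pair maps with the '+' mate left of the '-' mate roughly one insert size apart; the function name, tolerance and labels are hypothetical. A single discordant pair is weak evidence; in practice a signature is supported by several discordant pairs, as the linking signature in the caption illustrates.

```python
def classify_pair(pos1, strand1, pos2, strand2, insert_size, tol=50):
    """Classify one mapped mate pair into a basic PEM signature.

    Positions are reference coordinates of each mate's mapping; pass
    pos2=None (and strand2=None) when the second mate does not map.
    """
    if pos2 is None:
        return "hanging (one mate unmapped)"
    if strand1 == strand2:
        return "inversion"  # one mate changed orientation
    # order the mates by reference position
    (left_pos, left_strand), (right_pos, _) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == "-":
        return "everted (tandem duplication)"  # out of order, orientations kept
    distance = right_pos - left_pos
    if distance > insert_size + tol:
        return "deletion"   # donor lacks sequence between the mates
    if distance < insert_size - tol:
        return "insertion"  # donor has extra sequence between the mates
    return "concordant"


print(classify_pair(1000, "+", 3300, "-", insert_size=300))  # deletion
```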

NATURE METHODS SUPPLEMENT | VOL.6 NO.11s | NOVEMBER 2009 | S15

REVIEW

Medvedev et al. Nature Methods 2009
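The depth-of-coverage (DOC) signature can be sketched as a per-window Poisson test: far fewer reads than expected suggests a loss (deletion), far more a gain (duplication). The expected count per window is an assumed input (for a tumor/healthy comparison it might be estimated from the matched healthy sample), the threshold is a placeholder, and real CNV callers also combine neighbouring windows and correct for coverage biases.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); adequate for modest lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def doc_calls(counts, expected, alpha=1e-3):
    """Label each window 'loss', 'gain' or None from its read count.

    counts: reads per window in the donor; expected: mean reads per window
    with no copy-number change (assumed known).
    """
    calls = []
    for k in counts:
        if poisson_cdf(k, expected) < alpha:
            calls.append("loss")  # far fewer reads than expected
        elif k > 0 and 1.0 - poisson_cdf(k - 1, expected) < alpha:
            calls.append("gain")  # far more reads than expected
        else:
            calls.append(None)
    return calls


print(doc_calls([30, 29, 2, 70, 31], expected=30))
# [None, None, 'loss', 'gain', None]
```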

We need more genomes!

•  Complete Genomics ($5000)

•  ABI ($10,000)

•  Illumina ($10,000)

•  Intelligent Biosystems (<$1000)

“3rd generation” sequencing

•  Ion torrent

•  Pac Bio

•  Nanopore

JM Rothberg et al. Nature 475, 348-352 (2011) doi:10.1038/nature10242

Sensor, well and chip architecture.

Ion Torrent

Wafer, die and chip packaging.

Pros and Cons

•  Fast (4 hour sequencing)

•  Cheap per run, but not per base*

•  Homopolymers?

* Yet

Single-molecule, real-time (SMRT) sequencing PacBio

Nanopore sequencing