Epigenomics Roadmap: where the road has led
Uppsala, 2016-02-08
Álvaro Martínez Barrio, [email protected], /in/ambarrio, @ambarrio
Today
10:15-11:00 Epigenomics Roadmap, 1st part (C8:321)
11:00-11:15 Leg stretcher
11:15-12:00 Epigenomics Roadmap, 2nd part (C8:321)
12:00-13:15 Lunch
13:15-14:45 Computer lab: Epigenomics Roadmap (A6:001)
14:45-15:00 Wrap-up Epigenomics Roadmap
15:15-16:00 SciLifeLab/The Svedberg (A1:111a): Human and great apes genome diversity and evolution of DNA methylation, by Tomas Marques-Bonet, Barcelona
About me
PhD Bioinformatics, 2010
Postdoc PopGenetics/CompBiol, 2014 (L. Andersson + H. Ronne)
Bioinformatics Scientist @scilifelab
Objectives
Understand the importance of an international research effort in epigenomics
Know the highlights of the Epigenomics Roadmap project
Extract some practical cases where the roadmap may help medical research
Get practical experience using the Epigenomics Roadmap data
Very short knowledge review
epigenetic, adjective. 1 Biology: resulting from external rather than genetic influences: epigenetic carcinogens; relating to or of the nature of epigenesis. 2 Geology: formed later than the surrounding or underlying rock formation. DERIVATIVES: epigenetically (adverb), epigenetics (plural noun)
Waddington's "Epigenetic Landscape" during differentiation (C.H. Waddington, 1957)
Leveling Waddington: the emergence of cell reprogramming and transdifferentiation
Ladewig J, Koch P & Brüstle O (2013) Nature Reviews Molecular Cell Biology 14, 225-236
After Waddington's "Epigenetic Landscape" (C.H. Waddington, 1957)
Epigenome affects gene expression
[Figure legend: CpG; High, Medium, Low]
Ball M et al. Nature BT 27:361, 2009
Epigenetics
Upon/on/above/over/around the genome
External or environmental factors can switch on/off the cellular machinery
With(out) underlying changes in DNA sequence
Changes in the regulation of gene expression that can be passed on to a cell's progeny
Changes can be reversed
Studies in 648 twins, 97 MZ, 162 DZ and 130 singletons.
Genotyping and HumanMeth450 chip.
Variation in DNA methylation highly heritable (37%)
Common environment explains 2% of variation
Remaining variation due to non-shared environment and stochastic factors.
Grundberg E et al (2013) AJHG 93:876
Nature vs Nurture (in DNA methylation)
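A quick arithmetic check of the variance split quoted on this slide. This is only a sketch: it assumes the three components (heritable, common environment, non-shared environment plus stochastic factors) are additive and sum to 100%, which the slide implies but does not state. The function name `residual_share` is made up for illustration.

```python
def residual_share(heritable: float, common_env: float) -> float:
    """Share of variance left for non-shared environment and stochastic
    factors, ASSUMING the components are additive and sum to 1."""
    if not (0.0 <= heritable <= 1.0 and 0.0 <= common_env <= 1.0):
        raise ValueError("shares must lie between 0 and 1")
    residual = 1.0 - heritable - common_env
    if residual < 0.0:
        raise ValueError("shares add up to more than 1")
    return residual


# Figures quoted on the slide (Grundberg E et al. 2013): 37% heritable,
# 2% common environment.
remaining = residual_share(0.37, 0.02)
print(f"Non-shared environment + stochastic: {remaining:.0%}")
```

Under that assumption, roughly 61% of the variation in DNA methylation would be left to non-shared environment and stochastic factors.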
By NIH, Public Domain
Figure 1. Different levels of chromatin compaction. (a) Multiple nucleosomes in a row form the 11-nm fiber that is the primary level of chromatin compaction. Alternating nucleosomes are depicted with blue and green surfaces. (b) The 11-nm fiber folds on itself to form two stacks/columns of nucleosomes such that odd-numbered nucleosomes interact with other odd-numbered nucleosomes and even-numbered nucleosomes interact with other even-numbered nucleosomes. The linker DNA zigzags between the two nucleosome stacks. (c) The folded 11-nm fiber forms a two-start helix to produce the 30-nm chromatin fiber that is the secondary level of compaction. (d) The 30-nm fiber twists further and forms a more compact fiber that is arranged in loops (blue), with some portions attached to a protein scaffold (red). This is one of the tertiary levels of compaction. (e) The 30-nm fiber may also result in the formation of interdigitating layers of irregularly oriented nucleosomes, particularly in metaphase chromosomes. Note that these plates do contain nucleosome fibers, but it is unclear whether they are 30-nm fibers or another type. Regardless, this is another tertiary level of compaction. (f) The quaternary level refers to the three-dimensional organization of entire chromosomes inside the nucleus and their relationships with one another as well as with the inner nuclear membrane. The black lines on the pink chromosome represent planes of nucleosome layers as viewed from above.
Figure labels: odd-numbered nucleosome, even-numbered nucleosome, plane of nucleosome layers, DNA, protein scaffold, chromatin loop, metaphase chromosome, nucleus.
Panels: a, 11-nm fiber (primary level); b, nucleosome stacking (folded 11-nm fiber with zigzag linker DNA); c, 30-nm fiber (secondary level); d, loops of 30-nm fiber (tertiary level); e, interdigitating layers of irregularly organized nucleosomes (tertiary level); f, organization of whole chromosomes inside the nucleus (quaternary level).
Quaternary structure of chromatin: the 3D positioning of chromatin domains relative to one another and to the nuclear lamina inside the nucleus
in metaphase chromosomes (8, 9, 26) (Figure 1e). These, too, are considered to represent the tertiary level of chromatin packaging.
The quaternary structure of chromatin refers to the actual positioning of the chromosomes with respect to one another in the nucleus and with respect to the lamina of the inner nuclear membrane (Figure 1f). It is known that expression of a gene is affected by its three-dimensional (3D) position within the nucleus, with the general consensus being that transcriptionally active genomic regions are further away from the nuclear periphery than those that are silent (80). The former
www.annualreviews.org, Higher-Order Chromatin Structure, 61
Annu. Rev. Genom. Human Genet. 2012.13:59-82. Downloaded from www.annualreviews.org. Access provided by University of Uppsala on 11/26/15. For personal use only.
Sajan S.A. and Hawkins R.D. Annu. Rev. Genomics Hum. Genet. 2012
DamID: A method for mapping the distribution of chromatin-associated proteins by fusing a protein of interest with E. coli DNA adenine methyltransferase (Dam), which methylates adenines proximal to the binding sites of a protein, thus circumventing the need for antibodies.
Giemsa band: Also known as a G-band. A characteristic banding pattern is obtained by treating chromosomes with Giemsa stain. The intensity of Giemsa staining is correlated with genomic features. For instance, dark Giemsa bands usually are AT rich, have low gene density and have higher densities of repeat elements.
Polycomb body: A discrete nuclear focus containing Polycomb proteins and their silenced target genes. Polycomb bodies have been observed in D. melanogaster and human cells by imaging and in situ hybridization.
H3K9me2 and lamina-associated domains. The nuclear lamina is thought to bind and silence large regions of heterochromatin. Two studies that analysed distinct genomic features identified similar sets of domains enriched for H3K9 methylation and lamina contact (REFS 96,97). Guelen et al. globally mapped the interaction between the genome and nuclear lamina in human fibroblasts using DamID. These authors observed two discrete chromatin environments: lamina-associated domains (LADs) and regions outside LADs. Both regions were approximately 0.1-10 Mb in size. LADs were found to have low gene density, low transcriptional activity and a paucity of active chromatin modifications. Although the nuclear lamina had previously been associated with inactivity, for the first time, these studies defined the locations and extents of LADs and the correlated chromatin patterns. Remarkably, tethering experiments show that interaction with the nuclear lamina is not only correlative but is also causal in reducing gene expression (REFS 98-100).
Wen et al. identified a similar set of genomic domains by analysing genome-wide maps of H3K9me2 in differentiated and undifferentiated cells (REF. 97). They found large and diffuse regions of K9 methylation that cover up to 4.9 Mb and collectively represent up to 46% of the genome, which they termed large organized chromatin K modifications (LOCKs). These investigators also showed that LOCKs are conserved between human and mouse, and that the H3K9me2 mark was dependent on the G9A H3K9 methyltransferase. Furthermore, a close relationship between LOCKs and LADs was indicated by a striking overlap of 82% between placental LOCKs and LADs found in fibroblasts. Thus, genomic regions diffusely marked by H3K9 methylation seem to be in contact with the nuclear lamina; these findings have prompted a model in which chromatin is partitioned into distinct environments in different cell types. It was initially proposed that LOCKs are relatively scarce in ES cells, as few such chromatin domains could be detected. However, whether this reflects a true distinction in modification patterns between cell types or a detection bias has been questioned (REF. 101). The nature of these compartments remains an area of active investigation, as these structures could play a crucial part in sequestering unused regions of the genome, and thereby reducing the effective search space for gene regulatory machinery.
H3K27me3 blocks and Polycomb bodies. Genome-wide histone modification maps have also revealed large blocks of H3K27me3 in differentiated cells. Identification of these domains relied on new algorithms for identifying broad regions rather than sharp peaks of enrichment, as two recent studies illustrate. Pauler et al. used an algorithm called broad local enrichments (BLOCs) to identify H3K27me3 blocks that are on average 43 kb and overlap silent genes and intergenic regions (REF. 102). They found this pattern in numerous ChIP-chip and ChIP-seq data sets, and suggest that this is a common feature of H3K27me3 in differentiated cell types. The authors speculate that these H3K27me3 blocks may relate to Giemsa bands, as they observe alternating chromatin patterns along chromosomes. Hawkins et al. used ChromaBlocks to find similar H3K27me3 blocks in human IMR90 fibroblasts and characterized their dynamics during differentiation (REF. 74). This study suggested that these repressive domains are often seeded in ES cells and expand in differentiated cell types, apparently to confer cell type-specific repression (FIG. 4d). As these domains have only recently been observed, little is known about their establishment or functional consequences. It is tempting to consider the possibility that, like H3K9me2 domains, H3K27me3 blocks mark distinct nuclear structures or regions. They potentially correspond to Polycomb bodies, which are discrete foci of silenced genes that have been observed by imaging and in situ hybridization in fly and human cells (REF. 103). Although there are no data yet that directly link H3K27me3 blocks to these structures, there is indirect evidence of H3K27me association with compacted chromatin; H3K27me3 can promote recruitment of PRC1 (REF. 6), and PRC1 may be required
Figure 5 | Histone modification signatures associated with features in the mammalian cell nucleus. Signature histone modifications correlate with various nuclear features, although the relationships might be indirect. Chromatin with modifications generally associated with active transcription (green dots) often replicates early, whereas chromatin with generally repressive modifications (purple dots) replicates late. Regions enriched for some sets of active modifications (blue dots) may converge into transcription factories (TRFs). Blocks of histone H3 lysine 27 trimethylation (H3K27me3; red dots) may form Polycomb bodies (Pc) and diffuse domains marked by H3K9me2 or H3K9me3 (purple dots) may contact the nuclear lamina.
REVIEWS
NATURE REVIEWS | GENETICS VOLUME 12 | JANUARY 2011 | 15
2011 Macmillan Publishers Limited. All rights reserved
CpG island: A genomic region enriched for CpG dinucleotides that often occurs near constitutively active promoters. Mammalian genomes are otherwise depleted of CpGs owing to the preferential deamination of methylated cytosines.
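One common way to put a number on "enriched for CpG dinucleotides" is the observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G). The sketch below is illustrative only and is not taken from the slides or the review; the function name `cpg_obs_exp` and the toy sequences are assumptions. Classic CpG-island criteria (Gardiner-Garden and Frommer) combine a ratio above 0.6 with GC content above 50% over at least 200 bp, so this ratio alone does not define an island.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, because the
    expected count is then zero and the ratio is undefined.
    """
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * n / (c_count * g_count)


# Toy sequences (made up for illustration)
print(cpg_obs_exp("ACGTACGT"))  # CpG-containing toy sequence
print(cpg_obs_exp("CAGTCAGT"))  # same base composition, no CpG
```

The two toy sequences have identical base composition, so any difference in the ratio comes only from how often C is directly followed by G, which is the depletion the glossary entry describes.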
developments that have punctuated the shift from a gene-centric to genome-wide view. Then we discuss our current knowledge of primary chromatin structure, focusing on the global patterns, functions and dynamics of histone modifications that overlay sequence features such as promoters, enhancers and gene bodies. Finally, we will discuss notable recent studies that illuminate the link between histone modifications and higher-order chromatin domains.
From gene-centric to genome-wide
For the past several decades, chromatin biology has been guided by a succession of methods for probing features such as chromatin accessibility; DNA methylation; the location, composition and turnover of nucleosomes; and the patterns of post-translational histone modifications. Technological advances in microarrays and next-generation sequencing have enabled many of these assays to be scaled genome-wide. Notable examples include: the DNase I-seq (REFS 9,10), FAIRE-seq (REF. 11) and Sono-seq (REF. 12) assays for chromatin accessibility; whole-genome and reduced-representation bisulphite sequencing (BS-seq) (REFS 13,14) and MeDIP-seq (REF. 15) assays for DNA methylation; and the MNase-seq (REFS 16,17) and CATCH-IT (REF. 18) assays for elucidating nucleosome position and turnover, respectively. These technologies and their integration have been extensively reviewed elsewhere (REFS 19,20). In this section, we focus on histone modifications and, in particular, on how genome-wide ChIP-seq-mapping studies have enhanced our understanding of the chromatin landscape.
Mapping histone modifications genome-wide. Although ChIP has been used since 1988 (REF. 21) to probe chromatin structure at individual loci, its combination with microarrays and, more recently, next-generation sequencing has provided far more precise and comprehensive views of histone modification landscapes, which have highlighted roles for chromatin structures across diverse genomic features and elements that were not appreciated in targeted studies. The basis of ChIP is the immunoprecipitation step, in which an antibody is used to enrich chromatin that carries a histone modification (or other epitope) of interest. In ChIP-seq, next-generation technology is used to deep sequence the immunoprecipitated DNA molecules and thereby produce digital maps of ChIP enrichment (BOX 1). An example is the comprehensive work by Keji Zhao's group to profile 39 different histone methylation and acetylation marks genome-wide in human CD4+ T cells (REFS 22,23). These maps and similar data sets (REFS 24-26) have associated particular modifications with gene activation or repression and with various genomic features, including promoters, transcribed regions, enhancers and insulators (FIG. 2). These and subsequent studies highlight the value of comprehensive and less-biased sequencing approaches for testing the generality of insights gleaned through gene-specific studies, as well as for identifying altogether new associations and biological phenomena.
Integrating ChIP-seq maps. The expanding body of chromatin data in the public domain has fostered many computational efforts that aim to integrate different data types, identify novel relationships among histone modifications and related chromatin structures, and develop new hypotheses regarding the regulatory functions of these chromatin features. Integration of histone modification maps with chromatin accessibility, nucleosome positions, transcription factor binding, RNA expression and sequence-based genome annotations is providing increasingly unified views of chromatin structure and function (REFS 17,19,27).
Two recent studies have presented innovative approaches for integrating genome-wide chromatin maps (REFS 28,29), both of which were demonstrated on a compendium of ChIP-seq data for human CD4+ T cells (REFS 22,23).
Figure 1 | Layers of chromatin organization in the mammalian cell nucleus. Broadly, features at different levels of chromatin organization are generally associated with inactive (off) or active (on) transcription. From the top, genomic DNA is methylated (Me) on cytosine bases in specific contexts and is packaged into nucleosomes, which vary in histone composition and histone modifications (for example, histone H3 lysine 9 trimethylation (H3K9me3)); these features constitute the primary layer of chromatin structure. Here, different histone modifications are indicated by coloured dots and histone variants such as H2A.Z are brown. DNA in chromatin may remain accessible to DNA-binding proteins such as transcription factors (TFs) and RNA polymerase II (RNAPII) or may be further compacted. Chromatin can also organize into higher-order structures such as nuclear lamina-associated domains and transcription factories. Each layer of organization reflects aspects of gene and genome regulation.
REVIEWS
8 | JANUARY 2011 | VOLUME 12 www.nature.com/reviews/genetics
Ong C-T and Corces V.G. Nature Reviews Genetics 2014
http://www.nature.com/epigenomeroadmap
A bit of recent history
2012, Nature published mod/ENCODE (pilot launched by US NHGRI in 2003-2007), which aims to describe all the functional elements encoded in the human genome by mapping epigenetic modifications. Pioneer effort but clinically limited.
2014, Roadmap Epigenomics Project, US NIH initiative.
Material: stem cells and mature cells from various tissues, from healthy and diseased donors (cancer, neurodegenerative, autoimmune, ...)
Epigenome changes during disease
Previous techniques: HumanMeth450k chip, MeDIP-Seq, MBDCap-Seq
Current techniques: whole-genome bisulphite sequencing (WGB-Seq)
A causal link between epigenetic changes and disease has so far been hard to establish.
2014, Roadmap Epigenomics Project (Nature): consistent alterations in the epigenetic landscape could identify candidate genes and pathways for further follow-up
2014, Roadmap Epigenomics Project (Nature): time-course studies of the epigenetics of cell types relevant to a specific disease could indicate whether epigenetic changes have a role in disease progression, or only in its onset
Epigenomic maps should help to navigate poorly understood regions of the genome
Epigenome in cancer
Cancer: the disease of the genome, BUT linked most unambiguously to epigenetic aberrations
Epigenomic organization affects the genomic location of the mutations that provoke cancer
The epigenome of a cancer cell carries a fingerprint of the cell type that originated the cancer
8 tracks, 21 Nature Publishing Group articles, 58 additional research articles with NIH Roadmap Epigenomics funding
http://www.nature.com/epigenomeroadmap
What about Europe?
http://www.epigenome.org/index.php
https://en.wikipedia.org/wiki/Human_Epigenome_Project
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0000082
http://www.nature.com/news/2011/110928/full/477518a.html
http://www.blueprint-epigenome.eu/
http://ihec-epigenomes.org/
http://ihec-epigenomes.org/outcomes/datasets/
ARTICLE, OPEN, doi:10.1038/nature14248
Integrative analysis of 111 reference human epigenomes
Roadmap Epigenomics Consortium, Anshul Kundaje*, Wouter Meuleman*, Jason Ernst*, Misha Bilenky*, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J. Ziller, Viren Amin, John W. Whitaker, Matthew D. Schultz, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Richard S. Sandstrom, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Cristian Coarfa, R. Alan Harris, Noam Shoresh, Charles B. Epstein, Elizabeta Gjoneska, Danny Leung, Wei Xie, R. David Hawkins, Ryan Lister, Chibo Hong, Philippe Gascard, Andrew J. Mungall, Richard Moore, Eric Chuah, Angela Tam, Theresa K. Canfield, R. Scott Hansen, Rajinder Kaul, Peter J. Sabo, Mukul S. Bansal, Annaick Carles, Jesse R. Dixon, Kai-How Farh, Soheil Feizi, Rosa Karlic, Ah-Ram Kim, Ashwinikumar Kulkarni, Daofeng Li, Rebecca Lowdon, GiNell Elliott, Tim R. Mercer, Shane J. Neph, Vitor Onuchic, Paz Polak, Nisha Rajagopal, Pradipta Ray, Richard C. Sallari, Kyle T. Siebenthall, Nicholas A. Sinnott-Armstrong, Michael Stevens, Robert E. Thurman, Jie Wu, Bo Zhang, Xin Zhou, Arthur E. Beaudet, Laurie A. Boyer, Philip L. De Jager, Peggy J. Farnham, Susan J. Fisher, David Haussler, Steven J. M. Jones, Wei Li, Marco A. Marra, Michael T. McManus, Shamil Sunyaev, James A. Thomson, Thea D. Tlsty, Li-Huei Tsai, Wei Wang, Robert A. Waterland, Michael Q. Zhang, Lisa H. Chadwick, Bradley E. Bernstein, Joseph F. Costello, Joseph R. Ecker, Martin Hirst, Alexander Meissner, Aleksandar Milosavljevic, Bing Ren, John A. Stamatoyannopoulos, Ting Wang & Manolis Kellis
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
While the primary sequence of the human gen-ome is largely preserved in all human cell types,the epigenomic landscape of each cell can varyconsiderably, contributing to distinct gene expres-sion programs and biological functions14. Epi-genomic information, such as covalent histone modifications, DNAaccessibility and DNA methylation can be interrogated in each cell andtissue type using high-throughput molecular assays2,58. The resultingmaps have been instrumental for annotating cis-regulatory elementsand other non-exonic genomic features with characteristic epigenomicsignatures9,10, and for dissecting gene regulatory programs in develop-ment and disease7,9,1114. Despite these technological advances, we stilllack a systematic understanding of how the epigenomic landscape con-tributes to cellular circuitry, lineage specification, and the onset and pro-gression of human disease.
To facilitate and spearhead these efforts, the NIH Roadmap Epigenomics Program was established with the goal of elucidating how epigenetic processes contribute to human biology and disease. One of the major components of this programme consists of the Reference Epigenome Mapping Centers (REMCs) [15], which systematically characterized the epigenomic landscapes of representative primary human tissues and cells. We used a diversity of assays, including chromatin immunoprecipitation (ChIP) [9,10,16,17], DNA digestion by DNase I (DNase) [7,18], bisulfite treatment [1,2,19,20], methylated DNA immunoprecipitation (MeDIP) [21], methylation-sensitive restriction enzyme digestion (MRE) [22], and RNA profiling [8], each followed by massively parallel short-read sequencing (-seq). The resulting data sets were assembled into publicly accessible websites and databases, which serve as a broadly useful resource for the scientific and biomedical community. Here we report the integrative analysis of 111 reference epigenomes (Fig. 1 and Extended Data Fig. 1a–d), which we analyse jointly with an additional 16 epigenomes previously reported by the Encyclopedia of DNA Elements (ENCODE) project [9,23].
We integrate information about histone marks, DNA methylation, DNA accessibility and RNA expression to infer high-resolution maps of regulatory elements annotated jointly across a total of 127 reference epigenomes spanning diverse cell and tissue types. We use these annotations to recognize epigenome differences that arise during lineage specification and cellular differentiation, to recognize modules of regulatory regions with coordinated activity across cell types, and to identify key regulators of these modules based on motif enrichments and regulator
A special issue: nature.com/epigenomeroadmap
Nature EPIGENOME ROADMAP
Lists of participants and their affiliations appear at the end of the paper. *These authors contributed equally to this work. These authors jointly supervised this work.
19 FEBRUARY 2015 | VOL 518 | NATURE | 317
© 2015 Macmillan Publishers Limited. All rights reserved
4 Reference Epigenome Mapping Centers (REMCs)
Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
http://www.roadmapepigenomics.org/participants
4 Reference Epigenome Mapping Centers (REMCs)
111 reference human epigenomes from primary cells and tissues
Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
We computed several quality control measures (Fig. 2 and Supplementary Table 1) including the number of distinct uniquely mapped reads; the fraction of mapped reads overlapping areas of enrichment [18,36]; genome-wide strand cross-correlation [37] (Fig. 2e–g); inter-replicate correlation; multidimensional scaling of data sets from different production centres (Supplementary Fig. 1); correlation across pairs of data sets (Extended Data Fig. 1e); consistency between assays carried out in multiple mapping centres (Supplementary Table 2); read mapping quality for bisulfite-treated reads [38,39]; and agreement with imputed data [40]. Outlier data sets were flagged, removed or replaced, and lower-coverage data sets were combined where possible (see Methods).
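One of the quality measures listed above, the fraction of mapped reads overlapping areas of enrichment, can be illustrated with a minimal Python sketch. The function name, the half-open interval convention and the toy coordinates are assumptions made for illustration only; this is not the consortium's pipeline.

```python
from bisect import bisect_right


def fraction_reads_in_peaks(read_positions, peaks):
    """Fraction of reads whose position falls inside any enriched region.

    read_positions: iterable of integer coordinates on a single chromosome.
    peaks: list of non-overlapping (start, end) half-open intervals.
    Both conventions are assumptions for this sketch.
    """
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    reads = list(read_positions)
    if not reads:
        return 0.0
    hits = 0
    for pos in reads:
        # Index of the last peak that starts at or before this read.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits / len(reads)


# Invented toy data: two enriched regions and six read positions.
toy_peaks = [(100, 200), (500, 650)]
toy_reads = [50, 120, 199, 200, 510, 700]
print(fraction_reads_in_peaks(toy_reads, toy_peaks))  # prints 0.5
```

Three of the six toy reads (120, 199 and 510) fall inside a region; position 200 is excluded because the intervals are treated as half-open.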
The resulting data sets provide global views of the epigenomic landscape in a wide range of human cell and tissue types (Fig. 3), including the largest and most diverse collection to date of chromatin state annotations (Fig. 3a); some of the deepest surveys of individual cell types using diverse epigenomic assays (with 21–31 distinct epigenomic marks for seven deeply profiled epigenomes; Fig. 3b); and some of the broadest surveys of individual epigenomic marks across multiple cell types (Fig. 3c). These data sets enable genome-wide epigenomic analyses across multiple dimensions (Fig. 3d). All data sets, standards and protocols are publicly available from web portals, linked from the main consortium homepage http://www.roadmapepigenomics.org, and also at http://compbio.mit.edu/roadmap.
Chromatin states, DNA methylation and DNA accessibility
As a foundation for integrative analysis, we used a common set of combinatorial chromatin states [41] across all 111 epigenomes, plus 16 additional epigenomes generated by the ENCODE project (127 epigenomes in total), using the core set of five histone modification marks that were common to all. We trained a 15-state model (Fig. 4a, b and Supplementary Table 3a) consisting of 8 active states and 7 repressed states (Fig. 4c) that were recurrently recovered (Extended Data Fig. 2a), and showed distinct levels of DNA methylation (Fig. 4d), DNA accessibility (Fig. 4e), regulator binding (Extended Data Fig. 2b and Supplementary Fig. 2) and evolutionary conservation (Fig. 4f and Supplementary Fig. 3). The active states (associated with expressed genes) consist of active transcription start site (TSS) proximal promoter states (TssA, TssAFlnk), a transcribed state at the 5′ and 3′ end of genes showing both promoter and enhancer signatures (TxFlnk), actively transcribed states (Tx, TxWk), enhancer states (Enh, EnhG), and a state associated with zinc finger protein genes (ZNF/Rpts). The inactive states consist of constitutive heterochromatin (Het), bivalent regulatory states (TssBiv, BivFlnk, EnhBiv), repressed Polycomb states (ReprPC, ReprPCWk), and a quiescent state (Quies), which covered on average 68% of each reference epigenome. Enhancer and promoter states covered approximately 5% of each reference epigenome on average, and showed enrichment for evolutionarily conserved non-exonic regions [42].
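The coverage figures quoted above (for example, the quiescent state covering 68% of an epigenome on average) are per-state fractions of annotated bases. A minimal Python sketch of that calculation follows. The state names and their active/inactive grouping are taken from the text; the tuple layout of a segment and the toy coordinates are assumptions for illustration and do not describe the consortium's file format.

```python
from collections import defaultdict

# The 15 states, grouped as described in the text.
ACTIVE_STATES = ["TssA", "TssAFlnk", "TxFlnk", "Tx", "TxWk",
                 "EnhG", "Enh", "ZNF/Rpts"]
INACTIVE_STATES = ["Het", "TssBiv", "BivFlnk", "EnhBiv",
                   "ReprPC", "ReprPCWk", "Quies"]


def state_coverage(segments):
    """Return {state: fraction of annotated bases}.

    segments: iterable of (chrom, start, end, state) tuples with
    half-open coordinates (an assumed layout for this sketch).
    """
    bases = defaultdict(int)
    for _chrom, start, end, state in segments:
        bases[state] += end - start
    total = sum(bases.values())
    if total == 0:
        return {}
    return {state: n / total for state, n in bases.items()}


# Invented toy segmentation of a 10 kb stretch.
toy_segments = [
    ("chr9", 0, 6800, "Quies"),
    ("chr9", 6800, 7000, "TssA"),
    ("chr9", 7000, 7300, "Enh"),
    ("chr9", 7300, 10000, "Tx"),
]
cov = state_coverage(toy_segments)
active_fraction = sum(cov.get(s, 0.0) for s in ACTIVE_STATES)
print(cov["Quies"], round(active_fraction, 2))
```

Applying the same function to each of the 127 segmentations and averaging the per-state fractions would give the kind of summary quoted in the paragraph above.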
To capture the greater complexity afforded by additional marks, we trained additional chromatin state models in subsets of cell types. In the subset of 98 reference epigenomes that also included H3K27ac data, we also learned an 18-state model (Extended Data Fig. 2c and Supplementary Table 3b), enabling us to distinguish enhancer states containing strong H3K27ac signal (EnhA1, EnhA2), which showed higher DNA
[Figure 2 graphic not reproduced. Rows list the 127 reference epigenomes by EID and epigenome name, grouped by sample type (ES cell derived, primary cell, primary tissue, primary culture, cell line) and by cell type/tissue group (IMR90, ES cell, iPSC, ES-deriv., Blood & T cell, HSC & B cell, Mesench., Myosat., Epithelial, Neurosph., Thymus, Brain, Adipose, Muscle, Heart, Smooth muscle, Digestive, Other, ENCODE 2012). Data set counts per column: H3K4me1, H3K4me3, H3K36me3, H3K27me3 and H3K9me3 (127 each); H3K27ac (98); H3K9ac (62); DNase-seq (53); DNA methylation (95); gene expression (78); additional marks (184); chromatin states (127). Legend: signal-to-noise ratio percentile (0% to 100%); DNA methylation by WGBS (n = 37), RRBS (n = 51) and mCRF (n = 16); expression by RNA-seq (n = 56) and microarray (n = 22); quality: highest-quality epigenomes (n = 60; ChromHMM model trained + applied) and remaining epigenomes (n = 67; ChromHMM model only applied).]
Figure 2 | Data sets available for each reference epigenome. List of 127 epigenomes including 111 by the Roadmap Epigenomics program (E001–E113) and 16 by ENCODE (E114–E129). See Supplementary Table 1 for a full list of names and quality scores. a–d, Tissue and cell types grouped by type of biological material (a), anatomical location (b), reference epigenome identifier (EID, c) and abbreviated name (d). PB, peripheral blood. ENCODE 2012 reference epigenomes are shown separately. e–g, Normalized strand cross-correlation quality scores (NSC) [37] for the core set of five histone marks (e), additional acetylation marks (f) and DNase-seq (g). h, Methylation data by WGBS (red), RRBS (blue) and mCRF (green). A total of 104 methylation data sets available in 95 distinct reference epigenomes. i, Gene expression data using RNA-seq (brown) and microarray expression (yellow). j, A total of 26 epigenomes contain 184 additional histone modification marks. k, Sixty highest-quality epigenomes (purple) were used for training the core chromatin state model, which was then applied to the full set of epigenomes (purple and orange).
4 Reference Epigenome Mapping Centers (REMCs)
111 reference human epigenomes from primary cells and tissues
2,805 datasets: 1,821 histone modifications, 360 DNA accessibility, 277 DNA methylation, and 166 RNA-seq
150.21 billion (uniquely) mapped sequencing reads
3,174x coverage of hg19
Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
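As a quick arithmetic check on the data set counts above, the short Python sketch below tallies the four listed assay categories against the stated total of 2,805. The variable names are illustrative. The four categories sum to 2,624, so 181 data sets fall outside them; what those remaining data sets are is not stated on the slide.

```python
# Counts as quoted on the slide.
total_datasets = 2805
listed = {
    "histone modifications": 1821,
    "DNA accessibility": 360,
    "DNA methylation": 277,
    "RNA-seq": 166,
}

listed_sum = sum(listed.values())      # data sets in the four named categories
unlisted = total_datasets - listed_sum  # data sets not broken down on the slide

for name, count in listed.items():
    print(f"{name}: {count} ({count / total_datasets:.1%} of all data sets)")
print(f"listed categories: {listed_sum}; not itemised: {unlisted}")
```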
1,936 datasets, 111 epigenomes (+16 ENCODE)
Core set: H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3 (Fig 2e)
Additional acetylation marks: H3K27ac, H3K9ac (Fig 2f)
Chromatin accessibility: DNase-seq (Fig 2g)
Methylation data: WGBS (red), RRBS (blue) and mCRF (green) (Fig 2h)
Gene expression: RNA-seq (brown) and microarray expression (yellow) (Fig 2i)
Deep set: 16 histone modification marks (on average) across 7 cell types (Fig 2j)
Fig 2k: Sixty highest-quality epigenomes (purple) were used for training the core chromatin state model, which was then applied to the full set of epigenomes (purple and orange).
Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
Epigenome classes as defined by the Roadmap
http://www.ncbi.nlm.nih.gov/books/NBK45786/#epi_help_doc.About_Data_Sources
Class 1 epigenomes: DNA methylation (whole-genome bisulfite sequencing); core histone modifications and an expanded set of histone modifications; RNA sequencing data (RNA-seq); chromatin accessibility
Class 2 epigenomes: DNA methylation (whole-genome bisulfite sequencing); core histone modifications; RNA sequencing data (RNA-seq); chromatin accessibility
Class 3 epigenomes: DNA methylation (RRBS, MeDIP-seq, MRE-seq); core histone modifications; RNA sequencing data (gene expression microarray); chromatin accessibility (if possible)
Class 4 epigenomes: DNA methylation (RRBS, MeDIP-seq, MRE-seq); core histone modifications; RNA sequencing data (gene expression microarray)
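The four class definitions above can be read as a small decision rule over the assays available for an epigenome. The Python sketch below is one hedged reading of that rule. The assay labels and the function name are hypothetical, and treating the presence of chromatin accessibility as the only difference between Class 3 and Class 4 is a simplifying assumption, since the slide lists accessibility for Class 3 only "if possible".

```python
# Hypothetical assay labels used only in this sketch.
REDUCED_METHYLATION_ASSAYS = {"RRBS", "MeDIP-seq", "MRE-seq"}


def epigenome_class(assays):
    """Return 1-4 for a set of assay labels, or None if no class matches."""
    assays = set(assays)
    if "core_histone" not in assays:
        return None  # every class requires the core histone modifications
    has_accessibility = "chromatin_accessibility" in assays
    if "WGBS" in assays and "RNA-seq" in assays and has_accessibility:
        # Class 1 adds an expanded histone modification set to Class 2.
        return 1 if "expanded_histone" in assays else 2
    if assays & REDUCED_METHYLATION_ASSAYS and "expression_microarray" in assays:
        # Assumption: accessibility present -> Class 3, absent -> Class 4.
        return 3 if has_accessibility else 4
    return None


example = {"WGBS", "core_histone", "RNA-seq", "chromatin_accessibility"}
print(epigenome_class(example))  # prints 2
```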
Epigenomic information across tissues and marks
Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
accessibility (Extended Data Fig. 3a), lower methylation (Extended Data Fig. 3b) and higher transcription factor binding (Extended Data Fig. 2c) than enhancers lacking H3K27ac. In a subset of 7 epigenomes with an average of 24 epigenomic marks, we learned separate 50-state chromatin state models based on all the available histone marks and DNA accessibility in each epigenome (Supplementary Fig. 4), which additionally distinguished: a DNase state with distinct transcription factor binding enrichments (Supplementary Fig. 4f), including for mediator/cohesin components [43] (even though CTCF was not included as an input track to learn the model) and repressor NRSF; transcribed states showing H3K79me1 and H3K79me2 and associated with the 5′ ends of genes and introns; and a large number of putative regulatory and neighbouring regions showing diverse acetylation marks even in the absence of the H3K4 methylation signatures characteristic of enhancer and promoter regions.
We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation and DNA accessibility. Consistent with previous studies [19,23,44,45], we found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states (Fig. 4d, e and Extended Data Fig. 3a, b). These differences in methylation level were stronger for higher-expression genes than for lower-expression genes, leading to a more pronounced DNA methylation profile (Extended Data Fig. 3c, Supplementary Fig. 5 and Supplementary Table 4f). Genes proximal to H3K27ac-marked enhancers show significantly higher expression levels (Extended Data Fig. 3d), and conversely, higher-expression genes were significantly more likely to be neighbouring H3K27ac-containing enhancers (Extended Data Fig. 3e).
Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes (Fig. 4c, d). Enh and ReprPC states show intermediate DNA methylation, but very different distributions of DNA accessibility and different enrichments for expressed genes (Fig. 4c–e). Lack of DNA methylation, typically associated with de-repression, is associated with both the active TssA promoter state and the bivalent TssBiv and BivFlnk states. Bivalent states TssBiv and BivFlnk also show overall lower DNA methylation and higher DNA accessibility than enhancer states Enh and EnhG, and binding by both activating and repressive regulatory factors (Extended Data Fig. 2b). These results also held for alternative methylation measurement platforms (Extended Data Fig. 4a–c), and for the 18-state chromatin state model (Extended Data Fig. 4d, e). Overall, these results highlight the complex relationship between DNA methylation, DNA accessibility and RNA transcription and the value of interpreting DNA methylation and DNA accessibility in the context of integrated chromatin states that better distinguish active and repressed regions.
Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation regions, based on 25 complementary DNA methylation assays of MeDIP [31,46] and MRE-seq [22,39] from 9 reference epigenomes [47]. This resulted in more than 18,000 intermediate methylation regions, showing 57% CpG methylation on average, that are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh) and evolutionarily conserved regions. Intermediate methylation was associated with intermediate levels of active histone modifications and DNase I hypersensitivity. Near TSSs, intermediate methylation correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion [47]. Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.
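To make the idea of an intermediate methylation region concrete, the Python sketch below flags runs of consecutive CpGs whose methylation fraction lies in an intermediate band. This is a deliberately simplified stand-in, not the method described above, which combined MeDIP and MRE-seq signal; the 0.3 to 0.7 band, the minimum run length and the toy values are all hypothetical choices.

```python
def _summarize(run):
    """Collapse a run of (position, fraction) pairs into one region."""
    fractions = [frac for _, frac in run]
    return (run[0][0], run[-1][0], sum(fractions) / len(fractions))


def intermediate_runs(cpgs, low=0.3, high=0.7, min_cpgs=3):
    """Find runs of consecutive CpGs with intermediate methylation.

    cpgs: list of (position, methylation_fraction) sorted by position.
    low, high, min_cpgs: hypothetical thresholds for this sketch.
    Returns a list of (first_position, last_position, mean_fraction).
    """
    runs, current = [], []
    for pos, frac in cpgs:
        if low <= frac <= high:
            current.append((pos, frac))
            continue
        # A CpG outside the band ends the current run.
        if len(current) >= min_cpgs:
            runs.append(_summarize(current))
        current = []
    if len(current) >= min_cpgs:
        runs.append(_summarize(current))
    return runs


# Invented toy data: one qualifying run (positions 20-40) and one short run.
toy_cpgs = [(10, 0.05), (20, 0.5), (30, 0.6), (40, 0.55),
            (50, 0.95), (60, 0.4), (70, 0.45)]
print(intermediate_runs(toy_cpgs))
```

The second stretch (positions 60 and 70) is dropped because it contains fewer CpGs than the assumed minimum of three.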
Epigenomic differences during lineage specification
We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages [19,48–50]. We found that the distribution of methylation levels for CpGs in some chromatin states varied significantly across tissue and cell type (Fig. 4g, Extended Data Fig. 4f and Supplementary Table 4a). For example, TssAFlnk states were largely unmethylated in terminally differentiated cells and tissues, but frequently methylated for several pluripotent and embryonic-stem-cell-derived cells (Bonferroni-corrected F-test P < 0.01);
[Figure 3 graphic not reproduced. Panel a: chromatin state annotations in 127 reference epigenomes, with rows ordered and grouped as in Fig. 2. Panel b: 33 data sets in IMR90 fetal lung fibroblasts (genome-wide measurements for all marks), shown as tracks for RefSeq genes, chromatin states, RNA-seq, H3K36me3, H4K20me1, H3K79me2, H3K79me1, H3K9me1, DNase, DGF, input, H3K4me3, H3K9ac, H3K56ac, H2A.Z, H2AK9ac, H2BK5ac, H3K4me2, H3K18ac, H3K4me1, H3K27ac, H4K5ac, H4K8ac, H3K4ac, H3K14ac, H3K23ac, H2AK5ac, H4K91ac, H2BK120ac, H2BK12ac, H2BK15ac, H2BK20ac, H3K27me3, H3K9me3, WGBS and Hi-C. Panel c: individual mark data sets across epigenomes (H3K4me1, H3K4me3, DNase, WGBS, RNA-seq). Genes labelled in the region: FAM205B, ATP8B5P, SIT1, NPR2, RECK, RNF38, MELK, PAX5, GRHPR, FRMPD1, SHB, ALDH1B1.]
Figure 3 | Epigenomic information across tissues and marks. a, Chromatin state annotations across 127 reference epigenomes (rows, Fig. 2) in a ~3.5-Mb region on chromosome 9. Promoters are primarily constitutive (red vertical lines), while enhancers are highly dynamic (dispersed yellow regions). b, Signal tracks for IMR90 showing RNA-seq, a total of 28 histone modification marks, whole-genome bisulfite DNA methylation, DNA accessibility, digital genomic footprints (DGF), input DNA and chromatin conformation information [72]. c, Individual epigenomic marks across all epigenomes in which they are available. d, Relationship of figure panels highlights data set dimensions.
Epigenomicinformationacrosstissuesandmarks
RoadmapEpigenomicsConsortium(2015)Nature518,317330doi:10.1038/nature14248
accessibility (Extended Data Fig. 3a), lower methylation (Extended DataFig. 3b) and higher transcription factor binding (Extended Data Fig. 2c)than enhancers lacking H3K27ac. In a subset of 7 epigenomes with anaverage of 24 epigenomic marks, we learned separate 50-state chro-matin state models based on all the available histone marks and DNAaccessibility in each epigenome (Supplementary Fig. 4), which addi-tionally distinguished: a DNase state with distinct transcription factorbinding enrichments (Supplementary Fig. 4f), including for mediator/cohesin components43 (even though CTCF was not included as an input
track to learn the model) and repressor NRSF; transcribed states show-ing H3K79me1 and H3K79me2 and associated with the 59 ends of genesand introns; and a large number of putative regulatory and neighbour-ing regions showing diverse acetylation marks even in the absence ofthe H3K4 methylation signatures characteristic of enhancer and pro-moter regions.
We used chromatin states to study the relationship between histonemodification patterns, RNA expression levels, DNA methylation andDNA accessibility. Consistent with previous studies19,23,44,45, we foundlow DNA methylation and high accessibility in promoter states, highDNA methylation and low accessibility in transcribed states, and inter-mediate DNA methylation and accessibility in enhancer states (Fig. 4d, eand Extended Data Fig. 3a, b). These differences in methylation levelwere stronger for higher-expression genes than for lower-expressiongenes, leading to a more pronounced DNA methylation profile (ExtendedData Fig. 3c, Supplementary Fig. 5 and Supplementary Table 4f). Genesproximal to H3K27ac-marked enhancers show significantly higher expres-sion levels (Extended Data Fig. 3d), and conversely, higher-expressiongenes were significantly more likely to be neighbouring H3K27ac-containing enhancers (Extended Data Fig. 3e).
Chromatin states sometimes captured differences in RNA express-ion that are missed by DNA methylation or accessibility. For example,TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions ofDNA accessibility but widely differing enrichments for expressed genes(Fig. 4c, d). Enh and ReprPC states show intermediate DNA methyla-tion, but very different distributions of DNA accessibility and differentenrichments for expressed genes (Fig. 4ce). Lack of DNA methylation,typically associated with de-repression, is associated with both the activeTssA promoter state and the bivalent TssBiv and BivFlnk states. Bivalentstates TssBiv and BivFlnk also show overall lower DNA methylationand higher DNA accessibility than enhancer states Enh and EnhG, andbinding by both activating and repressive regulatory factors (ExtendedData Fig. 2b). These results also held for alternative methylation mea-surement platforms (Extended Data Fig. 4ac), and for the 18-state chro-matin state model (Extended Data Fig. 4d, e). Overall, these resultshighlight the complex relationship between DNA methylation, DNAaccessibility and RNA transcription and the value of interpreting DNAmethylation and DNA accessibility in the context of integrated chro-matin states that better distinguish active and repressed regions.
Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation regions, based on 25 complementary DNA methylation assays of MeDIP (refs 31, 46) and MRE-seq (refs 22, 39) from 9 reference epigenomes (ref. 47). This resulted in more than 18,000 intermediate methylation regions, showing 57% CpG methylation on average, that are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh) and evolutionarily conserved regions. Intermediate methylation was associated with intermediate levels of active histone modifications and DNase I hypersensitivity. Near TSSs, intermediate methylation correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion (ref. 47). Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.
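As a rough illustration of what an intermediate methylation region is, the sketch below groups consecutive CpGs whose methylation fraction falls in a middle band. This is a simplified stand-in, not the method used in the paper, which combined MeDIP and MRE-seq assays. The thresholds (`low`, `high`), the minimum run length (`min_cpgs`) and the toy positions are all assumptions chosen for the example.

```python
def _close(run):
    """Turn a run of (position, methylation) pairs into (start, end, mean)."""
    positions = [p for p, _ in run]
    values = [m for _, m in run]
    return (positions[0], positions[-1], sum(values) / len(values))


def intermediate_regions(cpgs, low=0.25, high=0.75, min_cpgs=3):
    """Group consecutive CpGs whose methylation lies in [low, high].

    cpgs: list of (position, methylation fraction), sorted by position.
    Returns a list of (start, end, mean methylation) tuples.
    """
    regions, run = [], []
    for pos, meth in cpgs:
        if low <= meth <= high:
            run.append((pos, meth))
            continue
        # An out-of-band CpG ends the current run; keep it only if long enough.
        if len(run) >= min_cpgs:
            regions.append(_close(run))
        run = []
    if len(run) >= min_cpgs:
        regions.append(_close(run))
    return regions


# Hypothetical toy CpGs on one chromosome (not real data).
toy = [(100, 0.05), (150, 0.50), (180, 0.60), (220, 0.55), (300, 0.95), (340, 0.50)]
regions = intermediate_regions(toy)
print(regions)
```

Here the three CpGs at positions 150 to 220 form one region, while the isolated intermediate CpG at 340 is dropped because it does not reach the minimum run length.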
Epigenomic differences during lineage specification
We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages (refs 19, 48-50). We found that the distribution of methylation levels for CpGs in some chromatin states varied significantly across tissue and cell type (Fig. 4g, Extended Data Fig. 4f and Supplementary Table 4a). For example, TssAFlnk states were largely unmethylated in terminally differentiated cells and tissues, but frequently methylated for several pluripotent and embryonic-stem-cell-derived cells (Bonferroni-corrected F-test P < 0.01);
[Figure 3, panels a-d. a, Chromatin state annotations in 127 reference epigenomes (E001-E129), grouped by tissue and cell type (IMR90, ES cell, iPSC, ES-deriv., Blood & T cell, HSC & B cell, Mesench., Epithelial, Brain, Muscle, Heart, Sm. musc., Digestive, Other, ENCODE2012). b, Genome-wide measurements for all marks: 33 data sets in IMR90 fetal lung fibroblasts (RefSeq genes, chromatin states, RNA-seq, histone modification marks, DNase, DGF, input, WGBS, Hi-C). c, Individual mark data sets across epigenomes (H3K4me1, H3K4me3, DNase, WGBS, RNA-seq). d, Relationship of panels. Genes shown: FAM205B, ATP8B5P, SIT1, NPR2, RECK, RNF38, MELK, PAX5, GRHPR, FRMPD1, SHB, ALDH1B1.]
Figure 3 | Epigenomic information across tissues and marks. a, Chromatin state annotations across 127 reference epigenomes (rows, Fig. 2) in a ~3.5-Mb region on chromosome 9. Promoters are primarily constitutive (red vertical lines), while enhancers are highly dynamic (dispersed yellow regions). b, Signal tracks for IMR90 showing RNA-seq, a total of 28 histone modification marks, whole-genome bisulfite DNA methylation, DNA accessibility, digital genomic footprints (DGF), input DNA and chromatin conformation information (ref. 72). c, Individual epigenomic marks across all epigenomes in which they are available. d, Relationship of figure panels highlights data set dimensions.
RESEARCH ARTICLE
320 | NATURE | VOL 518 | 19 FEBRUARY 2015
© 2015 Macmillan Publishers Limited. All rights reserved
Epigenomic information across tissues and marks
Roadmap Epigenomics Consortium (2015) Nature 518, 317-330. doi:10.1038/nature14248
accessibility (Extended Data Fig. 3a), lower methylation (Extended Data Fig. 3b) and higher transcription factor binding (Extended Data Fig. 2c) than enhancers lacking H3K27ac. In a subset of 7 epigenomes with an average of 24 epigenomic marks, we learned separate 50-state chromatin state models based on all the available histone marks and DNA accessibility in each epigenome (Supplementary Fig. 4), which additionally distinguished: a DNase state with distinct transcription factor binding enrichments (Supplementary Fig. 4f), including for mediator/cohesin components (ref. 43) (even though CTCF was not included as an input track to learn the model) and repressor NRSF; transcribed states showing H3K79me1 and H3K79me2 and associated with the 5′ ends of genes and introns; and a large number of putative regulatory and neighbouring regions showing diverse acetylation marks even in the absence of the H3K4 methylation signatures characteristic of enhancer and promoter regions.