Regulatory genomics and epigenomics in fly and human
Manolis Kellis, MIT
MIT Computer Science & Artificial Intelligence Laboratory
Broad Institute of MIT and Harvard

compbio.mit.edu/slides/188_MSR_Kellis.pdf


Page 1


Page 2

Three areas of computational genomics

1. Genome annotation
– Evolutionary signatures for each function
– Discover proteins, RNAs, microRNAs
– New biology: read-through, editing, miR*, miR-AS

2. Gene regulation
– Discover regulatory motifs (pre- and post-transcriptional)
– Identify gene targets using comparative genomics
– Epigenomics in development and disease

3. Genome evolution
– Evolution by whole-genome duplication
– The two forces of gene evolution
– Phylogenomics and neofunctionalization

Page 3

[Figure: several thousand bases of raw genomic DNA sequence (A, C, G, T), shown without annotation.]

Page 4

[Figure: the same genomic DNA sequence, with genes and regulatory motifs marked.]

Genes
Encode proteins

Regulatory motifs
Control gene expression

Page 5

Large-scale comparative genomics datasets: 32 mammals, 12 flies, 17 fungi

[Figure: species trees for each dataset; fungal panel labels include 9 Yeasts (pre- and post-duplication) and 8 Candida (diploid and haploid).]

Page 6

Comparative genomics and evolutionary signatures

• Comparative genomics can reveal functional elements
– For example: exons are deeply conserved to mouse, chicken, fish

– Many other elements are also strongly conserved: exons / regulatory?

• Can we also pinpoint specific functions of each region? Yes!
– Patterns of change distinguish different types of functional elements
– Specific function → selective pressures → patterns of mutation/insertion/deletion

• Develop evolutionary signatures characteristic of each function

Page 7

Evolutionary signatures for diverse functions

Protein-coding genes
- Codon Substitution Frequencies
- Reading Frame Conservation

RNA structures
- Compensatory changes
- Silent G-U substitutions

microRNAs
- Shape of conservation profile
- Structural features: loops, pairs
- Relationship with 3’UTR motifs

Regulatory motifs
- Mutations preserve consensus
- Increased Branch Length Score
- Genome-wide conservation
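To make the Branch Length Score concrete, here is a minimal sketch under stated assumptions: the species tree is stored as a child-to-parent map with branch lengths, and the tree, species names, and lengths are hypothetical. The score of a motif instance is taken to be the total branch length of the smallest subtree connecting the species in which the instance is conserved; a branch belongs to that subtree exactly when some, but not all, of those species lie below it. This is a simplified reading of the idea, not necessarily the exact published definition.

```python
from collections import defaultdict

def branch_length_score(parent, length, present):
    """Total branch length of the minimal subtree connecting `present` species.

    parent:  dict child -> parent node (the root has no entry)
    length:  dict child -> length of the branch from child to its parent
    present: set of leaf names where the motif instance is conserved
    """
    below = defaultdict(int)  # how many `present` species sit under each node
    for leaf in present:
        node = leaf
        while node in parent:  # walk up until the root is reached
            below[node] += 1
            node = parent[node]
    k = len(present)
    # A branch lies in the connecting subtree iff it separates some,
    # but not all, of the present species from the others.
    return sum(length[n] for n, c in below.items() if 0 < c < k)

# Hypothetical 4-species tree: ((A,B)n1,(C,D)n2)root
PARENT = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
LENGTH = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "n1": 0.5, "n2": 0.6}

if __name__ == "__main__":
    total = sum(LENGTH.values())
    for species in [{"A"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C", "D"}]:
        bls = branch_length_score(PARENT, LENGTH, species)
        print(sorted(species), round(bls, 3), "fraction of tree:", round(bls / total, 3))
```

Instances conserved in distantly related species (A and C here) span more of the tree than instances conserved only in close relatives (A and B), which is why a higher score is read as stronger evidence of conservation.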

Page 8

Protein-coding genes

Mike Lin

Page 9

Evolutionary signatures for protein-coding genes

[Figure: alignment annotated with non-synonymous substitutions, synonymous codon substitutions, frame-shifting gaps, and gaps that are multiples of 3.]

• Same conservation levels, distinct patterns of divergence
– Gaps are multiples of three (preserve amino acid translation)
– Mutations are largely 3-periodic (silent codon substitutions)
– Specific triplets exchanged more frequently (conservative substitutions)
– Conservation boundaries are sharp (pinpoint individual splicing signals)
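Two of the patterns above, frame-preserving gaps and 3-periodic mutations, can be checked directly on a pairwise alignment. The sketch below assumes both sequences are aligned, use '-' for gaps, and are in frame from the first column; the example sequences are made up for illustration.

```python
import re

def gaps_preserve_frame(aligned):
    """True if every run of gap characters ('-') has a length divisible by 3."""
    return all(len(m.group()) % 3 == 0 for m in re.finditer(r"-+", aligned))

def substitutions_by_codon_position(ref, other):
    """Count mismatches at codon positions 1, 2 and 3.

    Assumes equal-length aligned sequences that are in frame from index 0;
    columns with a gap in either sequence are skipped.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    counts = [0, 0, 0]
    for i, (a, b) in enumerate(zip(ref, other)):
        if a != b and a != "-" and b != "-":
            counts[i % 3] += 1
    return counts

if __name__ == "__main__":
    print(gaps_preserve_frame("ATG---CTT"))  # True: one 3-base gap
    print(gaps_preserve_frame("ATG--CTT"))   # False: a 2-base gap shifts the frame
    print(substitutions_by_codon_position("ATGCTTGAAACC", "ATGCTCGAGACA"))  # [0, 0, 3]
```

In a conserved coding exon, the third-position count dominates (silent changes), while a conserved non-coding element shows no such periodicity.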

Page 10

Protein-coding evolution vs nucleotide conservation

[Figure: alignments of protein-coding exons vs highly conserved non-coding elements.]

• Evolutionary signatures specific to each function
– Distinguish protein-coding from non-coding conservation
– Genome-wide run (CSF only): 81% sensitivity, 91% precision
– Incorporating additional signatures: RFC, single-species…
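The idea behind a Codon Substitution Frequencies (CSF) score is a log-likelihood ratio: how much more likely each observed codon change is under a protein-coding model than under a non-coding model. The toy sketch below collapses codon pairs into three classes and uses hypothetical class probabilities (P_CODING and P_NONCODING are illustrative values, not published parameters); the real method uses full codon-pair frequency tables estimated from training alignments.

```python
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(codon):
    i, j, k = (BASES.index(c) for c in codon)
    return AMINO[16 * i + 4 * j + k]

# Hypothetical probabilities of each substitution class under each model.
P_CODING = {"identical": 0.80, "synonymous": 0.15, "nonsynonymous": 0.05}
P_NONCODING = {"identical": 0.80, "synonymous": 0.05, "nonsynonymous": 0.15}

def codon_class(a, b):
    if a == b:
        return "identical"
    return "synonymous" if translate(a) == translate(b) else "nonsynonymous"

def csf_like_score(ref, other):
    """Sum of log(P_coding / P_noncoding) over aligned codons.

    Positive scores favour a protein-coding interpretation. Assumes gap-free,
    in-frame sequences of equal length that is a multiple of 3.
    """
    if len(ref) != len(other) or len(ref) % 3:
        raise ValueError("expected equal-length, in-frame codon sequences")
    score = 0.0
    for i in range(0, len(ref), 3):
        cls = codon_class(ref[i:i + 3], other[i:i + 3])
        score += math.log(P_CODING[cls] / P_NONCODING[cls])
    return score

if __name__ == "__main__":
    print(csf_like_score("ATGCTTGAA", "ATGCTCGAG"))  # two silent changes: positive
    print(csf_like_score("ATGCTTGAA", "ATGCCTGAT"))  # two amino-acid changes: negative
```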

Page 11

Additional support for novel human exons

• Length distribution matches known exons (no excess of very small lengths)
• Supported by high-quality cDNA evidence

• Supported by independent curation efforts
• Extraordinary comparative support in some cases

Page 12

Many new genes confirmed by chromatin domains

[Figure: examples of a missed exon and an alternatively spliced exon.]

• Several hundred new exons, many in clusters
– Example: MM14qC3

Mikkelsen et al

• Supported by chromatin signatures (Guttman et al)

Page 13

Genome-wide curation / experimental follow-up

PI: Tim Hubbard, Sanger Center. HAVANA curators, experimental validation.

• Novel candidate genes and exons
– Experimental cDNA sequencing and validation
– Curation of gene structures integrating evidence

• Revising existing annotations
– Identify dubious genes with non-protein-like evolution
– Refine boundaries and exon sets of existing genes
– Curation: evaluate evidence supporting that annotation

• Unusual gene structures
– Evolutionary evidence in absence of primary signals
– Reveal new and unusual biological mechanisms

Page 14

Comparative evidence leads to genome reannotation

• 81% of 928 curated exons incorporated in FlyBase
• Surprise: 44% of intronic predictions independent of the surrounding gene
• Surprise: 42% of intergenic predictions extend existing genes
• Surprise: overlapping protein-coding exons on opposite strands
• Surprise: 414 rejected protein-coding genes (two are pre-miRNAs)

Lin et al, Genome Research 2007

Page 15

Unusual protein-coding events

Mike Lin

Page 16

When primary sequence signals are ignored

• Typical gene (MEF2A). Evolutionary signal stops at the stop codon.

• Unusual gene (GPX2). Protein-coding signal continues past the stop.
• GPX2 is a known selenoprotein (its in-frame UGA codon is read as selenocysteine)! Additional candidates found.
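Once a candidate ORF is known, the stretch in question is the sequence between the first in-frame stop codon and the next one downstream: this is where read-through or selenoprotein candidates would show continued protein-coding conservation. The sketch below only extracts that segment under the assumption of a known reading frame; detecting the candidates in the first place relied on the evolutionary signature, which this code does not compute. The example sequence is made up.

```python
STOPS = {"TAA", "TAG", "TGA"}

def in_frame_stops(seq, frame=0):
    """Start positions of stop codons in the given reading frame."""
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]

def readthrough_segment(seq, frame=0):
    """Sequence between the first and second in-frame stop codons, or None."""
    stops = in_frame_stops(seq, frame)
    if len(stops) < 2:
        return None
    first, second = stops[0], stops[1]
    return seq[first + 3:second]

if __name__ == "__main__":
    orf = "ATGAAATGACCCGGGTAAAAA"     # ATG AAA TGA CCC GGG TAA AAA
    print(in_frame_stops(orf))        # [6, 15]
    print(readthrough_segment(orf))   # CCCGGG
```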

Page 17

Translational read-through in neuronal proteins
Novel candidate: OPRL1 neurotransmitter receptor

[Figure: protein-coding conservation continues past a read-through stop codon up to a 2nd stop codon, after which there is no more conservation.]

• New mechanism of post-transcriptional control
– Conserved in both mammals (~5 candidates) and flies (~150 candidates)
– Strongly enriched for neurotransmitters and brain-expressed proteins
– Read-through stop codon (and surrounding sequence) shows increased conservation

• Many questions remain
– Role of editing? Cryptic splice sites? RNA secondary structure?

Lin et al, Genome Research 2007

Page 18

Measuring excess constraint within protein-coding exons

Typical protein-coding exon (Numerous mutations, at each column)

Excess-conservation exon: conserved above and beyond the call of duty
Likely to have additional functions, overlapping selective pressures

Page 19

Searching for excess-constraint coding sequence

(1) Build a model for expected substitution counts
• Synonymous substitutions correlate with codon degeneracy and CpG content
• Distribution for each ancestral codon

(2) Score windows for depletion in synonymous substitutions
• Z-score: P(observed substitutions | expected for each codon)

(3) Top candidate exons with excess constraint
• PCPB2: derived from ancestral transposon
• HoxB5 gene start: 52 AA before the first synonymous substitution
• C6orf111: predicted ORF on chr. 6
• EIF4G2: overlaps spliced EvoFold prediction
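Step (2) can be sketched as a sliding-window Z-score. The per-codon expected counts would come from the model in step (1); here they are simply passed in, and each codon's synonymous substitution count is assumed to be Poisson, so a window's variance equals its expected count. That Poisson assumption is a simplification for illustration, not necessarily the distribution used in the original analysis.

```python
import math

def depletion_z_scores(observed, expected, window):
    """Z-scores for synonymous-substitution counts in sliding codon windows.

    observed: per-codon observed synonymous substitution counts
    expected: per-codon expected counts under the neutral model (must be > 0)
    Strongly negative Z means fewer synonymous changes than expected,
    i.e. a candidate for excess constraint.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    scores = []
    for start in range(len(observed) - window + 1):
        obs = sum(observed[start:start + window])
        exp = sum(expected[start:start + window])
        if exp <= 0:
            raise ValueError("expected counts in a window must be positive")
        scores.append((obs - exp) / math.sqrt(exp))  # Poisson: variance == mean
    return scores

if __name__ == "__main__":
    # Hypothetical window of 4 codons with no synonymous changes at all.
    print(depletion_z_scores([0, 0, 0, 0, 0], [1.0] * 5, window=4))  # [-2.0, -2.0]
```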

Page 20

Examples: Top candidate exons showing increased selection

• HoxB5: 52 amino acids before the first synonymous substitution
• Overlaps highly conserved RNA secondary structure

• C6orf11: Predicted ORF, protein-coding, extremely conserved

• EIF4G2: Several consecutive exons, conserved RNA struct.

Page 21

Evidence of post-transcriptional RNA regulation

• New non-coding RNAs (introns / intergenic)
– Supported by independent expression in multiple tissues

• Roles in translational regulation (exons / 5’UTRs)
– Difficult to obtain experimentally: importance of evolutionary signal
– Role in translation initiation: overlap ATG, ribosomal proteins

• Roles in A-to-I editing (exon & intron pairing)
– Enriched in known ADAR targets, new editing candidates, cDNA support

• Roles in localization / targeting (3’UTRs)
– Primarily on coding strand (75% & 80%)
– Enriched in post-transcriptional regulators: feedback, auto-/cross-regulation

Jakob Pedersen, EvoFold. Stark et al, Nature 2007

Page 22

microRNA genes

Alex Stark, Pouya Kheradpour

Page 23

Evolutionary signatures for microRNA genes

(1) Conservation profile

miRNAs show characteristic conservation properties

Page 24

Distinguishing true miRNAs from random hairpins

[Figure: enrichment and total counts for evolutionary features (1)-(3), structural features (4), and combinations of features (5)-(6).]

Combination of features: > 4,500-fold enrichment

Stark et al, Genome Research 2007

Page 25

Novel miRNAs validated by sequencing reads

[Figure: novel miRNA hairpins supported by 348 and 16 sequencing reads (Ruby, Bartel, Lai).]

• In fly genome: 101 hairpins above 0.95 cutoff
– 60 of 74 (81%) known Rfam miRNAs rediscovered
– + 24 novel, expression-validated by 454 & Solexa (Bartel/Hannon)
– + 17 additional candidates show diverse evidence of function

• In mammals: combine experimental & evolutionary info
– Rely on reads for discovery, use evolutionary signal to study function

Ruby et al, Genome Research 2007; Stark et al, Genome Research 2007

Page 26

Surprise 1: microRNA & microRNA* function

Drosophila Hox

• Both hairpin arms of a microRNA can be functional
– High scores, abundant processing, conserved targets
– Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators

Stark et al, Genome Research 2007

Page 27

Surprise 2: microRNA anti-sense function

[Figure: sense and anti-sense transcripts of the miR-iab-4 locus, with highly conserved Hox targets.]

• A single miRNA locus transcribed from both strands
• The two transcripts show distinct expression domains (mutually exclusive)
• Both processed to mature miRNAs: miR-iab-4, miR-iab-4AS (anti-sense)

Stark et al, Genes & Development 2007

Page 28

miR-iab-4AS leads to homeotic transformations

[Figure: wild-type (WT) haltere and wing compared with halteres after sense and antisense mis-expression, which develop wing-like features with sensory bristles; panels C, D, E show the same image at higher magnification.]

• Mis-expression of miR-iab-4S & AS:
– Note: halteres → wings homeotic transformation
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 10 sense/anti-sense miRNAs in mouse

Stark et al, Genes & Development 2007

Page 29

Surprise 2: MicroRNAs and developmental control

• Illustrates miR/miR* and miR/miR-AS cooperation

Page 30

Measuring selection

Michele Clamp, Manuel Garber

Xiaohui Xie

Page 31

More mammals = more power 

4 species (branch length 1 subs/site)

26 species (branch length 4 subs/site)

[Figure: resolution of detectable constrained elements; labels 50bp and 6bp.]

Page 32

Comparative genomics of 29 eutherian mammals

(22 at 2X coverage)

Page 33

Detecting Purifying Selection (ω)

[Figure: neutral sequence vs constrained sequence, each annotated with ω.]

Estimating intensity of constraint (ω):
• Probabilistic evolutionary model
• Maximum Likelihood (ML) estimation of ω
- sitewise (evaluate every k-long window)
- windows-based (increased power)

• Reports ω and its log odds score (LODS)
• Theoretical p-value (LODS distributes as χ² with df = 1)
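A much simplified version of this estimate can be written in a few lines. The sketch below collapses the phylogenetic model into a single count: it assumes the number of substitutions k in a window is Poisson with mean ω·λ, where λ is the count expected under neutral evolution. Under that assumption the ML estimate is ω̂ = k/λ, the log-likelihood ratio against ω = 1 has a closed form, and the p-value uses the χ² (df = 1) survival function. Conventions differ on whether the χ² statistic is the LOD itself or twice the natural-log ratio; this sketch uses the standard likelihood-ratio form, 2·LOD.

```python
import math

def estimate_omega(k, lam):
    """Toy ML estimate of omega for one window.

    k:   substitutions observed in the window
    lam: substitutions expected in the window under neutral evolution (> 0)
    Assumes k ~ Poisson(omega * lam). Returns (omega_hat, lod, p_value), where
    lod = log L(omega_hat) - log L(1) in natural-log units.
    """
    omega = k / lam
    if k == 0:
        lod = lam  # log L(0) = 0 and log L(1) = -lam
    else:
        lod = k * math.log(omega) - k + lam  # the log(k!) terms cancel
    stat = max(0.0, 2.0 * lod)  # likelihood-ratio statistic, guarded against rounding
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return omega, lod, p_value

if __name__ == "__main__":
    # Hypothetical window: 10 substitutions expected neutrally, only 2 observed.
    print(estimate_omega(2, 10.0))
```

The χ² test is two-sided in ω; constraint corresponds to ω̂ < 1, so a caller looking for purifying selection would also check the direction of the estimate.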

Manuel Garber, Michele Clamp, Xiaohui Xie

Page 34

Detecting unusual mutational patterns (π)

[Figure: example alignment with per-column ω values 0, 0, 0.8, 0.5, 0.6, 3.2, 0, 0.]

• Repeated C-to-G transversion
• Has happened at least 4 times.
• Very unlikely given neutral model.

• Goal: Identify sites with unlikely substitution pattern.
• Approach: Probabilistic method to detect a stationary distribution that is different from background.
• Solution: Implement ML estimator (π) of this vector:
• Provides a Position Weight Matrix for any given k-mer in the genome.
• Scores every base in the genome (LODS).
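The idea of a per-site base distribution π that differs from background, and of stacking those vectors into a position weight matrix, can be illustrated with a deliberately crude sketch. Unlike a tree-based ML estimator, it treats the bases observed across species in an alignment column as independent draws and ignores the phylogeny; the pseudocount and the uniform background are assumptions for illustration. Because π is estimated from the same column it scores, the LOD here is optimistic and is only meant to show the shape of the computation.

```python
import math

BASES = "ACGT"

def estimate_pi(column, pseudocount=0.5):
    """Base frequencies in one alignment column (one base per species).

    Simplification: species are treated as independent observations and the
    phylogeny is ignored.
    """
    counts = {b: column.count(b) for b in BASES}
    total = sum(counts.values()) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}

def column_lod(column, background=None, pseudocount=0.5):
    """Log-odds of the column under its own pi vs a background distribution."""
    background = background or {b: 0.25 for b in BASES}
    pi = estimate_pi(column, pseudocount)
    return sum(math.log(pi[b] / background[b]) for b in column if b in BASES)

def pwm(columns, pseudocount=0.5):
    """Position weight matrix for a k-mer: one pi vector per alignment column."""
    return [estimate_pi(col, pseudocount) for col in columns]

if __name__ == "__main__":
    print(column_lod("AAAAAAAA"))  # invariant column: high score
    print(column_lod("ACGTACGT"))  # matches the uniform background: 0.0
    print(pwm(["AAAA", "CCCC"])[0])
```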

Manuel Garber, Michele Clamp, Xiaohui Xie

Page 35

Using 29 mammals: ~7% of the genome appears to be constrained

• We can detect 127Mb (of the estimated 210Mb)
• 60% of selected bases can be pinpointed to within 12 bp at 95% confidence

[Figure: constraint score distributions with 5 species and with 20 species.]

Page 36

How many of the selected bases have we not seen before?

[Figure: Venn diagram of constrained bases detected with 4 mammals (old: 60Mb, 500,000 elements) and with 29 mammals (127Mb, 2,600,000 elements). Shared: 58Mb (374,000 elements). New with 29 mammals: 69Mb (1,970,000 elements). Only with 4 mammals: 3Mb (77,000 elements).]

30% low alignment coverage.
Some are longer elements with weak conservation
70% of exons / 20% of exons

Page 37

New elements primarily intergenic

[Figure panels: exons, old vs new elements; old and new elements, non-coding only.]

Page 38

Individual binding sites are revealed

[Figure: Chr 6, HIST13A, 120bp upstream (5’ end); tracks: 29-mammal elements, 29-mammal constraint (12-mers), alignment information content, Transfac matches (CAAT, CAAT, TATA).]

Over 20 histones show similar conservation patterns

Page 39

Understanding genomes by their chromatin signatures

Jason Ernst

Page 40

Combining evolutionary and chromatin signatures

• Evolutionary signatures & genome annotation
– Protein-coding genes + unusual gene structures
– miRNA gene discovery and characterization
– Functional anti-sense miRNAs and miRNA* arms
– Motif discovery + motif function

• Chromatin signatures & genome regulation
– De novo discovery of chromatin signatures
  • New functional combinations of marks emerge
  • A new class of long non-coding RNA regulators
– Regulatory motif and target prediction
  • Drosophila developmental enhancers
  • Human enhancer-specific motifs
  • Dynamics across developmental time

Page 41

Chromatin signatures for genome annotation

• Similarly to evolutionary signatures:
– chromatin signatures encode functional elements

• The difference:
– this information is dynamic

• The epigenetic code hypothesis:
– distinct combinations of marks encode distinct chromatin states

• Can we discover them de novo?

Page 42

Cartoon Illustration of the Model

[Figure: a stretch of DNA containing an enhancer, a transcription start site, and a transcribed region; the observed histone modifications along it; the most likely hidden state at each position; and, for states 1-6, the modifications that are highly likely in each state (probabilities 0.7-0.9).]

Even though a modification was not observed, we can still infer the correct state based on neighboring locations: this state is likely of the same type as its neighboring states.
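The inference described above, recovering the most likely hidden state at each position even where a mark is missing, is what Viterbi decoding does for a hidden Markov model. The sketch below assumes three hypothetical states, two marks treated as independent presence/absence calls per bin, made-up emission probabilities, and "sticky" transitions that favour staying in the same state; none of these values come from the slides.

```python
import math

# Observation tuples are ordered as (H3K4me3 present, H3K36me3 present).
# Hypothetical probability that each mark is observed in a bin, per state.
EMIT = {
    "promoter":    (0.90, 0.10),
    "transcribed": (0.05, 0.80),
    "background":  (0.05, 0.05),
}
STATES = list(EMIT)

def log_emission(state, obs):
    """Log P(observed presence/absence of each mark | state); marks independent."""
    return sum(math.log(p if present else 1.0 - p)
               for p, present in zip(EMIT[state], obs))

def viterbi(observations, stay=0.9):
    """Most likely hidden state per bin for a sequence of binary mark calls."""
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)

    def log_trans(i, j):
        return math.log(stay if i == j else switch)

    # Uniform initial distribution over states.
    scores = [math.log(1.0 / n) + log_emission(s, observations[0]) for s in STATES]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for j, state in enumerate(STATES):
            best = max(range(n), key=lambda i: scores[i] + log_trans(i, j))
            new_scores.append(scores[best] + log_trans(best, j) + log_emission(state, obs))
            pointers.append(best)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[i] for i in reversed(path)]

if __name__ == "__main__":
    bins = [(1, 0), (0, 1), (0, 1), (0, 0), (0, 1), (0, 1)]
    print(viterbi(bins))
```

For the example bins, the fourth bin carries neither mark, yet it is decoded as "transcribed" because its neighbours are: this is the neighbouring-location effect the slide describes.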

Page 43

Page 44

Core Promoter States

State     Enriched category                  Fold   Corrected p-value
State 2   tRNA Metabolic                     4.4    0.003
State 3   Cell Cycle                         2.7    2x10^-7
State 4   Embryonic Development              2.8    9x10^-23
State 5   Chromatin                          2.2    2x10^-7
State 6   Response to DNA Damage Stimulus    2.1    2x10^-10
State 7   RNA Processing                     2.6    9x10^-24
State 8   T-cell Activation                  4.7    3x10^-7

[Figure: distance to TSS for each promoter state.]

Different promoter states show distinct functional enrichment
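Numbers like those in the table can be computed from four counts per state and category. The slide does not say which test or multiple-testing correction was used, so the sketch below assumes a hypergeometric upper-tail test and a Bonferroni correction, two common choices; the gene counts in the example are hypothetical.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """(fraction of the state's genes in the category) / (fraction of all genes in it).

    k: state genes in the category   n: genes in the state
    K: all genes in the category     N: all genes
    """
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def bonferroni(p, num_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * num_tests)

if __name__ == "__main__":
    # Hypothetical: 100 genes, 10 in the category, 10 in the state, 5 overlap.
    print(fold_enrichment(5, 10, 10, 100))
    p = hypergeom_upper_tail(5, 10, 10, 100)
    print(p, bonferroni(p, num_tests=50))
```

Exact integer arithmetic from `math.comb` keeps the tail sum free of rounding until the final division.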

Page 45

Comparing chromatin states across cell types: K562, HUVEC, NHEK

[Figure: proportion of the genome in each state per cell type, and pairwise state fold enrichments between cell types.]

• CTCF island state (State 9) highly stable across cell types
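A pairwise state fold enrichment between two cell types can be sketched by comparing how often a bin is in state a in one cell type and state b in the other against what independence would predict. The sketch assumes both segmentations assign one state label to each of the same genomic bins; the labels in the example are hypothetical. A state that is stable across cell types, like the CTCF island state, shows up as a large value on the diagonal.

```python
from collections import Counter
from itertools import product

def pairwise_state_enrichment(states_a, states_b):
    """Fold enrichment of co-occurring states between two cell types.

    states_a, states_b: state label per genomic bin, same bins in both.
    fold(a, b) = P(a in A and b in B) / (P(a in A) * P(b in B)).
    """
    if len(states_a) != len(states_b):
        raise ValueError("both segmentations must cover the same bins")
    n = len(states_a)
    joint = Counter(zip(states_a, states_b))
    count_a, count_b = Counter(states_a), Counter(states_b)
    return {
        (a, b): (joint[(a, b)] / n) / ((count_a[a] / n) * (count_b[b] / n))
        for a, b in product(count_a, count_b)
    }

if __name__ == "__main__":
    cell_a = [9, 9, 1, 1, 7, 7, 7, 1]
    cell_b = [9, 9, 1, 7, 7, 7, 1, 1]
    for pair, fold in sorted(pairwise_state_enrichment(cell_a, cell_b).items()):
        print(pair, round(fold, 2))
```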

Page 46

Comparing chromatin states across cell types: K562, HUVEC, NHEK

Top GO enrichment for TSSs in the active promoter state (1) in both NHEK and HUVEC

GO category                                                              P-value
biopolymer metabolic process                                             1.60E-120
cellular biopolymer metabolic process                                    6.60E-120
cellular metabolic process                                               8.60E-120
cellular macromolecule metabolic process                                 5.30E-119
macromolecule metabolic process                                          5.50E-119
primary metabolic process                                                3.40E-115
nucleic acid binding                                                     1.10E-105
RNA processing                                                           2.20E-99
nucleobase, nucleoside, nucleotide and nucleic acid metabolic process    5.10E-93

Page 47

Comparing chromatin states across cell types: K562, HUVEC, NHEK

Top GO enrichment for TSSs in the unmodified state (7) in both NHEK and HUVEC

GO category                                 P-value
olfactory receptor activity                 3.70E-201
sensory perception of smell                 7.30E-175
sensory perception of chemical stimulus     1.60E-170


Comparing chromatin states across cell types: NHEK, K562, HUVEC

GO enrichment for TSS in active promoter state (1) in NHEK and unmodified state (7) in HUVEC

GO category: P-value
ectoderm development: 2.90E-09
epidermis development: 1.80E-08
keratinocyte differentiation: 3.00E-06
tissue development: 3.20E-06
cell adhesion: 1.90E-05


Comparing chromatin states across cell types: NHEK, K562, HUVEC

GO enrichment for TSS in active promoter state (1) in HUVEC and unmodified state (7) in NHEK

GO category: P-value
blood vessel development: 2.60E-05
vasculature development: 3.00E-05
angiogenesis: 3.50E-05
blood vessel morphogenesis: 1.20E-04
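The three comparisons above select TSSs by their state in two cell types and then test GO categories for over-representation. The slides report P-values without naming the test; a minimal sketch assuming a one-sided hypergeometric test, a common choice for GO enrichment. The function names and the toy gene states are hypothetical.

```python
from math import comb


def select_tss(states_x: dict[str, int], states_y: dict[str, int], state_x: int, state_y: int) -> set[str]:
    """Genes whose TSS is in state_x in cell type X and in state_y in cell type Y."""
    return {g for g, s in states_x.items() if s == state_x and states_y.get(g) == state_y}


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k), X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / comb(N, n)  # exact integer sums, one final division


# Hypothetical toy: state 1 = active promoter, state 7 = unmodified.
nhek = {"geneA": 1, "geneB": 1, "geneC": 1, "geneD": 1, "geneE": 7, "geneF": 7}
huvec = {"geneA": 7, "geneB": 7, "geneC": 7, "geneD": 1, "geneE": 1, "geneF": 7}
category = {"geneA", "geneB", "geneC"}  # a hypothetical GO category

selected = select_tss(nhek, huvec, state_x=1, state_y=7)
k, n, K, N = len(selected & category), len(selected), len(category), len(nhek)
print(sorted(selected), hypergeom_pvalue(k, n, K, N))  # ['geneA', 'geneB', 'geneC'] 0.05
```

Python integers are arbitrary-precision, so the tail sum stays exact even for genome-scale N; only the final division rounds.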


Striking example: ~2,000 large intergenic non-coding RNAs (lincRNAs)

Chromatin signature: H3K4me3 - H3K36me3 (Mikkelsen et al. 2007)

Our experiments confirm:
• These regions produce RNA molecules
• They have exon/intron structures

1. They are evolutionarily conserved
2. They show no coding potential and no protein-coding evolutionary signature
3. Their promoters and regulation are conserved
4. They play diverse roles in chromatin regulation

Guttman et al. Nature, Feb 2009
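The lincRNA loci above were found through their chromatin signature: an H3K4me3 promoter mark followed by an H3K36me3 domain over the transcribed region, lying outside known genes. A minimal sketch of that pairing step, assuming plus-strand loci only, half-open coordinates on a single chromosome, and hypothetical thresholds (`max_gap`, `min_k36_len`); it is an illustration, not the published pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.end and b.start < a.end


def k4_k36_domains(k4me3_peaks: list[Interval], k36me3_regions: list[Interval],
                   genes: list[Interval], max_gap: int = 2_000,
                   min_k36_len: int = 5_000) -> list[Interval]:
    """Pair each H3K4me3 peak with a downstream H3K36me3 region (plus strand only).

    max_gap and min_k36_len are hypothetical thresholds. Candidate domains that
    overlap any known gene are discarded, so only intergenic domains remain.
    """
    domains = []
    for peak in k4me3_peaks:
        for region in k36me3_regions:
            gap = region.start - peak.end
            long_enough = region.end - region.start >= min_k36_len
            if 0 <= gap <= max_gap and long_enough:
                domain = Interval(peak.start, region.end)
                if not any(overlaps(domain, gene) for gene in genes):
                    domains.append(domain)
    return domains


peaks = [Interval(1_000, 2_000), Interval(50_000, 51_000)]
k36 = [Interval(2_500, 12_000), Interval(51_500, 53_000)]
known_genes = [Interval(100_000, 120_000)]
print(k4_k36_domains(peaks, k36, known_genes))  # [Interval(start=1000, end=12000)]
```

The second peak is rejected because its H3K36me3 region is shorter than the hypothetical minimum length.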


Combine chromatin signatures and regulatory motifs
New developmental enhancers in human and fly

Zeitlinger et al, Genes & Development 2007; Visel, Pennacchio, Rubin, Ren, Nature 2008

• Chromatin signatures and evolutionary signatures are predictive of enhancer elements

Heintzman et al, Kellis, Ren, Nature 2008; Zeitlinger et al, Nature Genetics 2007

• Experimental techniques developed for inferring expression domains in human and fly

• Large-scale databases mapping every element to its expression pattern are emerging

• Ability to test new patterns and artificial elements in fly / mouse embryos


Regulatory motif discovery and function

Pouya Kheradpour, Alex Stark


Evolutionary signatures for regulatory motifs

[Figure: known engrailed site (footprint) in a Drosophila alignment; annotated region types: enhancers, promoters, 5’-UTR, exons, introns, 3’-UTRs; species tree: D.mel, D.ere, D.ana, D.pse]

D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG

• Individual motif instances are preferentially conserved
• Measure conservation across entire genome
  – Over thousands of motif instances: increased discovery power
  – Couple to rapid enumeration and rapid string search

De novo discovery of regulatory motifs: Kellis et al, Nature 2003; Xie et al, Nature 2005

Stark et al, Nature 2007
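Measuring conservation genome-wide starts by enumerating motif instances and checking whether each one is conserved. A minimal sketch on the engrailed-site alignment from this slide, assuming a strict, hypothetical rule: an instance counts as conserved only if every species matches the (IUPAC) motif, ungapped, at the same alignment columns.

```python
import re

# IUPAC nucleotide codes, as used in motif consensus strings (K, W, R, Y, M, N, ...).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(consensus: str) -> re.Pattern:
    # The lookahead makes finditer report overlapping instances too.
    return re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")


def conserved_instances(alignment: dict[str, str], reference: str, consensus: str) -> tuple[int, int]:
    """Count (total, conserved) motif instances in the reference row of an alignment."""
    pattern = motif_regex(consensus)
    width = len(consensus)
    total = conserved = 0
    for m in pattern.finditer(alignment[reference]):
        total += 1
        cols = slice(m.start(), m.start() + width)
        # Gap characters ('-') never match, so gapped instances are not conserved.
        if all(pattern.match(row[cols]) for row in alignment.values()):
            conserved += 1
    return total, conserved


engrailed_site = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}
print(conserved_instances(engrailed_site, "D.mel", "YTAATTR"))  # (1, 1)
print(conserved_instances(engrailed_site, "D.mel", "GACTAAG"))  # (1, 0)
```

The footprinted TAATT core is intact in all six species, while the neighboring GACTAAG word is broken in D.ere.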


Power of evolutionary signatures for motif discovery

Top 30 motifs (rank, consensus, MCS, matches to known motif, followed by the values listed under the slide's Tissue-specific target expression / Promoters / Enhancers headers):

1 CTAATTAAA 65.6 engrailed (en) 25.4 2
2 TTKCAATTAA 57.3 reversed-polarity (repo) 5.8 4.2
3 WATTRATTK 54.9 araucan (ara) 11.7 2.6
4 AAATTTATGCK 54.4 paired (prd) 4.5 16.5
5 GCAATAAA 51 ventral veins lacking (vvl) 13.2 0.3
6 DTAATTTRYNR 46.7 Ultrabithorax (Ubx) 16 3.3
7 TGATTAAT 45.7 apterous (ap) 7.1 1.7
8 YMATTAAAA 43.1 abdominal A (abd-A) 7 2.2
9 AAACNNGTT 41.2 20.1 4.3
10 RATTKAATT 40 3.9 0.7
11 GCACGTGT 39.5 fushi tarazu (ftz) 17.9
12 AACASCTG 38.8 broad-Z3 (br-Z3) 10.7
13 AATTRMATTA 38.2 19.5 1.2
14 TATGCWAAT 37.8 5.8 2
15 TAATTATG 37.5 Antennapedia (Antp) 14.1 5.4
16 CATNAATCA 36.9 1.8 1.7
17 TTACATAA 36.9 5.4
18 RTAAATCAA 36.3 3.2 2.8
19 AATKNMATTT 36 3.6 0
20 ATGTCAAHT 35.6 2.4 4.6
21 ATAAAYAAA 35.5 57.2 -0.5
22 YYAATCAAA 33.9 5.3 0.6
23 WTTTTATG 33.8 Abdominal B (Abd-B) 6.3 6
24 TTTYMATTA 33.6 extradenticle (exd) 6.7 1.7
25 TGTMAATA 33.2 8.9 1.6
26 TAAYGAG 33.1 4.7 2.7
27 AAAKTGA 32.9 7.6 0.3
28 AAANNAAA 32.9 449.7 0.8
29 RTAAWTTAT 32.9 gooseberry-neuro (gsb-n) 11 0.8
30 TTATTTAYR 32.9 Deformed (Dfd) 30.7

Ability to discover full dictionary of regulatory motifs de novo
Stark et al, Nature, 2007
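The MCS column ranks motifs by how much more often their instances are conserved than expected. A minimal sketch, assuming MCS is a binomial z-score of the observed number of conserved instances against a background conservation rate estimated from control motifs (for example, shuffled versions of the motif); the control construction in the cited work may differ, and the counts here are hypothetical.

```python
from math import sqrt


def control_rate(controls: list[tuple[int, int]]) -> float:
    """Pooled conservation rate of control motifs, given (instances, conserved) pairs."""
    total = sum(n for n, _ in controls)
    conserved = sum(c for _, c in controls)
    return conserved / total


def motif_conservation_score(n_instances: int, n_conserved: int, p_background: float) -> float:
    """Binomial z-score: (observed - expected) / standard deviation of conserved instances."""
    if not 0.0 < p_background < 1.0:
        raise ValueError("background rate must lie strictly between 0 and 1")
    expected = n_instances * p_background
    sd = sqrt(n_instances * p_background * (1.0 - p_background))
    return (n_conserved - expected) / sd


# Hypothetical counts: control motifs conserved at 20%, candidate motif at 30%.
p = control_rate([(200, 40), (200, 40)])
print(round(motif_conservation_score(400, 120, p), 2))  # 5.0
```

Because the standard deviation grows only with the square root of the instance count, the same excess conservation rate yields a larger score for frequent motifs, which is why genome-wide counts give more discovery power.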


Diverse lines of evidence to characterize novel motifs
1. Clustering of motif occurrences upstream of candidate genes (functional clusters emerge)
2. Tissue enrichment and avoidance (typical regulator, i.e. transcription factor, vs. motif avoidance patterns, i.e. depletion)
3. Positional constraints (e.g. core promoter element (initiator), downstream promoter element)


Recognizing functional motifs within coding exons
• Challenge: overlapping selective pressures
  – Distinguish RNA-level motifs from protein-level motifs
  – The two have distinct evolutionary characteristics
• Solution: frame-specific conservation
  – Evaluate each reading-frame offset separately
  – Motifs due to di-codon biases: only one frame
  – Motifs due to RNA-level selection: all three frames
• Result: miRNA motifs in coding exons
  – Top 20 motifs: 11 miRNA seeds (vs. 11 in 200+)
  – Conservation profile of 7-mers: coding & 3’UTR show corr. > 0.9
• Conclusion: miRNA targeting in coding exons
  – Specific selection for RNA function
  – Similar to 3’UTR targeting

Stark et al, Nature, 2007
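Frame-specific conservation, as described above, splits a k-mer's coding-exon instances by reading-frame offset and compares each frame's conservation rate with a background. A minimal sketch, assuming each instance is given as (position within the CDS, conserved or not), per-frame background rates are supplied, and a hypothetical elevation threshold `min_ratio`; the labels mirror the slide's reasoning (all three frames suggests RNA-level selection, a single frame suggests di-codon bias).

```python
def frame_conservation(instances: list[tuple[int, bool]]) -> list[float]:
    """Conservation rate for each reading-frame offset (CDS position mod 3)."""
    totals = [0, 0, 0]
    conserved = [0, 0, 0]
    for cds_pos, is_conserved in instances:
        frame = cds_pos % 3
        totals[frame] += 1
        conserved[frame] += int(is_conserved)
    return [c / t if t else 0.0 for c, t in zip(conserved, totals)]


def classify_exonic_motif(rates: list[float], background: list[float], min_ratio: float = 1.5) -> str:
    """Label a motif by how many frames show conservation elevated above background."""
    elevated = [r >= min_ratio * b for r, b in zip(rates, background)]
    if all(elevated):
        return "RNA-level selection (all three frames)"
    if sum(elevated) == 1:
        return "single frame (e.g. di-codon bias)"
    return "unclear"


# Hypothetical instances: (position in CDS, conserved?)
rna_like = [(p, p < 9) for p in range(12)]          # 3 of 4 conserved in every frame
codon_like = [(p, p % 3 == 0) for p in range(12)]   # conserved only in frame 0
background = [0.3, 0.3, 0.3]
print(frame_conservation(rna_like))  # [0.75, 0.75, 0.75]
print(classify_exonic_motif(frame_conservation(rna_like), background))
print(classify_exonic_motif(frame_conservation(codon_like), background))
```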


Sequence determinants of TF binding
• Hundreds of proteins bind overlapping regions
• Regulatory motif analysis reveals sequence specificity
• Basis for understanding motif combinations & grammars

Example: insulator proteins in Drosophila
CTCF, check
GAF, check
Su(Hw), check
BEAF-32, variant
CP190, novel
Mod(mdg4), novel

Although insulator-bound regions overlap, each motif is specific to exactly one protein


Reliable target identification

Pouya Kheradpour, Alex Stark


Evolutionary signatures of individual motif instances
• Allow for motif movements
  – Sequencing/alignment errors
  – Loss, movement, divergence
• Measure branch-length score (BLS)
  – Sum evidence along branches
  – Close species: little contribution

[Figure: two instances of the Mef2 motif YTAWWWWTAR, with BLS 25% and BLS 83%]
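The branch-length score sums evidence along the branches of the species tree rather than counting species, so a motif shared only by close relatives earns little. A minimal sketch, assuming BLS is the total length of the branches connecting the species in which the motif is present, as a fraction of the whole tree's length; deciding whether the motif is "present" in a species (allowing for movement or alignment errors) is assumed to happen upstream. The tree and branch lengths are hypothetical.

```python
def path_edges(node: str, parent: dict[str, tuple[str, float]]) -> set[str]:
    """Edges (each named by its child node) on the path from node up to the root."""
    edges = set()
    while node in parent:
        edges.add(node)
        node = parent[node][0]
    return edges


def branch_length_score(present: set[str], parent: dict[str, tuple[str, float]]) -> float:
    """Fraction of total tree length spanned by the species carrying the motif."""
    total = sum(length for _, length in parent.values())
    if len(present) < 2:
        return 0.0  # a single species provides no evolutionary evidence
    paths = [path_edges(sp, parent) for sp in present]
    # Edges shared by every path lie above the species' last common ancestor.
    spanning = set().union(*paths) - set.intersection(*paths)
    return sum(parent[e][1] for e in spanning) / total


# Hypothetical rooted tree: child -> (parent, branch length)
tree = {
    "A": ("root", 2.0), "mel": ("A", 1.0), "sim": ("A", 1.0),
    "B": ("root", 1.0), "yak": ("B", 2.0), "ana": ("B", 3.0),
}
print(branch_length_score({"mel", "sim"}, tree))  # 0.2  (close species: little evidence)
print(branch_length_score({"mel", "yak"}, tree))  # 0.6
print(branch_length_score({"mel", "sim", "yak", "ana"}, tree))  # 1.0
```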


Motif confidence selects functional instances

[Figure: confidence vs. increasing BLS, for transcription factor motifs and microRNA motifs]

Transcription factor motifs:
• Increasing confidence selects functional regions
• Confidence selects in vivo bound sites (high sensitivity)

microRNA motifs:
• Increasing confidence selects functional regions
• Confidence selects positive strand

Kheradpour et al, Genome Research 2007
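Motif confidence turns a BLS cutoff into an estimate of how many conserved instances are real rather than chance. A minimal sketch, assuming confidence = 1 - (average number of conserved instances of control motifs) / (number of conserved instances of the real motif) at the same BLS cutoff, with controls assumed matched to the motif in composition and frequency; the BLS values are hypothetical.

```python
def motif_confidence(motif_bls: list[float], control_bls: list[list[float]], cutoff: float) -> float:
    """Estimated fraction of motif instances at BLS >= cutoff not explained by the control background."""
    real = sum(b >= cutoff for b in motif_bls)
    if real == 0:
        return 0.0
    background = sum(sum(b >= cutoff for b in ctrl) for ctrl in control_bls) / len(control_bls)
    return max(0.0, 1.0 - background / real)


# Hypothetical BLS values for one motif and two control motifs.
motif = [0.9, 0.85, 0.8, 0.3, 0.1]
controls = [[0.9, 0.2, 0.1, 0.05, 0.0], [0.1, 0.2, 0.3, 0.4, 0.5]]
for cutoff in (0.0, 0.5, 0.8):
    print(cutoff, round(motif_confidence(motif, controls, cutoff), 3))
```

With these toy values, confidence rises from 0 at no cutoff to about 0.83 at a BLS cutoff of 0.8, the same trend the slide describes.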


ChIP vs. conservation: similar power / complement

[Figure; ChIP data from Sandmann et al. and Zeitlinger et al., 2006-2007]

ChIP vs. conservation:
- Similar functional enrichment

Amidst ChIP-bound regions:
- Subset with conserved motifs: best
- Subset lacking conserved motifs: worse

Conservation selects relevant targets
- Even for motifs outside ChIP

ChIP-grade regulatory network

Kheradpour et al, Genome Research 2007


Initial regulatory network for an animal genome
• ChIP-grade quality
  – Similar functional enrichment
  – High sensitivity, high specificity
• Systems-level
  – 81% of transcription factors
  – 86% of microRNAs
  – 8k + 2k targets
  – 46k connections
• Lessons learned
  – Pre- and post-transcriptional regulation are correlated (hi/hi, lo/lo)
  – Regulators are heavily targeted, feedback loops

Kheradpour et al, Genome Research, 2007

Sushmita Roy


Network captures literature-supported connections

Kheradpour et al, Genome Research, 2007

Sushmita Roy


Network captures co-expression supported edges

Legend: red = co-expressed; grey = not co-expressed; named/bold = literature-supported

46% of edges are supported by co-expression (P = 10^-3)

Kheradpour et al, Genome Research 2007

Sushmita Roy


Motif role in chromatin dynamics

Pouya Kheradpour


Motif dynamics in Drosophila development

12 developmental time points

[Figure: “New” and “Old” regions across stages t - 1, t and t + 1]

8 antibodies + gene expression:
H3K4me1: Enhancers
H3K4me3: Promoters/enhancers
H3K27ac: Activation
H3K9ac: Activation
H3K27me3: Repression
H3K9me3: Heterochromatin
Pol 2: Transcription/promoters
CBP: HAT, enhancers
Total RNA: Expression

For each of the antibodies and time points, we define three types of regions:
1. “Bound”: all the regions bound
2. “New”: regions that were not bound at the previous time point, but now are
3. “Old”: regions that are bound at the current time point, but won’t be at the next

Kevin White, Nicolas Nègre, Parantu Shah, Carolyn Morrison
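The three region types defined above are set differences between consecutive time points. A minimal sketch, assuming regions have already been matched across time points (for example, shared bin or peak IDs); at the first and last time points, where no previous or next point exists, "New" or "Old" is left empty, an assumption the slides do not address. Region IDs are hypothetical.

```python
def bound_new_old(bound_by_time: list[set[str]], t: int) -> dict[str, set[str]]:
    """Bound / New / Old regions at time point t for one antibody."""
    bound = bound_by_time[t]
    has_prev = t > 0
    has_next = t + 1 < len(bound_by_time)
    return {
        "bound": set(bound),
        # New: not bound at the previous time point, but bound now.
        "new": bound - bound_by_time[t - 1] if has_prev else set(),
        # Old: bound now, but not at the next time point.
        "old": bound - bound_by_time[t + 1] if has_next else set(),
    }


series = [{"r1", "r2"}, {"r2", "r3", "r4"}, {"r3"}]
regions = bound_new_old(series, 1)
print(sorted(regions["new"]), sorted(regions["old"]))  # ['r3', 'r4'] ['r2', 'r4']
```

Motif enrichment (for example, of the abd-A motif) would then be computed separately within each of these three sets.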


Examples of enrichment following expression

[Figure: H3K27me3; fold enrichment or overexpression]

• abd-A motif is enriched in new H3K27me3 regions at L2
  – Coincides with a drop in the expression of abd-A
  – Model: sites gain H3K27me3 as abd-A binding is lost
• Additional intriguing stories found, to be explored


Motifs and chromatin dynamics in human

4 human cell types:
HUVEC: Umbilical vein endothelial
NHEK: Keratinocytes
GM12878: Lymphoblastoid
K562: Myelogenous leukemia

[Figure: example regions marked Bound, Unique and Missing across NHEK, HUVEC, GM12878 and K562]

11 antibodies + gene expression:
H3K4me1: Enhancers
H3K4me2: Promoters/enhancers
H3K4me3: Poised/active promoters
H3K27ac: Activation
H3K9ac: Activation
H3K27me3: Repression
H3K9me1: Activation
H4K20me1: Activation
H3K36me3: Transcription
Pol 2: Transcription/promoters
CTCF: Insulators
RNA: Expression

For each of the antibodies and cell types, we define three types of regions:
1. “Bound”: bound in that cell type
2. “Unique”: bound only in that cell type
3. “Missing”: bound in all other cell types, but not in that cell type

Brad Bernstein, Tarjei Mikkelsen, Mitch Guttman, Charles Epstein, Noam Shoresh
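The cell-type comparison likewise reduces to set operations over regions matched across the four cell types. A minimal sketch, assuming matched region IDs and reading “Missing” as bound in every other cell type but not in this one; the region IDs are hypothetical.

```python
def bound_unique_missing(bound: dict[str, set[str]], cell: str) -> dict[str, set[str]]:
    """Bound / Unique / Missing regions for one cell type and one antibody."""
    mine = bound[cell]
    others = [regions for name, regions in bound.items() if name != cell]
    in_any_other = set().union(*others)
    in_all_others = set.intersection(*others) if others else set()
    return {
        "bound": set(mine),
        "unique": mine - in_any_other,     # bound only in this cell type
        "missing": in_all_others - mine,   # bound in all other cell types, not here
    }


marks = {
    "NHEK": {"a", "c", "f"},
    "HUVEC": {"a"},
    "GM12878": {"a", "d", "f"},
    "K562": {"a", "f"},
}
print(bound_unique_missing(marks, "GM12878")["unique"])  # {'d'}
print(bound_unique_missing(marks, "HUVEC")["missing"])   # {'f'}
```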


Example: NF-κB, a likely regulator of GM12878

[Figure: NF-κB motif fold enrichment or overexpression, for active marks and the repressive mark]

• The NF-κB motif is enriched in H3K4me2 regions found uniquely in GM12878 cells
• It is likewise enriched in the uniquely bound regions for other active marks
• Conversely, it is enriched in the uniquely unbound regions for the repressive mark H3K27me3
• We find that NF-κB is also overexpressed in GM12878, suggesting a causative explanation


Marks associated with activation

• By correlating the expression of activating factors with the enrichment of their motifs, we can rank each chromatin mark by its “activator association”
• Correlation follows the expected trend
  – H3K4me2, H3K4me3, H3K9ac and H3K27ac associated with activation
  – H3K27me3 anti-correlated with activation

[Figure: activator association per chromatin mark]
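The activator-association ranking correlates, for each chromatin mark, the expression of activating factors with the enrichment of their motifs in that mark's regions. A minimal sketch, assuming a Pearson correlation over matched (factor, cell type) observations; the slides do not name the correlation measure, and the values below are hypothetical, chosen so a repressive mark comes out anti-correlated.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_marks(expression: list[float], enrichment: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank marks by correlation between factor expression and motif enrichment.

    expression[i] and enrichment[mark][i] describe the same (factor, cell type) pair.
    """
    scores = {mark: pearson(expression, values) for mark, values in enrichment.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


expression = [1.0, 2.0, 3.0, 4.0]
enrichment = {
    "H3K4me2": [1.1, 1.9, 3.2, 3.9],
    "H3K36me3": [2.0, 1.0, 1.0, 2.0],
    "H3K27me3": [4.0, 3.0, 2.0, 1.0],
}
for mark, r in rank_marks(expression, enrichment):
    print(mark, round(r, 2))
```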


The grand challenge ahead
• Annotations & images for all expression patterns
• Binding sites of every developmental regulator
• Sequence motifs for every regulator (insulator example: CTCF, check; GAF, check; Su(Hw), check; BEAF-32, variant; CP190, novel; Mod(mdg4), novel)
• Expression domain primitives reveal underlying logic [figure axes: anterior-posterior, dorsal-ventral]

Understand regulatory logic specifying development


Summary: Regulatory genomics of flies and men
• Evolutionary signatures
  – Systematic annotation of proteins, RNAs, miRNAs, motifs
  – Reveal unusual genes, RNA structures, stop read-through
• Regulatory motifs
  – Distinct motif sets in promoter vs. enhancer regions
  – Unique signatures for exonic motifs: miRNA targets
• Epigenomics
  – Fly: integration of AP, DV developmental processes
  – Human: enhancer signatures, enhancer-specific motifs
• microRNAs
  – Functionality of miRNA* and anti-sense miRNAs
  – Implications for Hox cluster regulation: miR-10, miR-iab-4
• Targets
  – Global, reliable identification of TF and miRNA targets
  – Biochemically active & selectively neutral = non-functional?


Acknowledgements

Alex Stark, Mike Lin, Jason Ernst, Julia Zeitlinger, Pouya Kheradpour

12 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
Readthru: FlyBase, Bill Gelbart, Robert Reenan
miRNAs: Julius Brennecke, Graham Ruby, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Julius Brennecke, Steve Cohen, Greg Hannon
TF binding: Julia Zeitlinger, Robert Zinzen, Mike Levine, Rick Young
ENCODE: Kevin White, Bing Ren, Jim Posakony, Brad Bernstein