Comparative genomics, ChIP-chip and transfections to find cis-regulatory modules

Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller, Francesca Chiaromonte, Ross Hardison

Children’s Hospital of Philadelphia: Mitch Weiss, Lou Dore

NimbleGen: Roland Green, Xinmin Zhang

Cold Spring Harbor, March 2007

What is conservation good for??

Ideal cases for interpretation by comparative genomics

[Figure: similarity plotted against position along the chromosome, relative to the level expected for neutral DNA. Human vs mouse panel: a segment with similarity above the neutral level, P (not neutral). Human vs rhesus panel: a segment with similarity below the neutral level.]

• Negative (purifying) selection: DNA segments with a function common to divergent species.

• Positive (adaptive) selection: DNA segments in which change is beneficial to at least one of the two species.

Putative transcriptional regulatory regions = pTRRs

• Antibodies vs 10 sequence-specific factors:
– Sp1, Sp3, E2F1, E2F4, cMyc, STAT1, cJun, CEBPe, PU1, RA Receptor A
– High resolution ChIP-chip platforms: Affymetrix and NimbleGen
– Data from several different labs in ENCODE consortium

• High likelihood hits for ChIP-chip
– 5% false discovery rate

• Supported by chromatin modification data
– Modified histones in chromatin: H4Ac, H3Ac, H3K4me, H3K4me2, H3K4me3, etc.
– DNase hypersensitive sites (DHSs) or nucleosome depleted sites

• Result: set of 1369 pTRRs
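The slides do not say how the 5% false discovery rate was enforced. As a minimal sketch of one common way to apply such a threshold, the Benjamini-Hochberg step-up rule is shown below; the function name and the example p-values are hypothetical and are not taken from the ENCODE data.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of p-values accepted by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is accepted.
    return sorted(order[:cutoff_rank])


# Hypothetical per-region p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
accepted = benjamini_hochberg(example_p, fdr=0.05)
print(accepted)
```

The rule accepts all hits up to the largest rank that passes, so a hit can be accepted even if its own rank-specific threshold fails, as long as a later rank passes.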

Functional classes show distinctive trends in phylogenetic depth of conservation

Genes likely regulated by clade-specific pTRRs are enriched for distinctive functions

[Figure: time scale in millions of years, labeled 310, 450, 91 and 173.]

Percentage of pTRRs that align no further than:

Primates: 3%
Eutherians: 71%
Marsupials: 21%
Tetrapods: 4%
Vertebrates: 1%

David King

[Table: enriched GO categories with q-values for FDR. Categories listed: Immune response; Protease inhibition; Mitosis and cell cycle; Transcriptional regulation; Ion transport. q-values listed: 0.0006, 0.0005, 0.0005, 0.004, 0.012.]

Regulatory potential (RP) captures pattern, composition and constraint in alignments

Genome Research 16:1585 (2006)

• High RP for an aligned sequence means it contains patterns similar to those found in gene regulatory regions
– Positive training set: Alignments of known regulatory regions
– Negative training set: Alignments of likely neutral DNA (ancestral repeats)

• Human and mouse RP scores are on UCSC Genome Browser and PSU’s Galaxy
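The idea of scoring a sequence against a positive and a negative training set can be illustrated with a log-odds score. The sketch below is a toy, not the published RP method (Genome Research 16:1585): it assumes each alignment column has already been reduced to a symbol from a hypothetical three-letter alphabet (M = match, X = mismatch, G = gap) and fits a first-order Markov chain to each training set. All names and training strings are invented for illustration.

```python
import math


def train_markov(sequences, alphabet, pseudocount=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in alphabet}
    return probs


def log_odds_score(seq, pos_model, neg_model):
    """Sum of log ratios of transition probabilities (positive vs negative model)."""
    return sum(
        math.log(pos_model[prev][cur] / neg_model[prev][cur])
        for prev, cur in zip(seq, seq[1:])
    )


# Hypothetical alphabet and toy training data, for illustration only.
ALPHABET = "MXG"
positive_training = ["MMMMXMMMM", "MMMXMMMMM"]   # stands in for known regulatory regions
negative_training = ["MXGXMXGXG", "XGXMXGGXM"]   # stands in for ancestral repeats

pos_model = train_markov(positive_training, ALPHABET)
neg_model = train_markov(negative_training, ALPHABET)

conserved_like = log_odds_score("MMMMMM", pos_model, neg_model)
neutral_like = log_odds_score("XGXGXG", pos_model, neg_model)
print(conserved_like > 0, neutral_like < 0)
```

A positive score means the sequence looks more like the positive training set than the negative one, which is the sense in which a "positive RP score" is used later in the talk.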

High RP plus conserved consensus motif is a good predictor of CRMs around GATA-1 regulated genes

Genome Research 16:1480 (2006)

Genes Co-expressed in Late Erythroid Maturation

G1E cells: proerythroblast line lacking the transcription factor GATA-1. G1E-ER cells: rescued by expressing an estrogen-responsive form of GATA-1.

Rylski et al., Mol Cell Biol. 2003

Predict CRMs based on alignment and expression of nearby genes

• Gene is up- or down-regulated by GATA-1
• Noncoding DNA sequence
• Aligns between mouse and other mammals and has a positive RP score
• Contains a conserved consensus binding site motif for GATA-1

preCRMs with conserved consensus GATA-1 BS tend to be active on transfected plasmids

DNA segments with positive RP and a GATA-1 binding motif validate as enhancers at a good rate

RP        Consensus motif   Tested   Validated   Success
Positive  conserved         44       23          52%
Positive  mouse             6        4           67%
Negative  conserved         6        1           17%
Negative  none              17       0           0%
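The success percentages in the table follow directly from the tested and validated counts. The short sketch below recomputes them and also pools the rows by RP sign; the pooled figures are derived here from the table and are not stated on the slide. Variable names are illustrative.

```python
# (RP sign, consensus motif status, tested, validated), copied from the table.
results = [
    ("Positive", "conserved", 44, 23),
    ("Positive", "mouse", 6, 4),
    ("Negative", "conserved", 6, 1),
    ("Negative", "none", 17, 0),
]

# Per-row success rate, rounded to a whole percent as on the slide.
row_rates = {
    (rp, motif): round(100 * validated / tested)
    for rp, motif, tested, validated in results
}


def pooled_rate(rp_sign):
    """Validation rate (percent) over all rows with the given RP sign."""
    tested = sum(t for rp, _, t, _ in results if rp == rp_sign)
    validated = sum(v for rp, _, _, v in results if rp == rp_sign)
    return 100 * validated / tested


for key, rate in row_rates.items():
    print(key, f"{rate}%")
print("Positive RP pooled:", f"{pooled_rate('Positive'):.0f}%")
print("Negative RP pooled:", f"{pooled_rate('Negative'):.0f}%")
```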

Design of ChIP-chip for occupancy by GATA-1

1. Non-overlapping tiling array with 50 bp probes and 100 bp resolution (NimbleGen)
2. Cover range Mouse chr7:57225996-123812258 (~70 Mbp)
3. Antibody against the ER portion of GATA-1-ER protein in rescued G1E-ER4 cells

[Figure: probe layout, with 50 bp probes and 100 bp spacing.]
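To give a feel for the scale of this design, the sketch below counts how many 50 bp probes fit in the stated interval if one probe starts every 100 bp. This is an idealized upper bound under that assumption: the real array layout is not described on the slide, and practical designs typically skip repetitive sequence. Variable names are illustrative.

```python
# Interval from the slide (1-based, inclusive coordinates assumed).
REGION_START = 57225996
REGION_END = 123812258
PROBE_LENGTH = 50   # bp per probe
STEP = 100          # bp between consecutive probe starts

region_length = REGION_END - REGION_START + 1

# Start positions of probes that fit entirely inside the interval.
probe_starts = range(REGION_START, REGION_END - PROBE_LENGTH + 2, STEP)
n_probes = len(probe_starts)

# Fraction of bases directly covered by a probe under this idealized layout.
covered_fraction = n_probes * PROBE_LENGTH / region_length

print(f"region length: {region_length:,} bp")
print(f"probes: {n_probes:,}")
print(f"covered fraction: {covered_fraction:.2f}")
```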

Yong Cheng (PSU), with Mitch Weiss & Lou Dore (CHoP), Roland Green, Xinmin Zhang (NimbleGen)

Signals in known occupied sites in Hbb LCR

1) Cluster of high signals
2) “Hill” shape of the signals

[Figure: ChIP-chip signal at HS1, HS2 and HS3.]

ChIP-chip hits are high quality and tend to have GATA-1 binding motifs

• Peak calling by Mpeak (Ren) and Tamalpais (Bieda and Farnham) gave 321 ChIP-chip hits

• 19 hits were tested by qPCR
– 13 were validated: ~70%

• 267 out of the 321 (83%) have WGATAR motifs, the binding site for GATA-1
– Random sampling on average gives 102 DNA segments with the motif
– The ChIP-chip hits are 2.6-fold enriched for the GATA-1 binding site motif
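The motif counts above can be reproduced in principle with a simple scan. The sketch below searches both strands for WGATAR (W = A or T, R = A or G; the reverse complement is YTATCW, with Y = C or T) and recomputes the enrichment from the counts on the slide. The function name and test sequences are hypothetical, and the authors' actual scanning procedure is not described here.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
FORWARD = re.compile(r"(?=[AT]GATA[AG])")   # WGATAR on the given strand
REVERSE = re.compile(r"(?=[CT]TATC[AT])")   # YTATCW, i.e. WGATAR on the opposite strand


def find_wgatar(sequence):
    """Return sorted (0-based start, strand) pairs for WGATAR matches on both strands."""
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


# Counts taken from the slide.
hits_with_motif = 267
total_hits = 321
random_with_motif = 102   # average over random samples of the same size

fraction_with_motif = hits_with_motif / total_hits
fold_enrichment = hits_with_motif / random_with_motif

print(find_wgatar("CCAGATAAGG"))
print(find_wgatar("GGCTTATCTGG"))
print(f"{fraction_with_motif:.0%} of hits, {fold_enrichment:.1f}-fold enriched")
```

Scanning both strands matters because a binding site is equally valid on either strand of the tested segment.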

Only HALF the GATA-1 binding site motifs are conserved outside rodents

• Of the GATA-1 binding motifs in those 249 hits, 112 (45%) are conserved between mouse and at least one non-rodent species.

Distribution of ChIP-chip hits on 70Mb of mouse chr7

Yong Cheng, Yuepin Zhou and Christine Dorman

[Figure: bar chart of mean fold change (y-axis 0 to 4) for tested DNA segments in K562 cells, grouped as GATA-1 occupied sites by ChIP-chip (GHP series) and segments with no GATA-1 (GHN series).]

21 out of 59 ChIP-chip hits increase activity of HBGpr-Luc in K562 cells.

36% of ChIP-chip hits act as enhancers in K562 cells

[Figure: bar chart of mean fold change (y-axis 0 to 5) for tested DNA segments in MEL cells, grouped as GATA-1 occupied sites by ChIP-chip (GHP series) and segments with no GATA-1 (GHN series).]

15 out of 50 ChIP-chip hits increase activity of HBGpr-Luc in MEL cells.

30% of ChIP-chip hits act as enhancers in MEL cells

Validated ChIP hit, enhancer, deep conservation

Validated ChIP hit, enhancer, limited conservation

ChIP-chip hit, enhancer, rodent specific

[Figure: fold change over parent (y-axis 0 to 4) in K562_1, K562_2 and MEL.]

Test of neutrality using polymorphism and divergence data

A promoter distal to the beta-like globin genes has a signal for recent purifying selection

The distal promoter is close to the locus control region for beta-globin genes

Evolutionary approaches to predicting and analyzing regulatory regions

• Sequence comparison alone will not detect all regulatory regions
– Need comprehensive protein-binding data

• Comparative genomics can help interpret the binding data
– Aspects of regulation of some functional groups are clade-specific
– Depth of conservation may correlate with certain types of function
  • Strong constraint on basal mechanisms?
  • Lineage-specific “fine tuning”?

• A majority of sites occupied by GATA-1 in G1E-ER cells have some function other than enhancement (by our assays)

• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).

Many thanks …

B: Yong Cheng, Ross, Yuepin Zhou, David King
F: Ying Zhang, Joel Martin, Christine Dorman, Hao Wang

PSU Database crew: Belinda Giardine, Cathy Riemer, Yi Zhang, Anton Nekrutenko

Alignments, chains, nets, browsers, ideas, …: Webb Miller, Jim Kent, David Haussler

RP scores and other bioinformatic input: Francesca Chiaromonte, James Taylor, Shan Yang, Diana Kolbe, Laura Elnitski

Funding from NIDDK, NHGRI, Huck Institutes of Life Sciences at PSU

Categories of Tested DNA Segments

Regulatory potential (RP) to distinguish functional classes

Examples of validated preCRMs

ChIP-chip hits for GATA-1 occupancy

[Venn diagram: Mpeak, 275 hits in both; TAMALPAIS, 276 hits in both; region counts 216, 60 and 59.]

321 total ChIP-chip hits

Technical replicates of ChIP-chip with antibody against GATA1-ER

19 ChIP-chip hits were tested by qPCR: 13 were validated (~70%)