Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. ECS289A Presentation By Hua Chen 2003-3-3


Page 1: ECS289A Presentation By Hua Chen 2003-3-3

Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome

ECS289A Presentation
By Hua Chen

2003-3-3

Page 2: ECS289A Presentation By Hua Chen 2003-3-3

Background Knowledge

A significant characteristic of cis-regulatory sites: the binding sites for several different transcription factors tend to cluster together in one region near the gene, forming a cis-regulatory module (CRM).

Searching for individual cis-regulatory sites yields too many candidate positions, which makes it difficult to tell the true sites from the false ones;

The clustering of sites in CRMs provides a feasible way to identify cis-regulatory sites in the genome.

Page 3: ECS289A Presentation By Hua Chen 2003-3-3

One example of a CRM in Drosophila: the eve gene

Page 4: ECS289A Presentation By Hua Chen 2003-3-3

Targets:

Use the clustering of binding sites in cis-regulatory modules as a method to identify functional motifs;

Test the method on some known CRM regions;

Search the genome to discover CRMs and confirm the results by experiment.

The System Investigated:

The early Drosophila embryo.

Five transcription factors are investigated: Bcd, Cad, Hb, Kr and Kni.

Page 5: ECS289A Presentation By Hua Chen 2003-3-3

Methods:

Collect transcription factor binding sequences from previous laboratory work and align them;

Construct position weight matrices (PWMs) for the conserved motifs;

Test the method on the known CRMs;

Search genome-wide for unknown regulatory regions;

Use RNA in situ hybridization and microarray hybridization to test whether the predicted regions lie near genes regulated by the transcription factors;

One special case: the giant gene, further investigated with transgenics and mutant embryos.
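The PWM construction step listed above can be illustrated with a minimal Python sketch. It assumes equal-length, already aligned sites, a uniform background, and a simple pseudocount; the example sites are made up for illustration and are not real footprinted sequences. The function names (`build_pwm`, `score`, `best_hit`) are hypothetical, and this is not the computation performed by any particular tool.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(pwm, seq))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

# Made-up aligned sites, for illustration only.
toy_sites = ["TTTATG", "TTTATG", "TTTACG", "TTAATG"]
toy_pwm = build_pwm(toy_sites)
top_score, top_start = best_hit(toy_pwm, "GGGCTTTATGCC")
```

A higher score means the candidate looks more like the aligned sites than like background sequence; a cutoff on this score is one way to discard weak matches.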

Page 6: ECS289A Presentation By Hua Chen 2003-3-3

Step 1: Collection and Alignment of TF Binding Sites

Bcd, Cad, Hb, Kr and Kni binding sequences are determined by in vitro DNase protection assays;

The sequences are aligned with MEME.

Page 7: ECS289A Presentation By Hua Chen 2003-3-3
Page 8: ECS289A Presentation By Hua Chen 2003-3-3

Step 2: Construction of PWMs and Searching:

Patser is used to construct the position weight matrix;

Cis-Analyst is used to identify potential binding sites matching the PWM in the Drosophila genome:

A user-defined cutoff parameter (site_p) eliminates predicted low-affinity sites;
The sequence is searched with a window of specified length;
Windows that contain at least min_sites binding sites are retained;
All overlapping windows are merged into a "cluster".
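The window-and-merge procedure above can be sketched in a few lines of Python. This is a simplified illustration, not the Cis-Analyst implementation: it assumes the input is a list of site start coordinates that have already passed the site_p cutoff, and it represents each qualifying window by the span of the sites it contains. The function name `find_clusters` and the example coordinates are hypothetical.

```python
def find_clusters(site_positions, window=700, min_sites=13):
    """Return merged (start, end) spans of windows holding at least min_sites sites."""
    positions = sorted(site_positions)
    windows = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until all sites from left..right fit in one window.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            windows.append((positions[left], pos))

    # Merge overlapping windows into clusters (window starts are non-decreasing).
    clusters = []
    for start, end in windows:
        if clusters and start <= clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters

# Made-up site coordinates; a small min_sites keeps the example readable.
toy_positions = [100, 150, 220, 300, 5000, 5100, 5150, 5200, 5900]
toy_clusters = find_clusters(toy_positions, window=700, min_sites=3)
# toy_clusters == [(100, 300), (5000, 5200)]
```

With the stricter setting of `min_sites=13`, the same toy input yields no clusters, which shows how the threshold trades sensitivity for specificity.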

Page 9: ECS289A Presentation By Hua Chen 2003-3-3

Binding Site Sequence for Cad:

Page 10: ECS289A Presentation By Hua Chen 2003-3-3

Binding Sites:

Page 12: ECS289A Presentation By Hua Chen 2003-3-3

Step 3: Collection of Known CRMs:

Page 13: ECS289A Presentation By Hua Chen 2003-3-3

Successful result: 14 of the 19 known CRMs

with the search criteria: window size = 700 bp, number of predicted sites >= 13
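One way to arrive at a figure such as 14/19 is to count how many known CRMs are overlapped by at least one predicted cluster. The Python sketch below assumes that "success" means any overlap between the two intervals, which the slide does not state; the coordinates and the names `overlaps` and `recovered_fraction` are made up for illustration.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]

def recovered_fraction(known_crms, clusters):
    """Return (hits, total, fraction) of known CRMs overlapped by any cluster."""
    hits = sum(1 for crm in known_crms
               if any(overlaps(crm, cluster) for cluster in clusters))
    return hits, len(known_crms), hits / len(known_crms)

# Made-up intervals: two of the three known regions are hit by a predicted cluster.
toy_known = [(100, 900), (2000, 2600), (5000, 5800)]
toy_predicted = [(300, 700), (5500, 6100)]
hits, total, fraction = recovered_fraction(toy_known, toy_predicted)
```

For the result reported on this slide, 14/19 corresponds to a recovery rate of about 74%.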

Page 14: ECS289A Presentation By Hua Chen 2003-3-3

Step 4: Genome-wide Searching:

28 clusters identified;
23 of the 28 fall in regions between genes;
5 fall in introns;
49 genes lie in the nearby regions.

Page 15: ECS289A Presentation By Hua Chen 2003-3-3

Step 5: Examine the expression pattern of the 49 genes by RNA in situ hybridization and microarray hybridization:

The 49 genes are examined by hybridization to see whether their expression patterns indicate regulation by the TFs;

10 of the 28 clusters are near at least one gene that shows an anterior-posterior expression pattern (i.e., is regulated by the five TFs).

Page 16: ECS289A Presentation By Hua Chen 2003-3-3

Step 6: The special case: the giant gene

The posterior expression is regulated by Cad, Hb and Kr;

The cis-regulatory sites are still unknown;

The predicted CRM nearest to the giant gene is cloned upstream of a lacZ reporter gene;

The lacZ reporter shows an expression pattern similar to that of giant mRNA.

[Figure panels: +/+ and Kr/Kr embryos]

Page 17: ECS289A Presentation By Hua Chen 2003-3-3

Conclusions:

Binding site clustering is an effective method to identify cis-regulatory modules;

A major obstacle is the paucity of binding data for most transcription factors, which calls for systematic work;

Real CRM structures are more complex, so more complex rules need to be incorporated into the method.

Page 18: ECS289A Presentation By Hua Chen 2003-3-3

Reference

Berman, B.P., Nibu, Y. et al. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. P.N.A.S. 99:757-762.