BBSI Research Simulation
News
• Project proposals
- Monday, June 16
- Format (see News, Presentations and other dates)
• Renaissance fair and other events
• Party at Greg’s house
BBSI Research Simulation
PSSMs and Search for Repeats in DNA
Application of PSSMs
• Regulatory proteins and their binding sites
• Palindromic DNA and its significance
• How to find protein binding sites: Meme
• PSSMs to find beginning of genes
• Repeated sequences and location of protein binding sites (Li et al, 2002)
Regulatory Protein and their Binding Sites
GTA ..(8).. TAC
5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
lacZ
Crp
RNA Polymerase
Operator
C
Presence of CRP sites → Regulation by carbon source
Presence of X sites → Regulation by Y
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
recognizes GTGAGTT
Palindromes: Serve as binding sites for dimeric protein
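A DNA palindrome in this sense is a sequence that equals its own reverse complement, so each subunit of a dimeric protein can read the same half-site on opposite strands. The following is a minimal sketch of that check; the function names are illustrative and not from the slides.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


def palindrome_mismatches(seq):
    """Count positions where seq differs from its reverse complement."""
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


if __name__ == "__main__":
    half_site = "GTGAGTT"  # half-site named on the slide
    print(reverse_complement(half_site))
    # The site in the lac sequence is close to, but not exactly, a palindrome.
    print(palindrome_mismatches("GTGAGTTAGCTCAC"))
```

Counting mismatches rather than demanding exact equality is a deliberate choice here, because natural binding sites are often imperfect palindromes.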
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
tRNA
TAT GGCATGCTAGC
TTAAT TCATTAATTA AGTAA
CGTACGATCGG TAT
DNA: cruciform
RNA: stem/loop
Function of palindrome
RNA secondary structure?
Binding site for dimeric protein?
How to tell?
Regulatory Protein and their Binding Sites
Palindromic sequences
Mutations in the stem/loop:
GGCATGCTAGC → GGCATACTAGC → GGCATATTAGC
Mutations in the double-stranded site:
5’-TTAATGTGAGTTAGCTCACTCATT-3’ → 5’-TTAATGTAAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’ → 3’-AATTACATTCAATCGAGTGAGTAA-5’
How to tell?
Compensatory mutations: RNA
Uncorrelated mutations: protein
Count all in certain class (Li et al, 2000)
Guess a pattern and improve (Meme, Gibbs sampler)
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin         GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E           TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14            GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17            TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19  ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1     GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2      GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.   CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin   TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin           CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
Human sequences 5’ to transcriptional start
Regulatory Protein and their Binding Sites
How to find them?
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
ACAGGGCAGAA
CCCGGGTGTTT
CCGGGGACGCG
CCCCCGGGCCT
CCGCAGAGCTG

A  0.208  0.292  0.000  0.999  0.000  0.999  0.811  0.905  0.575  0.321  0.151  0.264
T  0.160  0.217  0.867  0.000  0.999  0.000  0.189  0.000  0.208  0.057  0.104  0.113
C  0.283  0.236  0.132  0.000  0.000  0.000  0.000  0.000  0.000  0.151  0.330  0.283
G  0.349  0.255  0.000  0.000  0.000  0.000  0.000  0.095  0.217  0.472  0.415  0.340
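Steps 3 and 4 can be made concrete with a short sketch. It builds a position-dependent frequency table from the five 11-base matches listed on the slide and scores a candidate against it. The uniform background of 0.25 per base and all function names are assumptions for illustration. The 12-column table shown on the slide evidently comes from a different, larger set of matches, so the numbers computed here will not reproduce it.

```python
from collections import Counter

# The five best matches shown on the slide (one per upstream sequence).
MATCHES = [
    "ACAGGGCAGAA",
    "CCCGGGTGTTT",
    "CCGGGGACGCG",
    "CCCCCGGGCCT",
    "CCGCAGAGCTG",
]


def frequency_table(sites, alphabet="ACGT"):
    """Step 3: fraction of each base at each position of the aligned sites."""
    table = {base: [] for base in alphabet}
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        for base in alphabet:
            table[base].append(counts[base] / len(sites))
    return table


def match_probability(table, candidate):
    """Probability of the candidate under the table (product over positions)."""
    prob = 1.0
    for pos, base in enumerate(candidate):
        prob *= table[base][pos]
    return prob


def relative_probability(table, candidate, background=0.25):
    """Step 4: probability relative to an assumed uniform background."""
    return match_probability(table, candidate) / background ** len(candidate)


if __name__ == "__main__":
    table = frequency_table(MATCHES)
    for base, row in table.items():
        print(base, " ".join(f"{value:.3f}" for value in row))
    print(relative_probability(table, MATCHES[0]))
```

With raw frequencies, any base never seen at a position gives a probability of zero; real tools usually add pseudocounts to avoid that, which this sketch omits to stay close to the slide.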
Regulatory Protein and their Binding Sites
How Meme finds them
How do pattern finders work?
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Human sequences 5’ to transcriptional start
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. Move around to find local maximum
Step 6. If probability score high, remember pattern and score
Step 7. Repeat Steps 1 - 5
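The seven steps can be sketched as a simple greedy search. This is only an illustration of the loop described on the slide, not Meme's actual algorithm (Meme is based on expectation maximization). The pseudocount, the uniform background, the number of restarts, and every function name are assumptions made for the sketch.

```python
import math
import random

BASES = "ACGT"


def best_window(seq, width, score):
    """Return the highest-scoring window of the given width in seq."""
    windows = (seq[i:i + width] for i in range(len(seq) - width + 1))
    return max(windows, key=score)


def frequency_table(sites, pseudocount=0.5):
    """Step 3: per-position base frequencies (with assumed pseudocounts)."""
    table = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        table.append({b: c / total for b, c in counts.items()})
    return table


def log_relative_probability(table, word, background=0.25):
    """Step 4: log of the probability relative to a uniform background."""
    return sum(math.log(col[b] / background) for col, b in zip(table, word))


def refine(seqs, pattern, max_rounds=20):
    """Steps 2-5: start from a pattern and climb to a local maximum."""
    width = len(pattern)

    def identity(word):
        return sum(a == b for a, b in zip(word, pattern))

    sites = [best_window(s, width, identity) for s in seqs]  # Step 2
    best_score, best_sites = -math.inf, sites
    for _ in range(max_rounds):
        table = frequency_table(sites)
        total = sum(log_relative_probability(table, s) for s in sites)
        if total <= best_score:  # no improvement: local maximum reached
            break
        best_score, best_sites = total, sites

        def by_table(word, table=table):
            return log_relative_probability(table, word)

        sites = [best_window(s, width, by_table) for s in seqs]  # Step 5
    return best_score, best_sites


def find_motif(seqs, width, n_starts=50, rng_seed=0):
    """Steps 1, 6, 7: try many starting patterns and keep the best result."""
    rng = random.Random(rng_seed)
    best = (-math.inf, None)
    for _ in range(n_starts):
        source = rng.choice(seqs)
        start = rng.randrange(len(source) - width + 1)
        result = refine(seqs, source[start:start + width])  # Step 1
        if result[0] > best[0]:  # Step 6
            best = result
    return best


if __name__ == "__main__":
    upstream = [
        "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
        "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
        "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
        "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
        "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
    ]
    score, sites = find_motif(upstream, width=6)
    print(round(score, 2), sites)
```

Because each restart only reaches a local maximum, the result depends on the starting patterns tried, which is why the procedure is repeated from many candidates.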
• You’ve found a gene related to Purple Tongue Syndrome
• BlastP: Encoded protein related to cAMP-binding proteins
• Are the similarities trivial? Related to cAMP binding?
• Does your protein contain a cAMP-binding site?
• What IS a cAMP-binding site?
Task
1. Determine what a cAMP-binding site is
2. Determine if your protein has one
Regulatory Protein and their Binding Sites
How Meme finds them
Strategy
1. Collect sequences of known cAMP-binding proteins
2. Run Meme, a pattern-finding program
   Ask it to find any significant motifs
3. Rerun Meme. Demand that every protein has identified motifs
4. Run Pfam over known sequence to check
Do it
PSSMs in action
Identification of beginning of gene
Experimentally proven start sites
unknown
aceB  ACTATGGAGCATCTGCACATGAAAACC
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA
aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI      ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB        ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA          ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH      TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ       TTCACACAGGAAACAG....CTATGACCATG
rpsJ           AATTGGAGCTCTGGTCTCATGCAGAAC
serC        GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA       GATGCTTAAGGGATCA....CGATGCAGAAC
trpE        CAAAATTAGAGAATA...ACAATGCAAACA
[Figure: PSSM with rows A, C, G, T]
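One way to use the proven start sites is to turn them into a log-odds PSSM and slide it along a sequence whose start is unknown, taking the best-scoring window as the predicted start. The sketch below does this with the ten ungapped regions from the slide; the pseudocount, the uniform background, the leave-one-out demonstration, and the function names are assumptions for illustration, and the gapped alignment is not modeled.

```python
import math

# Regions around experimentally proven start sites, copied from the slide.
# The start codon (ATG or GTG) occupies 0-based positions 18-20 in each.
PROVEN = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}
CODON_OFFSET = 18  # position of the start codon within each region


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds score for each base at each position of the aligned sites."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(c / total / background)
                     for b, c in counts.items()})
    return pssm


def score(pssm, window):
    """Sum of the per-position scores for one window."""
    return sum(col[base] for col, base in zip(pssm, window))


def best_start(pssm, seq):
    """Return (predicted start-codon position, score) for the best window."""
    width = len(pssm)
    candidates = ((i, score(pssm, seq[i:i + width]))
                  for i in range(len(seq) - width + 1))
    offset, best = max(candidates, key=lambda pair: pair[1])
    return offset + CODON_OFFSET, best


if __name__ == "__main__":
    # Leave aceB out of training, then scan its longer upstream region
    # (taken from the alignment slide, gap removed).
    training = [s for name, s in PROVEN.items() if name != "aceB"]
    pssm = build_pssm(training)
    unknown = "ACCACATAACTATGGAGCATCTGCACATGAAAACC"
    position, best = best_start(pssm, unknown)
    print(position, unknown[position:position + 3], round(best, 2))
```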
PSSMs in action
Algorithm to find binding sites (Li et al)
Li et al (2002)
Algorithm
Calculation of probability by Poisson equation
Dimer occurred n times. How likely is that?
Frequency of GTGAGTT = f1
Frequency of AACTCAC = f2
How likely is it to find: GTGAGTTAACTCAC
Frequency of joint occurrence = f1 · f2 = f12
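Li et al (2002)
Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$
\binom{N}{n}\cdot\underbrace{f_{12}\cdot f_{12}\cdot f_{12}\cdots}_{n\ \text{times}}\cdot\underbrace{(1-f_{12})\cdot(1-f_{12})\cdot(1-f_{12})\cdots}_{N-n\ \text{times}},
\qquad
\binom{N}{n} = {}_{N}C_{n} = \frac{N!}{n!\,(N-n)!}
$$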
Li et al (2002)
Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$
\frac{N!}{n!\,(N-n)!}\cdot (f_{12})^{n}\cdot(1-f_{12})^{N-n}
$$
Expected number = m = f12 · N
f12 = m / N
$$
= \frac{N!}{n!\,(N-n)!}\cdot\left(\frac{m}{N}\right)^{n}\cdot\left(1-\frac{m}{N}\right)^{N-n}
$$
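Li et al (2002)
Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$
\begin{aligned}
&\frac{N!}{n!\,(N-n)!}\cdot\left(\frac{m}{N}\right)^{n}\cdot\left(1-\frac{m}{N}\right)^{N-n}\\
={}&\frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\cdot\left(1-\frac{m}{N}\right)^{N}}{N^{n}\cdot\left(1-\frac{m}{N}\right)^{n}}\\
\approx{}&\frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\cdot\left(1-\frac{m}{N}\right)^{N}}{N^{n}\cdot 1^{n}} && (m/N \ll 1)\\
\approx{}&\frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\cdot e^{-m}}{N^{n}\cdot 1^{n}} && \left(\left(1-\tfrac{m}{N}\right)^{N}\to e^{-m}\ \text{for large } N\right)
\end{aligned}
$$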
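Li et al (2002)
Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$
\begin{aligned}
&\frac{N!}{n!\,(N-n)!}\cdot\frac{m^{n}\cdot e^{-m}}{N^{n}\cdot 1^{n}}\\
={}&\frac{N\cdot(N-1)\cdot(N-2)\cdots(N-n+1)}{n!}\cdot\frac{m^{n}\cdot e^{-m}}{N^{n}\cdot 1^{n}}\\
\approx{}&\frac{m^{n}\cdot e^{-m}}{n!} && (n \ll N)
\end{aligned}
$$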
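A quick numerical check shows how close the Poisson form is to the exact binomial expression when N is large and f12 is small. The frequencies and the number of positions below are hypothetical values chosen only for illustration; they are not taken from Li et al.

```python
import math


def binomial_pmf(n, N, f):
    """Exact probability of n occurrences in N positions, each with chance f."""
    return math.comb(N, n) * f ** n * (1 - f) ** (N - n)


def poisson_pmf(n, m):
    """Poisson approximation with expected number m = f * N."""
    return m ** n * math.exp(-m) / math.factorial(n)


if __name__ == "__main__":
    f1 = 1e-3        # hypothetical frequency of GTGAGTT
    f2 = 1e-3        # hypothetical frequency of AACTCAC
    f12 = f1 * f2    # frequency of joint occurrence
    N = 4_000_000    # hypothetical number of positions examined
    m = f12 * N      # expected number of dimers
    for n in (0, 1, 5, 10):
        print(n, binomial_pmf(n, N, f12), poisson_pmf(n, m))
```

The Poisson form needs only m, so the probability of seeing a dimer n times can be evaluated without handling the very large factorials in the binomial expression.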