BBSI Research Simulation News

Page 1: BBSI Research Simulation News
Page 2: BBSI Research Simulation News

BBSI Research Simulation
News

• Project proposals

- Monday, June 16

- Format (see News, Presentations and other dates)

• Renaissance fair and other events

• Party at Greg’s house

Page 3: BBSI Research Simulation News

BBSI Research Simulation
PSSMs and Search for Repeats in DNA

Application of PSSMs

• Regulatory proteins and their binding sites

• Palindromic DNA and its significance

• How to find protein binding sites: Meme

• PSSMs to find beginning of genes

• Repeated sequences and location of protein binding sites (Li et al, 2002)

Page 4: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

GTA ..(8).. TAC

5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN

[Figure labels: Crp, RNA Polymerase, Operator, lacZ, C]

Presence of CRP sites → Regulation by carbon source

Presence of X sites → Regulation by Y

Page 6: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA

recognizes GTGAGTT

Page 15: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA

Palindromes serve as binding sites for dimeric proteins
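
A minimal sketch of what "palindromic" means for double-stranded DNA: a site is a palindrome when it reads the same as its own reverse complement. The function names `reverse_complement` and `palindrome_mismatches` are illustrative choices, not part of the slides; the sequence is the top strand shown above.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq: str) -> int:
    """Count positions where seq differs from its own reverse complement.

    0 means a perfect palindrome; small values mean an imperfect one.
    """
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


top = "TTAATGTGAGTTAGCTCACTCATT"   # top strand from the slide
site = top[5:19]                   # the 14-base core, GTGAGTTAGCTCAC

print(site, reverse_complement(site), palindrome_mismatches(site))
```

The core site is a near-palindrome rather than a perfect one: the two half-sites agree except at a pair of mirrored positions, which is still enough for each subunit of a dimer to see a similar sequence.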

Page 16: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure: the palindrome folded back on itself; extracted fragments: TAT GGCATGCTAGC / TTAAT TCATTAATTA AGTAA / CGTACGATCGG TAT]

DNA: cruciform

RNA: stem/loop (tRNA)

Page 17: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure: the palindrome folded as a stem/loop; extracted fragments: TAT GGCATGCTAGC / TTAAT TCATTAATTA AGTAA / CGTACGATCGG TAT]

TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA

Function of palindrome

RNA secondary structure?

Binding site for dimeric protein?

How to tell?

Page 21: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure: the palindrome folded as a stem/loop; the stem fragment now reads GGCATATTAGC (GGCATGCTAGC on the unmutated slide); extracted fragments: TAT GGCATATTAGC / TTAAT TCATTAATTA AGTAA / CGTACGATCGG TAT]

TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA

Function of palindrome

RNA secondary structure?

Binding site for dimeric protein?

How to tell?

Page 26: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

Palindromic sequences

5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’

[Figure: the palindrome folded as a stem/loop; extracted fragments: TAT GGCATGCTAGC / TTAAT TCATTAATTA AGTAA / CGTACGATCGG TAT]

TTAATGTAAGTTAGCTCACTCATT
AATTACATTCAATCGAGTGAGTAA

How to tell?

Compensatory mutations: RNA secondary structure

Uncorrelated mutations: protein binding site

Page 27: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

How to find them?

Count all in certain class (Li et al, 2000)

Guess a pattern and improve (Meme, Gibbs sampler)

Human sequences 5’ to transcriptional start

snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin           GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E             TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14              GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17              TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19    ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1       GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2        GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.     CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin     TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin             CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA

Page 28: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

How Meme finds them

Human sequences 5’ to transcriptional start

snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence

Step 2. Find best matches to pattern in all sequences

Step 3. Construct position-dependent frequency table based on matches

Step 4. Calculate relative probability of matches from frequency table

ACAGGGCAGAA
CCCGGGTGTTT
CCGGGGACGCG
CCCCCGGGCCT
CCGCAGAGCTG

Pos    1      2      3      4      5      6      7      8      9     10     11     12
A    0.208  0.292  0.000  0.999  0.000  0.999  0.811  0.905  0.575  0.321  0.151  0.264
T    0.160  0.217  0.867  0.000  0.999  0.000  0.189  0.000  0.208  0.057  0.104  0.113
C    0.283  0.236  0.132  0.000  0.000  0.000  0.000  0.000  0.000  0.151  0.330  0.283
G    0.349  0.255  0.000  0.000  0.000  0.000  0.000  0.095  0.217  0.472  0.415  0.340
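
Step 3 can be sketched in a few lines. The example below builds a position-dependent frequency table from the five 11-base matches shown on this slide (one taken from each of the five sequences). The function name `frequency_table` is illustrative, and no pseudocounts are used here; that is a simplifying assumption, not something the slides specify.

```python
from collections import Counter


def frequency_table(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    table = []
    for column in zip(*sites):          # walk the alignment column by column
        counts = Counter(column)
        table.append({base: counts[base] / len(sites) for base in "ATCG"})
    return table


# The five 11-base matches from the slide, one per sequence.
sites = [
    "ACAGGGCAGAA",
    "CCCGGGTGTTT",
    "CCGGGGACGCG",
    "CCCCCGGGCCT",
    "CCGCAGAGCTG",
]

table = frequency_table(sites)
for base in "ATCG":                     # same row order as the slide's table
    print(base, " ".join(f"{col[base]:.3f}" for col in table))
```

Every column of such a table sums to 1, which is a quick sanity check when reading a frequency table.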

Page 31: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

How Meme finds them

How do pattern finders work?

snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence

Step 2. Find best matches to pattern in all sequences

Step 3. Construct position-dependent frequency table based on matches

Step 4. Calculate relative probability of matches from frequency table

Step 5. Move around to find local maximum

Step 6. If probability score high, remember pattern and score

Step 7. Repeat Steps 1 - 5
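
The steps above can be sketched as a toy greedy pattern finder. This is not Meme itself (Meme uses expectation maximization with a statistical model); it only mirrors the slide's outline. The names `build_table`, `best_window`, `refine` and `find_motif`, the motif width of 6, the 50 restarts, the pseudocount of 0.5 and the uniform background of 0.25 are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"

SEQS = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
]


def build_table(sites, pseudocount=0.5):
    """Step 3: position-dependent frequency table (with pseudocounts)."""
    table = []
    for column in zip(*sites):
        total = len(column) + pseudocount * len(BASES)
        table.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return table


def log_odds(window, table, background=0.25):
    """Step 4: log-odds of a window under the table vs. a uniform background."""
    return sum(math.log(col[base] / background) for base, col in zip(window, table))


def best_window(seq, table):
    """Step 2: the highest-scoring window of the table's width in one sequence."""
    width = len(table)
    windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
    return max(windows, key=lambda w: log_odds(w, table))


def refine(seqs, seed):
    """Steps 2-5: re-match and rebuild until the score stops improving."""
    table = build_table([seed])
    best_sites, best_score = None, -math.inf
    while True:
        sites = [best_window(s, table) for s in seqs]
        table = build_table(sites)
        score = sum(log_odds(site, table) for site in sites)
        if score <= best_score + 1e-9:      # local maximum reached
            return best_sites, best_score
        best_sites, best_score = sites, score


def find_motif(seqs, width=6, restarts=50, seed=0):
    """Steps 1, 6, 7: try many arbitrary seeds and remember the best result."""
    rng = random.Random(seed)
    best = (None, -math.inf)
    for _ in range(restarts):
        source = rng.choice(seqs)
        start = rng.randrange(len(source) - width + 1)
        sites, score = refine(seqs, source[start:start + width])
        if score > best[1]:
            best = (sites, score)
    return best


sites, score = find_motif(SEQS)
for site in sites:
    print(site)
print(round(score, 2))
```

Because each restart only climbs to a local maximum, different seeds can settle on different patterns, which is why the outline repeats from a fresh arbitrary seed and keeps the best-scoring result.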

Page 32: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

How Meme finds them

• You’ve found a gene related to Purple Tongue Syndrome

• BlastP: Encoded protein related to cAMP-binding proteins

• Are the similarities trivial? Related to cAMP binding?

• Does your protein contain a cAMP-binding site?

• What IS a cAMP-binding site?

Task

1. Determine what a cAMP-binding site is

2. Determine if your protein has one

Page 33: BBSI Research Simulation News

Regulatory Proteins and their Binding Sites

How Meme finds them

Strategy

1. Collect sequences of known cAMP-binding proteins

2. Run Meme, a pattern-finding program. Ask it to find any significant motifs

3. Rerun Meme. Demand that every protein has identified motifs

4. Run Pfam over known sequence to check

Do it

Page 34: BBSI Research Simulation News

PSSMs in action

Identification of beginning of gene

aceB    ACTATGGAGCATCTGCACATGAAAACC
atpI    ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB    ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA    ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH    TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ    TTCACACAGGAAACAGCTATGACCATG
rpsJ    AATTGGAGCTCTGGTCTCATGCAGAAC
serC    GCAACGTGGTGAGGGGAAATGGCTCAA
sucA    GATGCTTAAGGGATCACGATGCAGAAC
trpE    CAAAATTAGAGAATAACAATGCAAACA

Experimentally proven start sites

unknown
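
A minimal sketch of how a PSSM could be built from these ten start-site contexts and used to score a candidate window. The sequences are copied from the slide; the names `build_pssm` and `score`, the pseudocount of 0.5 and the uniform background of 0.25 are assumptions for illustration. Scoring the training sequences themselves, as done here, is only a sanity check and not a real prediction.

```python
import math

BASES = "ACGT"

# 27-base contexts from the slide; the start codon sits at positions 19-21.
START_CONTEXTS = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}


def build_pssm(seqs, pseudocount=0.5, background=0.25):
    """Return one {base: log-odds score} dict per alignment column."""
    pssm = []
    for column in zip(*seqs):
        total = len(column) + pseudocount * len(BASES)
        pssm.append({
            b: math.log(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def score(window, pssm):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(col[base] for base, col in zip(window, pssm))


pssm = build_pssm(list(START_CONTEXTS.values()))
for gene, seq in START_CONTEXTS.items():
    print(gene, round(score(seq, pssm), 2))
```

In this ungapped alignment only the start codon columns are strongly conserved; the purine-rich region upstream sits at a variable distance from the start codon, so its signal is smeared across several columns.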

Page 37: BBSI Research Simulation News

PSSMs in action

Identification of beginning of gene

aceB    ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI        ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB          ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA            ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH        TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ         TTCACACAGGAAACAG....CTATGACCATG
rpsJ             AATTGGAGCTCTGGTCTCATGCAGAAC
serC          GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA         GATGCTTAAGGGATCA....CGATGCAGAAC
trpE          CAAAATTAGAGAATA...ACAATGCAAACA

[PSSM table rows: A, C, G, T]

Page 38: BBSI Research Simulation News

PSSMs in action

Algorithm to find binding sites (Li et al)

Page 39: BBSI Research Simulation News

Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Dimer occurred n times. How likely is that?

Frequency of GTGAGTT = f1

Frequency of AACTCAC = f2

How likely is it to find: GTGAGTTAACTCAC

Frequency of joint occurrence = f1 · f2 = f12
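
A small sketch of the quantities on this slide: the frequencies f1 and f2 of the two half-sites and their product f12. The toy "genome" below is a made-up string for illustration only, and `count_occurrences` and `word_frequency` are hypothetical helper names; treating frequency as occurrences per window is also an assumption.

```python
def count_occurrences(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


def word_frequency(seq, word):
    """Occurrences of word divided by the number of windows of that width."""
    windows = len(seq) - len(word) + 1
    return count_occurrences(seq, word) / windows


# Made-up sequence containing GTGAGTT twice and AACTCAC once.
toy_genome = "AC" + "GTGAGTT" + "ACG" + "AACTCAC" + "G" + "GTGAGTT" + "AA"

f1 = word_frequency(toy_genome, "GTGAGTT")
f2 = word_frequency(toy_genome, "AACTCAC")
f12 = f1 * f2   # expected joint frequency if the two words occur independently

print(f1, f2, f12)
```

Multiplying f1 by f2 assumes the two half-sites occur independently of each other; a dimer seen far more often than f12 predicts is then a candidate binding site.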

Page 40: BBSI Research Simulation News

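Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Probability of n occurrences of dimer:

$$P(n) = {}_{N}C_{n} \cdot \underbrace{f_{12} \cdot f_{12} \cdot f_{12} \cdots}_{n\ \text{times}} \cdot \underbrace{(1-f_{12}) \cdot (1-f_{12}) \cdot (1-f_{12}) \cdots}_{N-n\ \text{times}}$$

$${}_{N}C_{n} = \frac{N!}{n! \cdot (N-n)!}$$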

Page 41: BBSI Research Simulation News

Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Probability of n occurrences of dimer:

$$P(n) = \frac{N!}{n! \cdot (N-n)!} \cdot (f_{12})^{n} \cdot (1-f_{12})^{N-n}$$

Expected number = m = f12 · N

f12 = m / N

$$P(n) = \frac{N!}{n! \cdot (N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n}$$

Page 42: BBSI Research Simulation News

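Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Probability of n occurrences of dimer:

$$
\begin{aligned}
P(n) &= \frac{N!}{n! \cdot (N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} \\
     &= \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot \left(1-\frac{m}{N}\right)^{N}}{N^{n} \cdot \left(1-\frac{m}{N}\right)^{n}} \\
     &\approx \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot \left(1-\frac{m}{N}\right)^{N}}{N^{n} \cdot (1)^{n}} \\
     &\approx \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}}
\end{aligned}
$$

The third line uses (1 - m/N)^n ≈ 1 when m/N is small; the fourth uses (1 - m/N)^N ≈ e^(-m) when N is large.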

Page 43: BBSI Research Simulation News

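Li et al (2002)

Algorithm

Calculation of probability by Poisson equation

Probability of n occurrences of dimer:

$$
\begin{aligned}
P(n) &\approx \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} \\
     &= \frac{N \cdot (N-1) \cdot (N-2) \cdots (N-n+1)}{n!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} \\
     &\approx \frac{m^{n} \cdot e^{-m}}{n!}
\end{aligned}
$$

The last step uses N · (N-1) · … · (N-n+1) ≈ N^n when n is much smaller than N.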
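
A quick numerical check of the derivation, as a sketch: for a large N and a small f12, the exact binomial probability and the Poisson approximation should nearly coincide. The values N = 1,000,000 and f12 = 2e-6 are arbitrary illustrative choices, not figures from Li et al; N is taken here to be the number of positions examined, which the slides do not define explicitly.

```python
import math


def binomial_probability(n, N, f12):
    """Exact probability of n occurrences in N trials."""
    return math.comb(N, n) * f12**n * (1 - f12) ** (N - n)


def poisson_probability(n, m):
    """Poisson approximation with expected number m = f12 * N."""
    return m**n * math.exp(-m) / math.factorial(n)


N = 1_000_000      # illustrative number of positions
f12 = 2e-6         # illustrative joint frequency of the dimer
m = f12 * N        # expected number of occurrences

for n in range(6):
    exact = binomial_probability(n, N, f12)
    approx = poisson_probability(n, m)
    print(n, f"{exact:.6f}", f"{approx:.6f}")
```

The two columns agree closely, which is the point of the approximation: the Poisson form needs only the expected number m, not N and f12 separately.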