The FREAKS of PROTEIN SEQUENCE

Session 3.1: Repeats
Session 3.2: Biased regions

Miguel Andrade
Johannes-Gutenberg University of Mainz
Andrade@uni-mainz.de

Frequency

14% of proteins contain repeats (Marcotte et al, 1999)

1: Single amino acid repeats.

2: Longer imperfect tandem repeats. Assemble in structure.

Definition: repeats

Sequence, long, imperfect, tandem

MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASFGSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSPLSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN

MRAVVKSPIM
CHEKSPSVCSPLNMTSSVCSPAG
INSVSSTTASFGSFPVHSPIT
QGTPLTCSPNV
ENRGSRSHSPAH
ASNVGSPLSSPLS
SMKSSISSPPS
HCSVKSPVSSPNN
VTLRSSVSSPAN
INN

Tandem repeats fold together

(Vlassi et al, 2013)

http://weblogo.berkeley.edu

Andrade et al. (2001) J Struct Biol

Definition: CBRs

Compositionally biased regions (CBRs)

High frequency of one or two amino acids in a region.

Perfect repeat: QQQQQQQQQQQ
Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED

Particular case of low complexity region
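The definition above ("high frequency of one or two amino acids in a region") can be sketched as a sliding-window scan. This is only an illustration, not the method of any published CBR tool: the function names, the window size (12) and the threshold (0.9) are assumptions chosen for the example, and the test fragment is a toy sequence built to resemble the N-terminus of human huntingtin.

```python
from collections import Counter


def top_two_fraction(segment):
    """Fraction of the segment covered by its two most common residues."""
    counts = Counter(segment)
    return sum(n for _, n in counts.most_common(2)) / len(segment)


def biased_windows(sequence, window=12, threshold=0.9):
    """Return merged (start, end) half-open spans of windows passing the threshold.

    window and threshold are illustrative values, not recommended defaults.
    """
    spans = []
    for start in range(len(sequence) - window + 1):
        if top_two_fraction(sequence[start:start + window]) >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent to the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy fragment (hypothetical, modelled on the huntingtin N-terminus).
n_term = "MATLEKLMKAFESLKSF" + "Q" * 21 + "P" * 11 + "QLPQPPPQAQP"

for start, end in biased_windows(n_term):
    print(start, end, n_term[start:end])
```

Because the score only asks for one or two dominant residue types, the imperfect run QQQQPQQQQQQ and the mixed D/E stretch both reach the maximum score of 1.0. A single scan also merges the adjacent poly-Q and poly-P stretches into one region, so counting them separately would need a per-residue score.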

Detection: CBRs

Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find?

>sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP
LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE
FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP
RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG
NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV
PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL
TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI
VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA
SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV
PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD
EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE
NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG
AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection: repeats

Often NOT straightforward. N-terminal human Huntingtin (same sequence as above). How many repeats can you find?

EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA
CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS
NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS
TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL
PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT

Frequency: repeats

Fraction of proteins annotated with the keyword REPEAT in SwissProt

Taxon               Annotated/Total   %
Archaea             27/3428           0.79
Viruses             81/8048           1.00
Bacteria            299/28438         1.05
Fungi               232/8334          2.78
Viridiplantae       153/6963          2.20
Metazoa             1538/28948        5.31
Rest of Eukaryota   92/2434           3.78

(Andrade et al 2001)

Detection of repeats

Dotplots

Comparing a sequence against itself

Detection of repeats

Dotplots

Slide one segment along the other and count the identical residues at each offset:

TLRSSVSSPANINNS
    |                1 match
 NMTSSVCSPANISV

TLRSSVSSPANINNS
   ||| |||||         8 matches
NMTSSVCSPANISV

 TLRSSVSSPANINNS
    |  |             2 matches
NMTSSVCSPANISV

  TLRSSVSSPANINNS
  |                  1 match
NMTSSVCSPANISV

Each offset is one diagonal of the dotplot, so these four neighbouring diagonals score 1, 8, 2 and 1.
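The match counts for the two segments TLRSSVSSPANINNS and NMTSSVCSPANISV can be reproduced with a few lines of code. This is a minimal sketch of the diagonal-counting idea behind a dotplot, not the implementation used by Dotlet or any other tool; the function name and the offset convention are assumptions made for the example.

```python
def diagonal_matches(a, b, offset):
    """Count positions i where a[i] is identical to b[i + offset].

    Each offset corresponds to one diagonal of the dotplot of a against b.
    """
    return sum(
        1
        for i in range(len(a))
        if 0 <= i + offset < len(b) and a[i] == b[i + offset]
    )


seg_a = "TLRSSVSSPANINNS"
seg_b = "NMTSSVCSPANISV"

# Four neighbouring diagonals, in the order shown on the slides.
scores = {d: diagonal_matches(seg_a, seg_b, d) for d in (-1, 0, 1, 2)}
print(scores)  # {-1: 1, 0: 8, 1: 2, 2: 1}
```

A diagonal that scores much higher than its neighbours (8 against 1, 2 and 1 here) is what appears as a visible line in the dotplot.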

Exercise 1. Using Dotlet with the human mineralocorticoid receptor (MR)

• Obtain the sequence from UniProt

• Go to the Dotlet web page

• Click on the input button and paste the sequence there

• Try to find combinations of parameters that show patterns in the dot plot

• Find repetitions by clicking on the diagonal patterns

Detection of repeats

Using a multiple sequence alignment helps

Conserved repeated patterns

JalView with Regular Expression searches

Regular Expressions:

[LS]P.A matches L or S, followed by P, followed by anything, followed by A

Which one is not matched?

LPTA, SPAA, LPPA, LPAP, SPLA
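The pattern can be checked outside JalView with any regular expression engine. The sketch below uses Python's standard `re` module; the helper name `find_motifs` and the short test sequence are made up for the example.

```python
import re

pattern = re.compile(r"[LS]P.A")

candidates = ["LPTA", "SPAA", "LPPA", "LPAP", "SPLA"]
# fullmatch requires the whole four-letter string to fit the pattern.
not_matched = [c for c in candidates if not pattern.fullmatch(c)]
print(not_matched)


def find_motifs(sequence, regex):
    """Return (start, matched_text) for every non-overlapping match."""
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical sequence containing two occurrences of the motif.
print(find_motifs("AALPTAGGSPLA", pattern))
```

Only LPAP fails, because its last position is P rather than the required A.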

Exercise 2. Using JalView with a MSA of the MR with orthologs

• Run JalView using the JNLP file on the desktop (from http://www.jalview.org/Download)

• Load this MSA in JalView

• Use the "find" option with a regular expression and mark all matches

• Try to find the expression that matches the most repeats. How many repeats do you see? How long are they? Would you correct the alignment based on these findings?

[Figure: repeat units labelled #T1 to #T15 and #F1 to #F11, some marked with asterisks]

(Vlassi et al, 2013)

Andrade and Bork (1995) Nature Genetics

A subunit PP2A structure

PDB: 1b3u
Groves et al. (1999) Cell

Ap1 Clathrin Adaptor Core

PDB: 1w63
Heldwein et al. (2004) PNAS

Neural Network!

Secondary structure
Transmembrane helices
Residue exposure

Andrade, Petosa, O'Donoghue, Müller and Bork (2001) J Mol Biol

Neural Network: Architecture

Backpropagation neural network

…GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT…

Central position

Input layer n=39x20

Hidden layer n=3

Output neuron = 0.1 - 0.9

Palidwor et al. (2009) PLoS Comp Biol
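The layer sizes on the slide (39x20 inputs, 3 hidden units, 1 output) suggest a 39-residue window in which each residue is one-hot encoded over the 20 amino acids. The sketch below shows that encoding and a forward pass only. It is not the published network: the weights are random placeholders rather than trained values, the sigmoid activation is an assumption, and all names are made up for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 39   # residues per window (from the slide: 39x20 inputs)
HIDDEN = 3    # hidden units (from the slide)


def encode_window(window):
    """One-hot encode a window: 20 inputs per residue, 780 in total."""
    vector = []
    for residue in window:
        one_hot = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # non-standard residues stay all-zero
            one_hot[index] = 1.0
        vector.extend(one_hot)
    return vector


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input -> 3 sigmoid hidden units -> 1 sigmoid output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder weights (random, NOT trained); a real network learns these
# by backpropagation on labelled examples.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(WINDOW * 20)] for _ in range(HIDDEN)]
b_hidden = [0.0] * HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
b_out = 0.0

fragment = "GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT"
scores = []
for start in range(len(fragment) - WINDOW + 1):
    window = fragment[start:start + WINDOW]
    central = start + WINDOW // 2  # the score is assigned to the central position
    scores.append((central, forward(encode_window(window), w_hidden, b_hidden, w_out, b_out)))
```

Sliding the window one residue at a time gives one output value per central position, which is how a per-residue score along the sequence is obtained.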

Neural Network: Architecture

[Figure: diagram with labels H1, H2 and L]

Fournier et al. (2013) PLoS One

Taxon                     N          n       n/N
Virus/phages              1379599    16      1.16E-05
Archaea                   362208     296     8.17E-04
  Euryarchaeota           225118     247     1.10E-03
  Crenarchaeota           100611     32      3.18E-04
Bacteria                  14505441   3939    2.72E-04
  Acidobacteria           40456      27      6.67E-04
  Actinobacteria          1634898    462     2.83E-04
  Bacteroidetes           724491     135     1.86E-04
  Chlamydiae              103375     155     1.50E-03
  Chloroflexi             57584      74      1.29E-03
  Cyanobacteria           279184     549     1.97E-03
  Firmicutes              3837822    627     1.63E-04
  Planctomycetes          56777      129     2.27E-03
  Proteobacteria          7220418    1599    2.21E-04
  Spirochetes             135404     100     7.39E-04
Eukaryota                 5710673    14659   2.57E-03
  Apicomplexa             129180     237     1.83E-03
  Ciliophora              66444      128     1.93E-03
  Bacillariophyta         24672      74      3.00E-03
  Viridiplantae           1577040    4086    2.59E-03
    Chlorophyta           82919      321     3.87E-03
    Streptophyta          930229     1463    1.57E-03
  Kinetoplastida          119358     385     3.23E-03
  Mycetozoa               34276      103     3.01E-03
  Phaeophyceae            21376      63      2.95E-03
  Fungi                   1256144    4228    3.37E-03
  Metazoa                 2796864    6873    2.46E-03
    Nematodes             223421     365     1.63E-03
    Trematodes            39558      97      2.45E-03
    Insecta               694809     1173    1.69E-03
    Sarcopterygii         1145548    3909    3.41E-03

[Bar chart of n/N, axis from 0 to 4E-03]

Fournier et al. (2013) PLoS One
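The last column of the table is the second count divided by the first, which can be confirmed for the top-level rows. The reading of the columns as "total" and "count" is an assumption, since the slide does not label them; the variable names are made up for the example.

```python
# (taxon, first column, second column) copied from the table above.
rows = [
    ("Virus/phages", 1379599, 16),
    ("Archaea", 362208, 296),
    ("Bacteria", 14505441, 3939),
    ("Eukaryota", 5710673, 14659),
]

# Assumed relation: fraction = second column / first column.
fractions = {name: count / total for name, total, count in rows}
formatted = {name: f"{value:.2E}" for name, value in fractions.items()}

for name, text in formatted.items():
    print(f"{name:<14}{text}")

# Relative enrichment between two groups, e.g. Eukaryota over Bacteria.
enrichment = fractions["Eukaryota"] / fractions["Bacteria"]
```

Under that reading, the fraction in Eukaryota is roughly an order of magnitude higher than in Bacteria.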

http://cbdm.mdc-berlin.de/~ard2/

Exercise 3. Detecting repeats in human huntingtin

• Go to the ARD2 web page

• Paste this sequence (1-780 fragment of human huntingtin) in the input window

• Run ARD2 and interpret the output

Exercise 4. Viewing detected repeats in a protein structure

• Go to the PDBPaint web page

• In the "Query PDB" window type "2IE3", in the "web service" menu choose the "ARD2" option and select a large window size (e.g. 800*800)

• Hit the "Go!" button

• Rotate the structure and examine the correspondence between the hits and the structure
