Infer function by motifs pp2 introfunc3 - Rostlab · 2014-11-05 · Bairoch A (1991) NAR 19 2241-5...


/00© Burkhard Rost

1

title: Protein Prediction 2 (for Bioinformaticians) - Protein function: Infer function by motifs
short title: pp2_introfunc3

lecture: Protein Prediction 2 - Protein function TUM wintersemester


So far: Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology

Today
• Motifs
• Function by association

NEXT
• “compute” enzyme function?
• predict localization

2

Past - TOC today - Next


I.2b Function Intro: Sequence motifs

3


Motifs - intro

4


Full sequence (ADH1_human, 95 aa): MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAY

TLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAVWRMQILSKS

Motifs could be: MANEVIKCKAA

Or: MAN[ED]hh[KR]C[KR]
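The difference between an exact sequence stretch and a generalized motif can be sketched with regular expressions. This is only an illustration: it assumes that the lowercase `h` in the slide's pattern means "any hydrophobic residue", and the residue set chosen for `HYDROPHOBIC` as well as the second test peptide are hypothetical.

```python
import re

# First 50 residues of the ADH1_human sequence shown on the slide.
SEQ = "MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAY"

# Assumption: 'h' = any hydrophobic residue; this residue set is illustrative.
HYDROPHOBIC = "[AVILMFWC]"


def motif_to_regex(motif: str) -> str:
    """Expand the slide's shorthand into a Python regular expression."""
    return motif.replace("h", HYDROPHOBIC)


PATTERN = motif_to_regex("MAN[ED]hh[KR]C[KR]")

# The exact string only finds itself; the pattern also finds related peptides.
exact_hit = "MANEVIKCKAA" in SEQ
pattern_hit = re.search(PATTERN, SEQ)
variant_hit = re.search(PATTERN, "MANDLLRCRGG")  # hypothetical variant peptide

print(exact_hit, pattern_hit.group(), variant_hit.group())
```

The exact string would miss the hypothetical variant, while the pattern still matches it; that gain in sensitivity is paid for with a higher risk of chance matches.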

5

Sequence vs motif


6

How can we use this concept to search?

?


7

Resources for motifs/patterns

PROSITE: http://us.expasy.org/prosite/ [Hulo et al., Nucl. Acids Res. 32:D134-D137 (2004)]

PRINTS: http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ [Attwood, Briefings in Bioinformatics 3(3):252-263 (2002)]

BLOCKS: http://www.blocks.fhcrc.org/ [Henikoff et al., Nucl. Acids Res. 28:228-230 (2000)]


PROSITE

8


1986 starts SWISS-PROT
1988 starts PROSITE
1993 starts ExPASy (with Ron Appel)
1998 SIB: Swiss Institute of Bioinformatics
2009 CALIPHO: Computer and Laboratory Investigation of Proteins of Human Origin

9

Amos Bairoch

Amos Bairoch

Shapers and Shakers


What’s good?


Swiss-Prot, PROSITE, ExPASy, CALIPHO, SIB (Swiss Institute of Bioinformatics)

papers:
• >220 papers (Nov 2013)
• 4 with >1,000 citations (Nov 2013)
• 70 with >100 citations (Nov 2013)
• H-index 79 (ISI Nov 2013)

12


Manually align family + annotate motifs
Use motifs for automatic alignment and annotation of unknown proteins

13

Motifs and patterns

Search for the motif pattern in a new protein

Find a motif or a pattern in a functionally characterized family

Transfer function annotation

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


completeness: DB with as many motifs as possible
high specificity: no false positives at a level at which most are found
documentation
periodic reviewing

14

PROSITE: Concepts for DB


Bairoch A (1991) NAR 19 2241-5
PROSITE: a dictionary of sites and patterns in proteins
repeated: 1992, 1993

15

PROSITE history


Bairoch A (1991) NAR 19 2241-5
PROSITE: a dictionary of sites and patterns in proteins
repeated: 1992, 1993

Solution:
GxxGxxG (membrane)
[RK](2)-x-[ST] (phosphorylation)
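Patterns written in this style map almost directly onto regular expressions. The sketch below translates a common subset of the PROSITE pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) into Python's `re` syntax. The function name and the test peptide are hypothetical, and this is not the ScanProsite implementation.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (common subset only)."""
    regex = pattern.rstrip(".")          # patterns in the database end with '.'
    regex = regex.replace("-", "")       # '-' only separates pattern elements
    regex = regex.replace("x", ".")      # x = any residue
    # {ABC} = any residue except A, B, C; must be handled before '(' -> '{'.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")   # (2) -> {2}, (2,4) -> {2,4}
    regex = regex.replace("<", "^").replace(">", "$")   # sequence termini
    return regex


phospho = prosite_to_regex("[RK](2)-x-[ST]")
membrane = prosite_to_regex("GxxGxxG")

# Hypothetical peptide used only to show the scan.
hits = [(m.start() + 1, m.group()) for m in re.finditer(phospho, "AAKRQSLLRKATG")]
print(phospho, membrane, hits)
```

Such short patterns match by chance in many unrelated proteins, which is why the database concepts above stress specificity and documentation.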

17

PROSITE history


Bairoch A (1991) NAR 19 2241-5
PROSITE: a dictionary of sites and patterns in proteins
repeated: 1992, 1993
A Bairoch & P Bucher (1994) NAR 22:3583-9
PROSITE: recent developments (profiles)
A Bairoch, P Bucher & K Hofmann (1996) NAR 24:189-96
repeated 1997, 1999 (Hofmann, Bucher, Falquet, Bairoch)
L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann & A Bairoch (2002) NAR 30:235-8

19

PROSITE history

Philip Bucher


CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch, P Bucher (2002) Brief Bioinform 3:265-74
N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher, A Bairoch (2004) NAR 32:D134-7
A Gattiker, E Gasteiger, A Bairoch (2002) Appl Bioinformatics 1:107-8
ScanProsite: a reference implementation of a PROSITE scanning tool

20

PROSITE history


A Bairoch (1991) NAR 19 Suppl: 2241-5
A Bairoch (1992) NAR 20 Suppl: 2013-8
A Bairoch (1993) NAR 21: 3097-103
A Bairoch and P Bucher (1994) NAR 22: 3583-9
A Bairoch, P Bucher and K Hofmann (1996) NAR 24: 189-96
A Bairoch, P Bucher and K Hofmann (1997) NAR 25: 217-21
K Hofmann, P Bucher, L Falquet and A Bairoch (1999) NAR 27: 215-9
L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann and A Bairoch (2002) NAR 30: 235-8
A Gattiker, E Gasteiger and A Bairoch (2002) Appl Bioinformatics 1: 107-8
CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch and P Bucher (2002) Brief Bioinform 3: 265-74
N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher and A Bairoch (2004) NAR 32: D134-7
CJ Sigrist, E De Castro, PS Langendijk-Genevaux, V Le Saux, A Bairoch and N Hulo (2005) Bioinformatics 21: 4060-6
E de Castro, CJ Sigrist, A Gattiker, V Bulliard, PS Langendijk-Genevaux, E Gasteiger, A Bairoch and N Hulo (2006) NAR 34: W362-5
N Hulo, A Bairoch, V Bulliard, L Cerutti, E De Castro, PS Langendijk-Genevaux, M Pagni and CJ Sigrist (2006) NAR 34: D227-30
N Hulo, A Bairoch, V Bulliard, L Cerutti, BA Cuche, E de Castro, C Lachaize, PS Langendijk-Genevaux and CJ Sigrist (2008) NAR 36: D245-9

21

PROSITE - evolution of method


22

PROSITE / ScanProsite

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)

K Hofmann, P Bucher, L Falquet & A Bairoch (1999) Nucl Acids Res 27: 215-9
N Hulo et al. (2004) Nucleic Acids Res 32: D134-7


PRINTS

23


University of Manchester (Faculty of Life Sciences & School of Computer Sciences)
PRINTS: diagnostic fingerprint database
TK Attwood & ME Beck (1994) PRINTS - a protein motif fingerprint database

24

Terry K Attwood

Terry K Attwood


Motifs are stretches of evolutionarily conserved fingerprints
version 42.0 (Manchester Univ, Feb 2012)
2,156 FINGERPRINTS encoding 12,444 single motifs
TK Attwood, P Bradley, DR Flower, A Gaulton, N Maudling, A Mitchell, G Moulton, A Nordle, K Paine, P Taylor, A Uddin, C Zygouri (2003) NAR 31:400-2

25

PRINTS concept


homeobox
The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [1-3]. Proteins containing homeobox domains are likely to play an important role in development - most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.

26

PRINTS: example


BLOCKS

27


Fred Hutchinson Cancer Center, Seattle
HHMI (Howard Hughes Medical Institute)

papers:
• >300 papers (Nov 2011)
• 3 with >1,000 citations (end 2011)
• 72 with >100 citations
• H-index 83 (ISI Nov 2011)

Paradigm changes
• gene in gene - in intron (1986)
• histones NOT only in octamers (2004)
• DNA-methylation in histones: H2A.Z in histone spool promotes gene expression (2008): NOT DNA-methylation shuts off genes (important for cancer drug development)

28

Jorja & Steven Henikoff
Shapers and Shakers


compile log-odds ratios

BLOSUMn = threshold at n% pairwise sequence identity
S Henikoff & Jorja Henikoff (1992) PNAS 89:10915-9
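How log-odds ratios are compiled from aligned blocks can be sketched as follows. The code counts residue pairs in each column of an ungapped block, derives background frequencies from those pairs, and scores each observed pair as 2·log2(observed/expected), i.e. in half-bit units. It is a minimal sketch: the clustering of sequences at n% identity that distinguishes BLOSUMn matrices is left out, and the toy block is made up.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(block):
    """Half-bit log-odds scores from one ungapped block (equal-length strings)."""
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency of each residue, derived from the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy block (hypothetical): four sequences, two aligned columns.
toy_scores = log_odds_scores(["AK", "AK", "AR", "SK"])
print(toy_scores)
```

With a block this small the numbers are not biologically meaningful; real matrices pool the counts from thousands of blocks.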

29

BLOSUM

Steven Henikoff


BLOcks of amino acid SUbstitution Matrices
Align only conserved regions
JG Henikoff and S Henikoff (1996) Meth Enzymology 266: 88-104
S Pietrokovski, JG Henikoff & S Henikoff (1996) NAR 24: 197-201

30

BLOSUM


idea taken from multiple alignments

31

BLOCKS


32

BLOCKS: length distribution

J Liu & B Rost (2003) Current Opinion in Chemical Biology 7, 5-11


Pfam

33


classify all proteins and RNA into families to better understand their function and evolution
1997 starts Pfam (Protein families)
2003 Rfam (RNA families)

Citation giant:
• 229 papers (Nov 2011)
• 1 with >8,800 citations (Nov 2011)
• 6 with >1,000 citations (11/2011)
• 32 with >100 citations (11/2011)
• Hirsch index: 48

34

Alex Bateman
Shapers and Shakers


EL Sonnhammer, SR Eddy, R Durbin (1997) Pfam: a comprehensive database of protein families based on seed alignments. Proteins 28:405-20
EL Sonnhammer, SR Eddy, E Birney, A Bateman, R Durbin (1998) NAR 26:320-2
A Bateman, E Birney, R Durbin, SR Eddy, RD Finn, EL Sonnhammer (1999) Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. NAR 27:260-2
SJ Sammut, RD Finn, A Bateman (2008) Pfam 10 years on: 10,000 families and still growing. Brief Bioinform 9:210-9

35

Pfam: Protein families


36

Pfam: how it's done

manual alignment


version/families/

37

Pfam - current stats


38

Pfam-7TM

A Bateman, et al. (2004) Nucleic Acids Res 32: D138-41
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


39

Clusters & Families

DB/Method | Version | Latest update | Entries | Update | URL (all begin with http://)

Short sequence motifs
PROSITE | 17.23 | 10/2002 | 1573 | manual | www.expasy.ch/prosite/
Blocks+ | | 8/2001 | 8656 | manual | blocks.fhcrc.org/blocks/
PRINTS | 35.0 | 7/2002 | 1750 | manual | www.bioinf.man.ac.uk/dbbrowser/PRINTS/

Structural domain-like regions
Pfam-A | 7.6 | 9/2002 | 4463 | manual | pfam.wustl.edu
TIGRFAM | 2.1 | 9/2002 | 1622 | manual | www.tigr.org/TIGRFAMs/
SMART | 3.4 | 10/2002 | 654 | manual | smart.embl-heidelberg.de
SBASE | 9.0 | 10/2002 | 483 | semi-manual | hydra.icgeb.trieste.it/~kristian/SBASE/
DOMO | 2.0 | 4/1998 | | automatic | www.infobiogen.fr/services/domo/
ProDom | 2001.3 | 12/2001 | | automatic | prodes.toulouse.inra.fr/prodom/doc/prodom.htm
GeneRAGE | | | | automatic | www.ebi.ac.uk/research/cgg/services/rage/
TribeMCL | | | | automatic | www.ebi.ac.uk/research/cgg/tribe/
CHOP | | 10/2002 | | automatic | cubic.bioc.columbia.edu/db/chop/

Integration
InterPro | 5.2 | 9/2002 | 5875 | N/A | www.ebi.ac.uk/interpro/
MetaFam | 4.1 | 9/2002 | | N/A | metafam.ahc.umn.edu

Clusters of proteins
CluSTr | | | | automatic | www.ebi.ac.uk/clustr/
SYSTERS | 3.0 | | | automatic | systers.molgen.mpg.de
PICASSO | 0 | 3/1998 | | automatic | systers.molgen.mpg.de
ProtoNet | 1.4 | 9/2002 | | automatic | www.protonet.cs.huji.ac.il/protonet/
ProClust | 1.0 | | | automatic | promoter.mi.uni-koeln.de/~proclust/

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11


40

Some overlap between databases

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11


41

… not everything that shines is copper

J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11


localization motifs

42


motif-based inference of localization

43


Rajesh Nair
now: FDA, Washington

44

Rajesh Nair


45

Similar proteins may differ in localization

R Nair & B Rost (2002) Protein Science 11: 2836-47


46

Shuttle into the nucleus

CYTOPLASM

NUCLEUS

NLS M9

Transportin Importin

Nucleus

Cytoplasm

M Cokol, R Nair & B Rost (2000) EMBO Rep 1: 411-415


47

Types of zip-codes

following: B Alberts, D Bray, J Lewis, M Raff, K Roberts, JD Watson: The Cell, Garland, 1994


ONE in PROSITE: bi-partite motif

Set [A] | N NLS [B] | Nprot nuc [C] | Nfam nuc [D] | Accuracy [E] | Coverage [F]
PROSITE | 1 | 96 | 31 | 90 % | 3 %
SWISS-PROT | 322 | 290 | | n.a. | 9 %
NLS-lit cleaned | 91 | 309 | 35 | 100 % | 10 %
NLS-lit consensus | 91 | 537 | 35 | 100 % | 17 %
PredictNLS_DB | 214 | 1354 | 186 | 100 % | 43 %

48

How many NLS motifs in databases?


49

Experimental NLS: positive charges

NLS | Protein | Reference
RKRKK | YstDNApolalpha | Hsieh et al., 1998
RKRRR | Amida | Irie et al., 2000
KKKKRKREK | LEF-1 | Prieve et al., 1998
KKKRRSREK | TCF-1 | Prieve et al., 1998
RQARRNRRRRWR | HIV-1 Rev | Truant et al., 1999
RRMKWKK | PDX-1 | Moede et al., 1999
PKKKRKV | SV40 LrgT | Kalderon et al., 1984
PRRRK | SRY | Sudbeck and Scherer, 1997
GKKRSKA | H2B | Moreland et al., 1987
KAKRQR | v-Rel | Gilmore and Temin, 1988
RGRRRRQR | Amida | Irie et al., 2000
PPVKRERTS | RanBP3 | Welch et al., 1999
PYLNKRKGKP | Pho4p | Welch et al., 1999
KRx{7,9}PQPKKKP | p53-NLS1 | Liang and Clarke, 1999
KVTKRKHDNEGSGSKRPK | Hum-Ku70 | Koike et al., 1999
RLKKLKCSKx{19}KTKR | GAL4 | Chan et al., 1998
RKRIREDRKx{18}RKRKR | TCPTP | Chan et al., 1998
RRERx{4}RPRKIPR | BDV-P | Schwemmle et al., 1999
KKKKKEEEGEGKKK | act/inh betaA | Blauer et al., 1999
PRPRKIPR | BDV-P | Shoya et al., 1998
PPRIYPQLPSAPT | BDV-P | Shoya et al., 1998
KDCVINKHHRNRCQYCRLQR | TR2 | Yu et al., 1998
APKRKSGVSKC | PolyomaVP1 | Chang et al., 1992
RKKRRQRRR | HIV-1 Tat | Truant et al., 1999
MPKTRRRPRRSQRKRPPT | Rex | Palmeri and Malim, 1999
KRPMNAFIVWSRDQRRK | SRY | Sudbeck and Scherer, 1997
KRPMNAFMVWAQAARRK | SOX9 | Sudbeck and Scherer, 1997
PPRKKRTVV | NS5A | Ide et al., 1996
YKRPCKRSFIRFI | DNAse EBV | Liu et al., 1998
LKDVRKRKLGPGH | DNAse EBV | Lyons et al., 1987
KRPRP | AdenovE1a | Bouvier and Baldacci, 1995
RRSMKRK | hVDR | Vihinen-Ranta et al., 1997
PAKRARRGYK | CPV capsid | Kaneko et al., 1997
RKCLQAGMNLEARKTKK | hGlu.cort. | Kaneko et al., 1997
RRERNKMAAAKCRNRRR | CFOS | Kaneko et al., 1997
KRMRNRIAASKCRKRKL | CJUN | Kaneko et al., 1997


50

Experimental NLS: more complicated

NLS | Protein | Reference
CYGSKNTGAKKRKIDDA | DNAhelicaseQ1 | Miyamoto et al., 1997
[AKR]TPIQKHWRPTVLTEGPPVKIRIETGEWE[KA] | ASVintegrase | Kukolj G. 1998
GGGx{3}KNRRx{6}RGGRN | Nab2 | Truant et al., 1998
KRxxxxxxxxxKTKK | THOV NP | Weber et al., 1998
EYLSRKGKLEL | VirD2-Nterm | Tinland et al., 1992
KRPACTLKPECVQQLLVCSQEAKK | HCDA | Somasekaram et al., 1999
RVHPYQR | QKI-5 | Wu et al., 1999
| HARNT | Eguchi et al., 1997
YNNQSSNFGPMKGGN | M9 | Bonifaci et al., 1997
SxGTKRSYxxM | InfluenzaNP | Wang et al., 1997
TKRSxxxM | InfluenzaNP | Wang et al., 1997
VNEAFETLKRC | MyoD | Vandromme et al., 1995
MNKIPIKDLLNPG | Mat-alpha | Hall et al., 1984
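The contrast between the two lists (classic NLS rich in positive charges versus "more complicated" signals) can be quantified with a one-line measure: the fraction of lysine and arginine residues. The sketch below applies it to three peptides taken from the tables; the function name is hypothetical and the measure is only an illustration, not a predictor.

```python
def basic_fraction(peptide: str) -> float:
    """Fraction of lysine (K) and arginine (R) residues in a peptide."""
    return sum(peptide.count(aa) for aa in "KR") / len(peptide)


# Peptides copied from the two tables above.
examples = {
    "SV40 LrgT": "PKKKRKV",
    "HIV-1 Tat": "RKKRRQRRR",
    "M9": "YNNQSSNFGPMKGGN",
}

for protein, nls in examples.items():
    print(f"{protein:10s} {nls:16s} {basic_fraction(nls):.2f}")
```

A signal such as M9 carries almost no positive charge, so a rule based on charge alone would miss it; this is one reason why motif collections have to be curated from experiments.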


51

In silico mutagenesis


52

Increasing accuracy and coverage

Set [A] | N NLS [B] | Nprot nuc [C] | Nfam nuc [D] | Accuracy [E] | Coverage [F]
PROSITE | 1 | 96 | 31 | 90 % | 3 %
SWISS-PROT | 322 | 290 | | n.a. | 9 %
NLS-lit cleaned | 91 | 309 | 35 | 100 % | 10 %
NLS-lit consensus | 91 | 537 | 35 | 100 % | 17 %
PredictNLS_DB | 214 | 1354 | 186 | 100 % | 43 %


54

Types of zip-codes


Sarah Gilman

55

Kaz Wrzeszczynski


56

ER

Sequence motif [1] | ER/Golgi N | ER/Golgi % | Non-ER/Golgi N | Non-ER/Golgi %

Endoplasmic reticulum (ER) motifs [2]
KDEL-C-term | 56 | 92 | 5 | 8
KDEL | 61 | 7 | 714 | 92
HDEL-C-term | 45 | 92 | 4 | 8
HDEL | 46 | 15 | 269 | 2
HDEF-C-term | 2 | 50 | 2 | 50
HDEF | 2 | 2 | 89 | 98

Golgi apparatus motifs [3]
YQRL | 3 | 1 | 270 | 99
YKGL | 5 | 1 | 442 | 99
YHPL | 4 | 5 | 76 | 95
YXXZ | 477 | 1 | 83112 | 99
NPFKD | 0 | 0 | 14 | 100
FXFXD | 31 | 1 | 3169 | 99
FQFND | 1 | 25 | 3 | 75
PXPXP | 65 | 1 | 8477 | 99
X | 479 | 1 | 80461 | 99
GRIP-motif [5] | 1 | 50 | 1 | 50
GRIP-motif (shortened) [6] | 1 | 3 | 28 | 97

C-term variations [4]
PROSITE Pattern [7] | 134 | 77 | 39 | 23
{KH}DEL | 86 | 78 | 5 | 4
{KHR}{DENQ}EL | 125 | 80 | 32 | 20
{KHR}{DENQ}L | 125 | 71 | 49 | 29
{KHRDENQAS}{DENQIYCV}{DENQ}L | 156 | 25 | 477 | 75
{KRDEAVYF}{KRDEVYFMQ}{KHED}{DK}EL | 39 | 89 | 5 | 11

KO Wrzeszczynski & B Rost (2004) CMLS 61: 1341-53
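The table shows that the same four letters are informative at the C-terminus and nearly uninformative elsewhere. A minimal sketch of that distinction follows; it assumes that `{KH}DEL` in the table means "K or H, followed by DEL" (written `[KH]DEL` as a regular expression), and the two test sequences are hypothetical.

```python
import re

# Assumption: the table's {KH}DEL is read as the regex [KH]DEL.
C_TERMINAL = re.compile(r"[KH]DEL$")   # motif must end the sequence
ANYWHERE = re.compile(r"[KH]DEL")      # motif may occur at any position


def percent(n_in: int, n_out: int) -> int:
    """Share of motif-carrying proteins that are in the compartment, in percent."""
    return round(100 * n_in / (n_in + n_out))


# Hypothetical sequences: one with the signal at the C-terminus, one internal.
retained = "MKTLLAVSGAKDEL"
internal = "MKDELTLLAVSGA"

print(bool(C_TERMINAL.search(retained)), bool(C_TERMINAL.search(internal)))
print(bool(ANYWHERE.search(internal)))

# Counts from the table rows KDEL-C-term and HDEL-C-term.
print(percent(56, 5), percent(45, 4))
```

Anchoring the pattern at the end of the sequence is what turns a four-residue string, which occurs by chance in hundreds of proteins, into a useful localization signal.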


Automate

Unify

Remote homologues

62

Open challenges - motifs and patterns

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Identify active site / functional element

Search for this structural pattern in a new protein

Transfer function annotation

S Jones & J Thornton (2004) Curr Opin Chem Biol 8:3-7

Manual identification of active site Automatic structural alignment?

63

Structural motifs

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Find

Search

Add biophysics of the site to the spatial search

64

Open challenges - structural motifs

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Example 3: Voltage-gated potassium channel

65


66

Example: Voltage-gated potassium channel

V Ruta et al. & R MacKinnon (2003) Nature, 422:180-5

• Eukaryotic voltage-gated potassium channel (VG-K+)
• Prokaryotic membrane proteins are easier to crystallize than eukaryotic ones
• find a prokaryotic VG-K+ having functional and structural features similar to the eukaryotic one

© Marco Punta


67

Voltage-gated K+ channel: sequence

1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY

GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD

TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY

ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS

IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA

CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI

LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG

SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN

FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG

LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ

QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV

The template: voltage gated potassium channel from Shaker

© Marco Punta


68

Why called shaker?

??

???


69

Why called shaker?

© Wikipedia

The shaker (Sh) gene, when mutated, causes a variety of atypical behaviors in the fruit fly .. Under ether anesthesia, the fly’s legs will shake … , it will exhibit aberrant movements. Sh-mutant flies have a shorter lifespan than regular flies; in their larvae, the repetitive firing of action potentials as well as prolonged exposure to neurotransmitters at neuromuscular junctions occurs.


70

Voltage-gated K+ channel: search

PSI-BLAST: http://www.ncbi.nih.gov/BLAST/
© Marco Punta


71

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

© Marco Punta

~ 30% PIDE over 80 aligned residues: enough?
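Percentage pairwise sequence identity (PIDE) is simply the number of identical aligned pairs divided by the number of aligned pairs. The sketch below computes it for two aligned strings of equal length, skipping gap columns; the function name is hypothetical. The strings are the first 60 aligned residues shown on the slide, so the value it reports applies to that excerpt only and may differ from the figure quoted for the full alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """PIDE over aligned, non-gap positions of two equal-length aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    return 100 * identical / len(pairs)


# First alignment block from the slide (Shaker 413-472 vs target 150-209).
shaker = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL"
target = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL"

print(f"{percent_identity(shaker, target):.1f}% identity over {len(shaker)} residues")
```

Whether a given PIDE is "enough" depends on the alignment length: the shorter the alignment, the higher the identity needed before similarity in structure can be inferred.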


72

Voltage-gated K+ channel: filter

© Marco Punta


73

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

© Marco Punta

~ 30% PIDE over 80 aligned residues: not quite enough to infer similarity in structure


74

Voltage-gated K+ channel: alignment

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

the alignment

Target (residues 1-295): the entire sequence of the identified protein

MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT

MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE

GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA

IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLL

IGTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK

© Marco Punta


75

Voltage-gated K+ channel: function?

Shaker channel

• Membrane protein?

© Marco Punta


76

Voltage-gated K+ channel:

[Figure: membrane with outside (Out) and inside (In); two classes of membrane proteins: α-bundle and β-barrel]

© Marco Punta


77

Voltage-gated K+ channel: TMH predicted

Side View single subunit

Top View tetramer

© Marco Punta


78

Voltage-gated K+ channel: TMH predicted

1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY

GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD

TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY

ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS

IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA

CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI

LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG

SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN

FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG

LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ

QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV

[Figure: predicted segments S1-S6 and pore region P marked on the sequence]

© Marco Punta


79

Voltage-gated K+ channel: TMHs predicted

MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT

MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE

GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA

IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLL

IGTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK

[Figure: predicted segments S1-S6 and pore region P marked on the sequence]

TMH predictions on the target sequence

© Marco Punta
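The slides do not name the method used to predict the transmembrane helices (TMHs). As a stand-in, the following sketch uses the classic sliding-window hydropathy approach with the Kyte-Doolittle scale: windows of 19 residues whose average hydropathy exceeds 1.6 are reported as candidate membrane segments. Window size, threshold, function names, and the synthetic test sequence are assumptions for illustration; modern predictors are considerably more accurate.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window, ordered by window centre."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """1-based (start, end) ranges of window centres scoring above the threshold."""
    half = window // 2
    segments, start = [], None
    for offset, score in enumerate(hydropathy_profile(seq, window)):
        centre = offset + half + 1            # 1-based position of the window centre
        if score > threshold and start is None:
            start = centre
        elif score <= threshold and start is not None:
            segments.append((start, centre - 1))
            start = None
    if start is not None:                     # segment still open at the last window
        segments.append((start, len(seq) - half))
    return segments


# Synthetic example (hypothetical): a hydrophobic stretch flanked by charged residues.
synthetic = "D" * 20 + "L" * 25 + "D" * 20
print(candidate_segments(synthetic))

# First 120 residues of the target sequence from the slide.
target_start = ("MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT"
                "MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE")
print(candidate_segments(target_start))
```

A charged helix such as S4 tends to score low on a pure hydropathy scale, which is one reason why dedicated TMH predictors use more than hydrophobicity.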


80

Voltage-gated K+ channel: function of template

Shaker channel

• Membrane protein

• K+ selectivity?

© Marco Punta


81

Voltage-gated K+ channel:

[Figure: channel in the membrane with outside (Out) and inside (In); positive (+) and negative (-) charges marked]

© Marco Punta


82

Voltage-gated K+ channel: conservation of outer pore

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

Regions marked on the alignment: P, S6

the selectivity filter

[Figure: membrane topology with segments S1-S6 and pore region P; selectivity-filter residues (T, G) marked]

© Marco Punta
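Conservation of the selectivity filter can be checked directly on the aligned segments. The sketch below looks for the signature TVGYG, which is widely described as the K+ channel selectivity-filter sequence, in both segments and converts the hit into residue numbers using the start positions given in the alignment. Treating that exact five-residue string as the motif is an assumption made for illustration; the function name is hypothetical.

```python
# Aligned segments from the slide, with the residue number of their first position.
SHAKER_SEGMENT = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL"
TARGET_SEGMENT = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL"
SHAKER_START, TARGET_START = 413, 150

# Assumption: the selectivity-filter signature is searched as the literal string TVGYG.
SIGNATURE = "TVGYG"


def locate(segment, first_residue, motif=SIGNATURE):
    """Residue number at which the motif starts, or None if it is absent."""
    index = segment.find(motif)
    return None if index < 0 else first_residue + index


print("Shaker:", locate(SHAKER_SEGMENT, SHAKER_START))
print("Target:", locate(TARGET_SEGMENT, TARGET_START))
```

Finding the signature at equivalent alignment positions in both proteins supports K+ selectivity for the target more strongly than the overall sequence identity does.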


83

Voltage-gated K+ channel: functional characterization of target

Shaker channel

• Membrane protein

• K+ selectivity

© Marco Punta


84

Voltage-gated K+ channel: functional characterization of target

Shaker channel

• Membrane protein

• K+ selectivity

• Voltage gating

© Marco Punta


85

Voltage-gated K+ channel:

Out

In

Out

© Marco Punta

closed


86

Voltage-gated K+ channel:

Out

In

+

-

Out

© Marco Punta

open


87

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: membrane topology with segments S1-S6 and pore region P]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

Regions marked on the alignment: P, S6

the gating hinge

© Marco Punta


88

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: membrane topology with segments S1-S6 and pore region P; positive charges (+) marked on S4]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

Regions marked on the alignment: P, S6

voltage sensor: S4

© Marco Punta
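The positive charges of the voltage sensor follow a regular spacing: in S4 a basic residue (R or K) tends to occur at every third position. The sketch below searches both sequences for at least four consecutive repeats of "basic residue followed by two other residues". The regular expression and the minimum of four repeats are assumptions chosen for illustration; the two strings are sequence lines copied from the slides.

```python
import re

# Assumption: an S4-like segment has a basic residue at every third position,
# repeated at least four times in a row.
S4_LIKE = re.compile(r"(?:[RK]..){4,}")

# Sequence lines from the slides that contain the S4 region.
SHAKER_LINE = "LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG"
TARGET_LINE = "GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA"

shaker_hit = S4_LIKE.search(SHAKER_LINE)
target_hit = S4_LIKE.search(TARGET_LINE)

for name, hit in (("Shaker", shaker_hit), ("Target", target_hit)):
    print(name, hit.group(), "repeats:", len(hit.group()) // 3)
```

Both sequences carry the periodic charges, which supports voltage gating in the target even though S4 lies outside the region aligned by the database search.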


89

Voltage-gated K+ channel: Conservation of functional residues in target

[Figure: membrane topology with segments S1-S6 and pore region P]

Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218

Regions marked on the alignment: P, S6

other voltage sensing residues (S4)

© Marco Punta


90

Voltage-gated K+ channel: Function of target

Shaker channel

• Membrane protein

• K+ selectivity

• Voltage gating

© Marco Punta


91

Roderick MacKinnon’s Nobel Prize

© Wikipedia

Roderick MacKinnon (Rockefeller Univ New York)

Nobel Prize Chemistry 2003: “for structural and mechanistic studies of ion channels”

© Nobel Prize Foundation

[Figure labels: potassium, sodium]

DA Doyle, J Morais Cabral, RA Pfuetzner, A Kuo, JM Gulbis, SL Cohen, BT Chait and R MacKinnon. The structure of the potassium channel: Molecular basis of K+ conduction and selectivity. Science 280 (1998) 69-77.

JH Morais-Cabral, Y Zhou and R MacKinnon. Energetic optimization of ion conduction rate by the K+ selectivity filter. Nature 414 (2001) 37-47.

Y Jiang, A Lee, J Chen, M Cadene, BT Chait and R MacKinnon (2002). Crystal structure and mechanism of a calcium-gated potassium channel. Nature 417, 515-522.


I.2c Function Intro: Function by association

92


93

Co-expression

Expression data Machine Learning / Clustering Functional classes

For example: P Brown et al. (2000) PNAS 97:262-267
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


94

Interactions / networks

For example: AH Tong et al. (2002) Science 295: 321-324
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)

A Bairoch (2000) Nucleic Acids Res 28:304-305

Differentiate functional and physical interaction

Improve accuracy and coverage (data, algorithm)

Ab initio/de novo prediction

95

Open challenges - function by association

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Sub-cellular localization (nucleus, membrane, etc.)

Post-translational modifications

Functionally important residues

Interaction sites

96

Predict aspects of function

© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)


Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology
• Homology-based inference of function can be improved by motifs
  problem: definition of motifs still not fully automated

NEXT
• Computing chemistry - enzyme function
• Prediction of subcellular localization

97

Conclusions today
