The PSI KB Protein Model Portal Torsten Schwede NIGMS PSI „Bottlenecks“ Workshop Bethesda, April 14, 2008 Swiss Institute of Bioinformatics

Page 1

The PSI KB Protein Model Portal

Torsten Schwede

NIGMS PSI „Bottlenecks“ Workshop

Bethesda, April 14, 2008

Swiss Institute of Bioinformatics

Page 2

Overview: The KB Modeling Portal

Introduction

The KB Protein Model Portal Mission and Goals

Version 1.0: Content, Features & Technical Implementation

Outlook: Next steps New Features & Functions

Modeling Portal Community Workshop

Questions & Discussion

Page 3
Page 4

gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaaatctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgcgagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatactgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagcagcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatcagaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaagtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaaccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagcatgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcagggcccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagagagcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgatacagtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcg
tagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatctgttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaatttcaaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatcagtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcagcatgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaacatcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaattatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtcaatggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatacaaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaattaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaagcacaaccagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagataaagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagattgtacacatttagaaggaaaa
attatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcaccagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcagtattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaaggaccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttagtaaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccactagggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggtcaaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgccatattaggacatatagttagccctaggtgtgaatatcaagcaggacataacaaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaagggaaccatacaatgaatggacactagaacttttagaggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaacatagcagaatagacattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcaggaagaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaatataggaaaataagaagacaaaacaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaatttgtgggtcacagtttattatggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcc
tgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaaattttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtttatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccactagtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagataaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagtcattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactcaactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaattaattgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggcagagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaacaatagtctttaatcaatcctcaggaggggacccagaaattgtaatgcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattataaacatgtggcaggaagtaggaaaagcaatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatataaagtagtaaaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtctggtatagtgcaacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcatttgcaccactgctgtgccttggaatactagttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattggaattagataactgggcaagtttgtggaattggtttagcataacaaattggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagacgcgcctcccagccaggaggggacccgacaggcccgaaggaat
cgaagaagaaggtggagagagagacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttgattgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggattcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggt

Page 5

[Figure: Public Database Content. Number of entries in TrEMBL, SwissProt, and PDB per year, 1986 to 2008, on a logarithmic axis from 100 to 10'000'000. Sources: PDB, EBI, SIB]

Page 6

How can we find out if there is a model available for a given protein sequence?

Well ......

Overview: The KB Modeling Portal

Page 7

The goal of the KB Protein Model Portal is to give access to all the models that can be leveraged from PSI targets and other experimental protein structures.

The Protein Model Portal aims to provide a single interface to query simultaneously the existing pre-computed models at various sites, and to give access to interactive services for template selection, target-template alignment, model building, and quality assessment.

The KB Protein Model Portal

Page 8

The KB Protein Model Portal does NOT:

build models or develop modeling methods, but provides an interface to the participating expert groups and services;

store all models in a database, but provides a query interface to the models provided by the partner sites;

judge the quality of the models provided, but provides an interface to services for structure comparison and evaluation.

The KB Protein Model Portal

Page 9

The KB Protein Model Portal

How can we organize protein model data from different sources in a single portal?

The data view of the PDB is “experiment-centric”, i.e. unique PDB IDs are assigned to structures that result from a specific experiment.

The data view of models is “sequence-centric”, i.e. one or more models are built for a segment of a specific protein sequence (“target”) based on one or more experimental structures (“templates”).

Typical example:

Target Sequence

Models

Page 10

Bottlenecks:

There is no common registry scheme for protein models.

For each target protein sequence, an ensemble of models will be available based on different templates, alternative alignments, and modeling methods.

Models will generally cover only fractions of the target sequence; parts of the target sequence might be missing, e.g. non-modeled loops.

Protein models must be updated frequently to reflect updates of target sequence databases (UniProt), template structure databases (PDB), and algorithmic improvements.
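
The partial-coverage bottleneck can be made concrete with a small sketch. It assumes each model is described by a 1-based, inclusive residue range on the target sequence together with the template it was built from. The `Model` record, its field names, and all example values are hypothetical and are not taken from the portal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """One model of a target segment (hypothetical record layout)."""
    provider: str   # site that built the model
    template: str   # experimental structure used as template
    start: int      # first modeled residue, 1-based inclusive
    end: int        # last modeled residue, 1-based inclusive


def covered_residues(models):
    """Return the set of target positions covered by at least one model."""
    covered = set()
    for m in models:
        covered.update(range(m.start, m.end + 1))
    return covered


def uncovered_segments(target_length, models):
    """Return (start, end) ranges of the target that no model covers."""
    covered = covered_residues(models)
    gaps, gap_start = [], None
    for pos in range(1, target_length + 1):
        if pos not in covered:
            if gap_start is None:
                gap_start = pos          # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, pos - 1))  # the gap closed at pos - 1
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, target_length))
    return gaps


# Illustrative ensemble for a 391-residue target (all values made up).
models = [
    Model("site_a", "template_1", 2, 330),
    Model("site_b", "template_2", 10, 120),
    Model("site_b", "template_1", 150, 335),
]
coverage = len(covered_residues(models)) / 391
gaps = uncovered_segments(391, models)
```

Here the ensemble as a whole leaves the first residue and the C-terminal tail unmodeled, even though individual models overlap heavily elsewhere.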

The KB Protein Model Portal

Page 11

Bottlenecks:

Sequence database accession codes are neither unique nor stable.

Some groups do not store pre-computed models, but calculate models “on the fly”.

Some models will be based on outdated target sequences.

Target protein sequences will be uniquely identified by hash values (UTSI) of their full-length sequences as reference space.

The KB Protein Model Portal

Page 12

Data mapping of target sequences and structure models

Unique (full-length) target protein sequences are used as reference space to group models for identical targets. Target protein sequences will be uniquely identified by hash function values (UTSI).

>P68399|CSK21_BOVIN Casein kinase II subunit alpha - Bos taurus
MSGPVPSRARVYTDVNTHRPREYWDYESHVVEWGNQDDYQLVRKLGRGKYSEVFEAINITNNEKVVVKILKPVKKKKIKREIKILENLRGGPNIITLADIVKDPVSRTPALVFEHVNNTDFKQLYQTLTDYDIRFYMYEILKALDYCHSMGIMHRDVKPHNVMIDHEHRKLRLIDWGLAEFYHPGQEYNVRVASRYFKGPELLVDYQMYDYSLDMWSLGCMLASMIFRKEPFFHGHDNYDQLVRIAKVLGTEDLYDYIDKYNIELDPRFNDILGRHSRKRWERFVHSENQHLVSPEALDFLDKLLRYDHQSRLTAREAMEHPYFYTVVKDQARMGSSSMPGGSTPVSSANMMSGISSVPTPSPLGPLAGSPVIAAANPLGMPVPAAAGAQQ
>P68400|CSK21_HUMAN Casein kinase II subunit alpha - Homo sapiens
MSGPVPSRARVYTDVNTHRPREYWDYESHVVEWGNQDDYQLVRKLGRGKYSEVFEAINITNNEKVVVKILKPVKKKKIKREIKILENLRGGPNIITLADIVKDPVSRTPALVFEHVNNTDFKQLYQTLTDYDIRFYMYEILKALDYCHSMGIMHRDVKPHNVMIDHEHRKLRLIDWGLAEFYHPGQEYNVRVASRYFKGPELLVDYQMYDYSLDMWSLGCMLASMIFRKEPFFHGHDNYDQLVRIAKVLGTEDLYDYIDKYNIELDPRFNDILGRHSRKRWERFVHSENQHLVSPEALDFLDKLLRYDHQSRLTAREAMEHPYFYTVVKDQARMGSSSMPGGSTPVSSANMMSGISSVPTPSPLGPLAGSPVIAAANPLGMPVPAAAGAQQ
>Q5U065|Q5U065_HUMAN Casein kinase 2, alpha 1 polypeptide - Homo sapiens
MSGPVPSRARVYTDVNTHRPREYWDYESHVVEWGNQDDYQLVRKLGRGKYSEVFEAINITNNEKVVVKILKPVKKKKIKREIKILENLRGGPNIITLADIVKDPVSRTPALVFEHVNNTDFKQLYQTLTDYDIRFYMYEILKALDYCHSMGIMHRDVKPHNVMIDHEHRKLRLIDWGLAEFYHPGQEYNVRVASRYFKGPELLVDYQMYDYSLDMWSLGCMLASMIFRKEPFFHGHDNYDQLVRIAKVLGTEDLYDYIDKYNIELDPRFNDILGRHSRKRWERFVHSENQHLVSPEALDFLDKLLRYDHQSRLTAREAMEHPYFYTVVKDQARMGSSSMPGGSTPVSSANMMSGISSVPTPSPLGPLAGSPVIAAANPLGMPVPAAAGAQQ
>Q9D9I4|TBC20_MOUSE TBC1 domain family member 20 - Mus musculus
MALRPSKGDGSAGRWDRGAGKADFNAKRKKKVAEIHQALNSDPIDLAALRRMAISEGGLLTDEIRCQVWPKLLNVNTSEPPPVSRKDLRDMSKDYQQVLLDVRRSLRRFPPGMPDEQREGLQEELIDIILLVLDRNPQLHYYQGYHDIVVTFLLVVGERLATSLVEKLSTHHLRDFMDPTMDNTKHILNYLMPIIDQVSPELHDFMQSAEVGTIFALSWLITWFGHVLMDFRHVVRLYDFFLACHPLMPIYFAAVIVLYREQEVLDCDCDMASVHHLLSQIPQDLPYETLISRAGDLFVQFPPSELAREAAAQQEAERTAASTFKDFELASTQQRPDMVLRQRFRGLLRPEARTKDVLTKPRTNRFVKLAVMGLTVALGAAALAVVKSALEWAPKFQLQLFP
>ipi|IPI00707334|IPI00707334.1 CASEIN KINASE II SUBUNIT ALPHA.
MSGPVPSRARVYTDVNTHRPREYWDYESHVVEWGNQDDYQLVRKLGRGKYSEVFEAINITNNEKVVVKILKPVKKKKIKREIKILENLRGGPNIITLADIVKDPVSRTPALVFEHVNNTDFKQLYQTLTDYDIRFYMYEILKALDYCHSMGIMHRDVKPHNVMIDHEHRKLRLIDWGLAEFYHPGQEYNVRVASRYFKGPELLVDYQMYDYSLDMWSLGCMLASMIFRKEPFFHGHDNYDQLVRIAKVLGTEDLYDYIDKYNIELDPRFNDILGRHSRKRWERFVHSENQHLVSPEALDFLDKLLRYDHQSRLTAREAMEHPYFYTVVKDQARMGSSSMPGGSTPVSSANMMSGISSVPTPSPLGPLAGSPVIAAANPLGMPVPAAAGAQQ

CRC64: D3B6F5D13FF7422D, MD5: b6f2c321d42d50b985186307434b5166, UPI: UPI0000000CB5

CRC64: A3B6F5D13DF7422E, MD5: 605f4802e88ec1443d36520ac05df3b9, UPI: UPI0000044948
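
As a minimal sketch of how identical target sequences under different accession codes could be grouped by a sequence hash: the slides do not specify which hash function UTSI uses, so MD5 from Python's standard `hashlib` stands in here only because MD5 checksums are listed on the slide. The normalisation step (whitespace removed, upper case) and the function names `sequence_digest`, `parse_fasta`, and `group_by_sequence` are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sequence_digest(sequence):
    """MD5 hex digest of a sequence after removing whitespace and upper-casing."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_sequence(text):
    """Map each sequence digest to the list of entry headers sharing it."""
    groups = defaultdict(list)
    for header, sequence in parse_fasta(text):
        groups[sequence_digest(sequence)].append(header)
    return dict(groups)


# Shortened, made-up entries; real entries would carry full-length sequences.
example = """>entry_A species 1
MSGPVPSRAR
VYTDVNTHRP
>entry_B species 2
msgpvpsrarvytdvnthrp
>entry_C
MALRPSKGDG
"""
groups = group_by_sequence(example)
```

Entries that share a digest refer to the same full-length target sequence, so models registered under any of their accession codes can be listed together, independent of how those codes change over time.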

Database | Entry | Version | Organism | First Seen | Last Seen | Active
UniProtKB/Swiss-Prot | P68399 | 1 | Bos taurus (Bovine) | 2004-11-23 | 2007-07-24 | Yes
UniProtKB/Swiss-Prot | P68400 | 1 | Homo sapiens (Human) | 2004-11-23 | 2007-07-24 | Yes
UniProtKB/Swiss-Prot | P19138 | 1 | | 1990-11-01 | 2004-11-09 | No
UniProtKB/TrEMBL | Q5U065 | 1 | Homo sapiens (Human) | 2005-05-10 | 2007-07-24 | Yes
TrEMBLnew | AAH53532 | | | 2003-06-14 | 2003-08-30 | No
TrEMBLnew | AAH11668 | | | 2003-03-29 | 2003-06-14 | No
International Protein Index (IPI) | IPI00016613 | 1 | | 2003-03-14 | 2004-11-15 | No
International Protein Index (IPI) | IPI00744507 | 1 | Homo sapiens (Human) | 2006-05-16 | 2007-06-29 | Yes
International Protein Index (IPI) | IPI00707334 | 1 | Bos taurus (Bovine) | 2006-01-24 | 2007-06-29 | Yes
RefSeq | NP_808227 | 1 | | 2004-07-08 | 2007-05-08 | Yes
RefSeq | NP_001886 | 1 | | 2004-09-24 | 2007-05-08 | No
RefSeq | NP_777060 | 1 | | 2004-09-14 | 2006-02-19 | No
RefSeq | XP_850579 | 1 | | 2005-08-31 | 2007-05-08 | Yes
RefSeq | XP_001112324 | 1 | | 2006-06-15 | 2007-05-08 | Yes
RefSeq | XP_001112363 | 1 | | 2006-06-15 | 2007-05-08 | Yes
Ensembl | ENSCAFP00000010339 | | Canis familiaris (Dog) | 2004-12-09 | 2007-06-04 | Yes
Ensembl | ENSP00000217244 | | Homo sapiens (Human) | 2003-04-01 | 2007-05-31 | Yes
Ensembl | ENSP00000339247 | | Homo sapiens (Human) | 2004-05-12 | 2007-05-31 | Yes
Ensembl | ENSP00000341595 | | | 2006-04-03 | 2006-09-27 | No
Ensembl | ENSMMUP00000037659 | | Macaca mulatta (Rhesus macaque) | 2006-08-01 | 2007-06-04 | Yes
EMBL Annotated CONs | EAX10665 | 1 | Homo sapiens (Human) | 2007-01-15 | 2007-06-12 | Yes
EMBL CDS | AAA35503 | 1 | Homo sapiens (Human) | 2003-03-12 | 2007-06-20 | Yes
PIR-PSD Archive | A30319 | | | 2003-03-31 | 2003-04-04 | Yes
European Patent Office (EPO) | CS458506 | 1 | | 2007-03-20 | 2007-06-09 | Yes
US Patent Office (USPO) | AAE81305 | 1 | | 2003-03-26 | 2007-06-01 | Yes
Japan Patent Office (JPO) | BD879267 | | | 2007-01-11 | 2007-06-01 | Yes
TROME | NT_011387_19_8 | | | 2004-08-23 | 2004-08-28 | No
H-Invitational Database (H-InvDB) | HIT000053902 | 3 | Homo sapiens (Human) | 2006-07-02 | 2006-07-02 | No

Real time annotation

The KB Protein Model Portal

Page 13

[Diagram: Module 3, Protein Model Portal (PMP) architecture. Labels recovered from the figure:]

Protein Model Portal GUI (www)

PSI SGKB Portal (SOAP, WSDL)

Protein Model Portal (PMP) backend: metadata unification, access code mapping, generation of data warehouse and search indices

Backend queries (SQL, TCP/IP socket)

LRMS (SGE)

Model meta database (MySQL)

Model sequence similarity queries (BLAST) on an SGE Linux cluster

Model target sequence match server (C/C++)

Link to coordinates at model provider (REST)

The KB Protein Model Portal

Page 14
Page 15
Page 16

http://www.proteinmodelportal.org

Page 17

The KB Protein Model Portal

Page 18

http://www.proteinmodelportal.org

UniProt (REST query, uniprot.org)

InterPro (Pfam) (DAS query)

Model Preview Image (generated on the fly with coordinates from provider; REST)

Model Meta Information (PMP model meta information DB, SOAP)

Target Template Alignment (structural alignment with template; generated on the fly; REST)

Model Download (link to original structure provider)

The KB Protein Model Portal

Page 19

Overview: The KB Model Portal

Introduction

Protein Model Portal Mission and Goals

PMP Version 1.0: Content and Features

Technical Implementation

Outlook: Next steps New Features & Functions

Modeling Portal Community Workshop

Questions & Discussion

Page 20

Outlook: Next steps

New Features & Functions

Better visualization of query results

Page 21

Outlook: Next steps

New Features & Functions

Better visualization of query results

Interactive structure / model comparison

Visualization of mapped properties (sequence conservation; quality assessment results; UniProt annotation)

Residue-level annotations (UniProt, InterPro)

Model quality assessment tools

Page 22

Accuracy and application of protein structure models

Baker D, Sali A. Protein structure prediction and structural genomics. (2001) Science. 294:93-96.

Page 23

Community Workshop

Workshop on Applications of Protein Models in Biomedical Research

University of California, San Francisco, July 11 & 12, 2008

How are protein structure models used in biomedical research projects today? Which requirements and limitations exist for the different applications?

Structure-based drug design

Analysis of SNPs and disease-related mutations

Phasing X-ray crystallography data by molecular replacement

Interpretation of low-resolution experimental data

Protein engineering and design

Functional characterization of novel proteins

Page 24

Community Workshop

Workshop on Applications of Protein Models in Biomedical Research

University of California, San Francisco, July 11 & 12, 2008

We need your input! Please ...

participate in the workshop;

send us examples of successful use of models in your work, and negative examples when models did not do what you expected;

let us know what you expect from protein models, and which aspects of modeling techniques need improvement to make models more useful to your research.

[email protected]

Page 25

Acknowledgements

Biozentrum & SIB, University of Basel

Michael Podvinec

Jürgen Kopp

Lorenza Bordoli

Rainer Pöhlmann

Konstantin Arnold

James Battey

Pascal Benkert

Florian Kiefer

SIB Geneva

Eric Jain

Funding:

NIH – National Institutes of Health

SIB – Swiss Institute of Bioinformatics

RCSB-PDB

Helen Berman

John Westbrook

Wendy Tao

FCCC/NMHRCM

Roland Dunbrack Jr.

UCSF/NYSGXRC

Andrej Sali

Ursula Pieper

MCSG

Christine Orengo

David Lee

JCMM

Adam Godzik

NESG

Diana Murray