http://www.pdbj.org/
http://www.protein.osaka-u.ac.jp/rcsfp/pi/
Haruki Nakamura
Institute for Protein Research,
Osaka University
Tutorials for PDBj Search Tools
EMBO workshop, 26 Sept. 2008
Protein Data Bank Japan
http://www.pdbj.org/
At the Institute for Protein Research, Osaka Univ., since 2001; supported by the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST).
Structure data curation and editing
Structure Data browsing and downloading
PDBj members at IPR, Osaka Univ.
Processed data numbers at PDBj
[Figure: yearly registration number (0 to 8000) by year, 1972-2007, comparing the yearly wwPDB processed number with the yearly PDBj processed number]
We process 25-30% of the data deposited worldwide, mainly from the Asian and Oceania regions.
Total: 52,535 on August 19, 2008
Get Entry Data
Access to http://www.pdbj.org/
A PDBID (e.g. 1gof) should be entered in the box, followed by GO.
The summary for each PDBID is displayed.
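Before an ID is typed into the box, it can help to check its shape. The sketch below assumes the classic four-character PDB ID format (one digit followed by three alphanumeric characters); the function name normalize_pdb_id is hypothetical and is not part of any PDBj tool.

```python
import re

# Classic PDB IDs are four characters: one digit followed by three
# alphanumeric characters (for example, 1gof).
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text):
    """Return the ID in lower case, or raise ValueError if it is malformed."""
    candidate = text.strip()
    if not _PDB_ID.fullmatch(candidate):
        raise ValueError(f"not a PDB ID: {text!r}")
    return candidate.lower()


print(normalize_pdb_id(" 1GOF "))
```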
Graphic viewer: jV version 3.6
Access to http://www.pdbj.org/jV/
Information for each Entry: Structural Details
Name of the molecule(s)
Molecular weight(s)
Keywords
etc.
Details of the structure
Information for each Entry: Experimental Details
Experimental method
(X-ray, NMR, EM, Neutron)
Parameters for the crystal
Crystallization conditions
etc.
Details of the experiment
Information for each Entry: Functional Details
Gene ontology information
Ligand binding
Functional site
etc.
Details of the function
Information for each Entry: Sequence Neighbor
Result of BLAST search
Sequence Navigator is used.
PDBID list of homologs
Information for each Entry: Download/Display
Conventional PDB format
Conventional PDB header
mmCIF
PDBML
PDBML without coordinates
PDBML for only coordinates
Structure factor
Download or display of the archival data
Information for each Entry: Link
RCSB-PDB, MSD-EBI
CATH, SCOP, FSSP: folds
UniProt: Sequences
KEGG: Pathways
EzCatDB: enzymes
etc.
Link to other databases
Advanced Search
From "Advanced Search" on the top page
Author names & Journal
Experimental method
Ligand name
Residues
Resolution
Species
etc.
Search by many conditions
XQuery/XPath Search
From the “xPSSS (xml-based Protein Structure Search Service)” page
XML based search
XQuad: help by XQuery advisor
Search by XQuery/XPath
Search for all entries with helix of length equal to 10 residues:
XQuad: XQuery Advisor
List of hit PDBIDs is displayed.
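The helix-length query can be tried locally on a single file. The sketch below runs an XPath expression over a small hand-made document with Python's standard library. The element names follow the PDBx/mmCIF items struct_conf and pdbx_PDB_helix_length; the namespace URI, the sample data, and the function name helices_of_length are assumptions, and the exact query text accepted by xPSSS may differ.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; use the one declared in the files you query.
NS = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v32.xsd"}

# Hand-made sample in the style of PDBML (not a real entry).
SAMPLE = """<PDBx:datablock xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v32.xsd" datablockName="DEMO">
  <PDBx:struct_confCategory>
    <PDBx:struct_conf id="HELX_P1">
      <PDBx:pdbx_PDB_helix_length>10</PDBx:pdbx_PDB_helix_length>
    </PDBx:struct_conf>
    <PDBx:struct_conf id="HELX_P2">
      <PDBx:pdbx_PDB_helix_length>7</PDBx:pdbx_PDB_helix_length>
    </PDBx:struct_conf>
    <PDBx:struct_conf id="HELX_P3">
      <PDBx:pdbx_PDB_helix_length>10</PDBx:pdbx_PDB_helix_length>
    </PDBx:struct_conf>
  </PDBx:struct_confCategory>
</PDBx:datablock>"""


def helices_of_length(xml_text, length):
    """Return the ids of struct_conf records whose helix length equals `length`."""
    root = ET.fromstring(xml_text)
    # XPath predicate: keep struct_conf elements with a matching child value.
    path = f".//PDBx:struct_conf[PDBx:pdbx_PDB_helix_length='{length}']"
    return [helix.get("id") for helix in root.findall(path, NS)]


print(helices_of_length(SAMPLE, 10))
```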
Search of Similar Sequences: Sequence Navigator
A PDBID or amino-acid sequence should be entered, followed by "Find All Homologs".
Structural alignment: GASH
A PDBID or PDB-format file name should be entered, followed by "Superimpose".
The optimal structural alignment is displayed.
Search of Similar Structures: Structure Navigator
A PDBID should be entered, followed by "Start Structure Navigator".
List of hit PDBIDs is displayed.
Additional Information in our XML database: PDBMLplus (as of August 19, 2008)
Total number in PDBMLplus: 52,535
GO Information (Biological Process, Molecular Function, Cellular Component): 20,186
Extracted from literature by annotators: 20,040
Information on binding site residues from HETATM: 29,006
Function Information from UniProt (ACT_SITE, BINDING, DNA_BIND, NP_BIND, ZN_FING, TRANSMEM): 28,243
Function Information from CATRES/extCATRES (EBI): 3,174
Function Information from CSA (EBI): 18,668
Primary Citation Information: 48,968
Addition of Data in PDBMLplus
<exptl>
  <method>SYNCHROTRON RADIATION</method>
  <crystal id="1">
    <grow auth_validate="N" update_id="6">
      <method auth_validate="N" update_id="6">Microdialysis</method>
      <temp auth_validate="N" unit="&#x2103;" update_id="6">4</temp>
      <pH auth_validate="N" update_id="6">4</pH>
    </grow>
    <grow_comp id="1" auth_validate="N" update_id="6">
      <sol_id auth_validate="N" update_id="6">1</sol_id>
      <name auth_validate="N" type="common name" update_id="6">protein</name>
      <conc auth_validate="N" unit="mg/ml" update_id="6">13</conc>
    </grow_comp>
    <grow_comp id="2" auth_validate="N" update_id="6">
      <sol_id auth_validate="N" update_id="6">2</sol_id>
      <name auth_validate="N" type="common name" update_id="6">ammonium sulphate</name>
      <conc auth_validate="N" unit="%sat" update_id="6">70</conc>
    </grow_comp>
    :
    :
  </crystal>
</exptl>
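The added crystallization data can be read back with any XML parser. The following is a minimal sketch using Python's standard library on a trimmed copy of the fragment above (the auth_validate and update_id attributes are dropped for brevity). The function name crystallization_summary is hypothetical, and real PDBMLplus files may declare a namespace that this fragment does not show.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's fragment; elided parts are left out.
FRAGMENT = """<exptl>
  <method>SYNCHROTRON RADIATION</method>
  <crystal id="1">
    <grow>
      <method>Microdialysis</method>
      <temp unit="&#x2103;">4</temp>
      <pH>4</pH>
    </grow>
    <grow_comp id="1">
      <name type="common name">protein</name>
      <conc unit="mg/ml">13</conc>
    </grow_comp>
    <grow_comp id="2">
      <name type="common name">ammonium sulphate</name>
      <conc unit="%sat">70</conc>
    </grow_comp>
  </crystal>
</exptl>"""


def crystallization_summary(xml_text):
    """Collect growth method, temperature, pH and solution components."""
    root = ET.fromstring(xml_text)
    grow = root.find("./crystal/grow")
    components = [
        (comp.findtext("name"),
         float(comp.findtext("conc")),
         comp.find("conc").get("unit"))
        for comp in root.findall("./crystal/grow_comp")
    ]
    return {
        "method": grow.findtext("method"),
        "temperature": (float(grow.findtext("temp")), grow.find("temp").get("unit")),
        "pH": float(grow.findtext("pH")),
        "components": components,
    }


summary = crystallization_summary(FRAGMENT)
print(summary["method"], summary["pH"], summary["components"])
```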
Advanced usage of jV version 3 with xPSSS
Example for 12as with the functional site information
Command:
show xps3
xPSSS (xml-based Protein Structure Search Service)
[Diagram: xPSSS architecture. Labels shown:
  Archive (RCSB-PDB/MSD-EBI/PDBj); downloader; PDBML
  Function/Source Information: DDBJ, SwissProt/UniProt, PIR/GenBank/KEGG/GDB/ProTherm/EzCatDB, EBI/CSA/CATRES
  Get/Input Tools; manual input from literature; Primary Citation DB with PDF files
  CATRES Data; Annotation Data; Add Information; Filtering & Reconstructing
  PDBMLplus; PDBMLplusF; Loader; Native XML-DB
  Web server; XSLT processor; xPSSS; Internet; Browser
  FTP server; download (FTP)]
Primary Citation Database
For internal use in PDBj only
Web input tool
18,814 PDF files have been collected.
Development of other Databases and Services
Protein Molecular Surface Database, eF-site (Kinoshita & Nakamura)
Protein Dynamics Database, ProMode (Wako & Endo)
BioMagResBank: NMR experimental data (Akutsu, Harano & Nakatani)
Search for Similar Surface, eF-seek (Kinoshita & Nakamura)
Electron Microscopy Navigator, EM-Navi (Suzuki)
Encyclopedia of Protein Structures, eProtS (Kinjyo, Kudo & Ito)
Protein Globe
(Kinjo, A. R. & Standley, D. M.)
Standley, D. M. et al., Brief. Bioinfo. (2008) 9, 276-285.
Protein Globe
By Akira Kinjo & Daron Standley
http://www.pdbj.org/Globe/
All-α
All-β
α/β
eF-site/eF-surf/eF-seek
(Kinoshita, K. & Nakamura, H.)
Kinoshita, K. et al., Nucl. Acids Res. (2007) 35, W398-W402.
Kinoshita, K. & Nakamura, H., Protein Sci. (2005) 14, 711-718.
Kinoshita, K. & Nakamura, H., Bioinformatics (2004) 20, 1329-1330.
eF-site database: http://ef-site.hgc.jp
Almost all PDB entries are calculated.
Individual subunits are calculated.
Each model of an NMR structure is calculated.
Molecular surface and electrostatic potential
Connolly surface (molecular surface)
Dielectric constants: 80.0 and 2.0
Charges: AMBER partial charges
Grid size: 1.0 Å
Ionic strength: 0.1 M
[Figure labels: probe sphere, solvent accessible surface, protein core, re-entrant surface]
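To give a feel for what these parameters imply, the sketch below computes the Debye screening length that enters the linearized Poisson-Boltzmann equation, using the ionic strength (0.1 M) and the dielectric constant 80.0 listed above. Taking 80.0 as the solvent value and 298.15 K as the temperature are assumptions (the slide states neither), and this is not the eF-site calculation itself.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
N_A = 6.02214076e23           # Avogadro constant, 1/mol


def debye_length_angstrom(ionic_strength_molar, dielectric, temperature_k=298.15):
    """Debye length 1/kappa, with kappa^2 = 2 N_A e^2 I / (eps0 eps_r k_B T)."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si) / (
        EPSILON_0 * dielectric * K_B * temperature_k
    )
    return 1e10 / math.sqrt(kappa_sq)  # metres -> angstroms


# Slide values: ionic strength 0.1 M, dielectric constant 80.0.
print(f"{debye_length_angstrom(0.1, 80.0):.2f} angstrom")
```

With these inputs the screening length comes out at a little under 10 Å, so charges farther apart than a few grid points are already noticeably screened by the solvent ions.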
What can we see from molecular surface?
[example] Myb proto-oncogene protein
[Figure panels: DNA-bound, DNA-unbound, DNA-bound]
eF-site ID
PID_ModelID-ChainID (PID: PDB ID)
– example: 1a1t_3-A
ModelID is omitted when the entry has no ModelID
– example: 1tup-C
Alphabetic Chain IDs for multiple chains
– example: 1tup-ABCEF
Link to individual surfaces
– For each eF-site ID
  • http://ef-site.hgc.jp/eF-site/servlet/Summary?entry_id=1tup-EF
– For each PDB-ID
  • http://ef-site.hgc.jp/eF-site/servlet/Search?pdb=1tup
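The ID convention and the two link patterns above can be captured in a few lines. This is a minimal sketch: the regular expression is inferred from the three examples only, the function names are hypothetical, and the URLs are the ones quoted on the slide, which may no longer resolve.

```python
import re
from urllib.parse import urlencode

BASE = "http://ef-site.hgc.jp/eF-site/servlet"

# PDB ID, optional "_ModelID", then "-" and one or more chain letters.
_EF_ID = re.compile(r"([0-9][A-Za-z0-9]{3})(?:_(\d+))?-([A-Za-z0-9]+)")


def parse_ef_site_id(ef_id):
    """Split an eF-site ID into (pdb_id, model_id or None, chain_ids)."""
    match = _EF_ID.fullmatch(ef_id)
    if match is None:
        raise ValueError(f"not an eF-site ID: {ef_id!r}")
    pdb_id, model, chains = match.groups()
    return pdb_id, (int(model) if model else None), chains


def summary_url(ef_id):
    """Link to the summary page of one eF-site entry."""
    parse_ef_site_id(ef_id)  # validate before building the link
    return f"{BASE}/Summary?" + urlencode({"entry_id": ef_id})


def search_url(pdb_id):
    """Link to the list of eF-site entries for one PDB ID."""
    return f"{BASE}/Search?" + urlencode({"pdb": pdb_id})


print(parse_ef_site_id("1a1t_3-A"))
print(summary_url("1tup-EF"))
```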
Summary Page for each Entry
Surface Browsing with jV
Download of each data file
Link to other DBs
Structure Page for surface and structure browsing
Structure based function prediction
Goal
– To predict the molecular function of proteins from their 3D structures
Approach
– To search for similar structures against the functional site database (local structure)
Structural information
– Molecular surface generated by Connolly’s algorithm
– Electrostatic potential obtained by solving the Poisson-Boltzmann equation numerically (color scale: -0.1 to +0.1 V)
[Figure labels: function unknown protein; similarity search; functional site database: local structure of functional sites of proteins]
Prediction of Ligand Binding Sites: eF-seek
http://ef-site.hgc.jp/eF-seek
Prediction of functional sites by similarity search against eF-site
Search for representative ligand binding sites
For an uploaded PDB-formatted file, the putative functional sites are predicted and the assumed complex structures are returned.
ProMode
(Wako, H. & Endo, S.)
Wako et al., Bioinformatics (2004) 20, 2035-2043.
Database of Normal Mode Analysis of Proteins
http://promode.socs.waseda.ac.jp/
Command window of jV.
A protein structure vibrating in a given normal mode can be observed (animation displayed by jV and Chime viewer).
Dynamic domains (blue and red regions) are defined for each normal mode.
Time average properties obtained by the normal mode analysis are shown.
Fluctuation of atoms
Fluctuation of dihedral angles
Correlations between fluctuations of atoms.
Red: atom pairs with a strong positive correlation
Blue: atom pairs with a strong negative correlation
Example of tetramer
Fluctuation of atoms
Internal (green) and external (blue) motions of each subunit are shown for oligomer data
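How time-averaged fluctuations and correlations follow from normal modes can be illustrated on a toy system. In the harmonic approximation the covariance of displacements is kT times the sum over modes of v_i v_j / lambda, where lambda and v are the eigenvalues and eigenvectors of the Hessian. The sketch below uses a made-up one-dimensional system of two unit masses joined by unit springs; it is a generic illustration with hypothetical function names, not ProMode's actual procedure.

```python
import math


def covariance_from_modes(modes, kT=1.0):
    """Covariance <dx_i dx_j> = kT * sum_k v_k[i] * v_k[j] / lambda_k."""
    size = len(modes[0][1])
    cov = [[0.0] * size for _ in range(size)]
    for eigenvalue, vector in modes:
        for i in range(size):
            for j in range(size):
                cov[i][j] += kT * vector[i] * vector[j] / eigenvalue
    return cov


def correlation(cov, i, j):
    """Normalized correlation between the fluctuations of coordinates i and j."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


# Toy system: each mass is tied to a wall and to the other mass by unit
# springs, so the Hessian is [[2, -1], [-1, 2]] with eigenvalues 1 and 3.
S = 1.0 / math.sqrt(2.0)
MODES = [(1.0, [S, S]), (3.0, [S, -S])]

cov = covariance_from_modes(MODES)
print("mean-square fluctuations:", cov[0][0], cov[1][1])
print("correlation:", correlation(cov, 0, 1))
```

The low-frequency in-phase mode dominates the sum, which is why the two coordinates come out positively correlated in this example.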
New!
EM Navigator
(Suzuki, H.)
What’s EM Navigator?
EM Navigator is
• a web site for browsing 3D electron microscopy (3D-EM) data
• based on data from EM Data Bank (EMDB) and Protein Data Bank (PDB)
• for non-specialists, beginners, and experts in 3D-EM or structural / molecular biology.
URL: http://emnavi.protein.osaka-u.ac.jp/
Top page with “Movie Slots” and text-search box
Enjoy viewing 3D structures
Interactive structure viewer (jV / Jmol) on Detail page (Data: PDB-ID-1GRU, requires Java Runtime Environment)
Interactive movie player on Movie page (Data: EMDB-ID-1508, requires Adobe Flash Player)
Enter into details
Detail page for EMDB data (ID: 1261)
Table page to view details of multiple data
Detail page for PDB data (ID: 2J37)
Search, view, and enjoy 3D-EM data !
SeSAW
Sequence-derived Structure Alignment Weights for identifying functional sites
• A way of comparing sequence and structure similarities between proteins
• Structural similarities measured using ASH structural alignment program
• Sequence similarities measured using position specific scoring matrices (PSSMs) from psiBLAST
(Standley, D. M.)
Standley, D. M. et al., PROTEINS (2008) 72, 1333-1351.
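[Equation: S_{Target} is a sum over the aligned residue pairs, \sum_{m=1}^{N_{align}}, combining the ASH term S_{ASH}, the distance factor \exp(-(d_m/d_0)^2), and the weighted sequence terms w_{Blosum} S_{Blosum} + w_{PSSM} S_{PSSM}]
Terms: score for structural similarity; Blosum62 score; score from PSSM values
d_m: distance between the Cα atoms of the m-th aligned pair in the aligned structures. d_0: 4 Å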
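The distance factor alone is easy to evaluate. The sketch below computes exp(-(d_m/d_0)^2) with d_0 = 4 Å for a list of Cα-Cα distances and sums it over the aligned pairs. The distances are made up for illustration, the function names are hypothetical, and the way this factor is combined with the Blosum and PSSM terms in the full score is not reproduced here.

```python
import math

D0 = 4.0  # angstroms, the d_0 value given on the slide


def distance_weight(d_m, d_0=D0):
    """Gaussian-like weight of one aligned pair: exp(-(d_m / d_0)^2)."""
    return math.exp(-((d_m / d_0) ** 2))


def structural_term(distances, d_0=D0):
    """Sum of the distance weights over all aligned residue pairs."""
    return sum(distance_weight(d, d_0) for d in distances)


# Made-up distances (in angstroms) for five aligned residue pairs.
example = [0.5, 1.0, 2.0, 4.0, 8.0]
for d in example:
    print(f"d = {d:4.1f}  weight = {distance_weight(d):.3f}")
print(f"sum over pairs = {structural_term(example):.3f}")
```

Pairs closer than d_0 contribute weights near 1, while pairs twice as far apart as d_0 contribute almost nothing, so the sum behaves like a soft count of well-superimposed residues.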
Identification of Protein Families/Superfamilies
[Figure: ROC curves for Pfam family and SCOP/CATH superfamily]
Confidence Measures
[Figure: TP/(TP+FP) plotted against S_Target for Pfam family and SCOP/CATH superfamily]
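The confidence measure TP/(TP+FP) is the fraction of hits at or above a score threshold that are true matches. A minimal sketch follows; the hit list is invented for illustration and is not data from the paper, and the function name is hypothetical.

```python
def precision_at_threshold(hits, threshold):
    """Return TP / (TP + FP) for hits scoring >= threshold, or None if none do.

    `hits` is a list of (score, is_true_match) pairs.
    """
    true_pos = sum(1 for score, ok in hits if score >= threshold and ok)
    false_pos = sum(1 for score, ok in hits if score >= threshold and not ok)
    total = true_pos + false_pos
    return true_pos / total if total else None


# Invented example: (score, same family as the query?)
HITS = [(60, True), (57, True), (40, False), (35, True), (20, False)]

for cutoff in (20, 30, 50):
    print(cutoff, precision_at_threshold(HITS, cutoff))
```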
Example: Hypothetical protein TTHA1568 from Thermus thermophilus (2czl)
Co-crystallized with tartaric acid
2czlA Results
[Figure labels: SG Targets; His Binding; Glu Binding; Lys/His Binding]
Good match to 2czl: 1ii5, a glutamate-binding protein
2czlA Family: DUF191; 1ii5A Family: SBP_bac_3; S_Target: 57
2czl and 1ii5 have a common binding site
[Figure labels: tartaric acid, glutamate, G82]
eProtS
Encyclopedia of Protein Structures
(Wiki-eProtS)
(Kudo, T. & Kinjo, A. R.)
Encyclopedia of Protein Structures (Wiki-eProtS)
http://eprots.pdbj.org/
Introduction and request for writing articles of Encyclopedia of Protein Structures (Wiki-eProtS)
Protein Data Bank (PDB): 52,535 entries
• select proteins
• annotate for the general audience (in English and Japanese)
eProtS: 322 entries (as of Aug 20, 2008)
What is the Encyclopedia of Protein Structures (eProtS)?
To communicate the accomplishments of structural biology to the general public...
Example (α-amylase)
Protein name
Species
Biological context
Structure description
Links to PDBj (xPSSS, jV)
References: original paper, links to other databases, author, and translator
Members
Head: Haruki Nakamura
Group for PDB Database Curation: Atsushi Nakagawa, Takanori Matsuura, Reiko Igarashi, Yumiko Kengaku, Kanna Matsuura, Mayumi Inoue, Chen Minyu
Group for Development of New Tools and Services: Daron M. Standley, Akira R. Kinjo, Hirofumi Suzuki, Reiko Yamashita, Takahiro Kudou, Yukiko Shimizu
Group for NMR Database (BMRB-PDBj): Toshimichi Fujiwara, Hideo Akutsu, Eiichi Nakatani, Yoko Harano
Other Collaborators: Kengo Kinoshita (IMS, Univ. Tokyo), Hiroyuki Toh (MIB, Kyushu Univ.), Hiroshi Wako (Waseda Univ.), Nobutoshi Ito (Tokyo Med. Dent. Univ.)
Secretary: Chisa Kamada