EBI is an Outstation of the European Molecular Biology Laboratory. Annotation Procedures for Structural Data Deposited in the PDBe at EBI

The Protein Data Bank in Europe (PDBe) group

• Established in 1996 at the European Bioinformatics Institute: an autonomous structural database capability in Europe.

• One of the four sites around the world where structural data can be deposited.

• Stable and clean repository for macromolecular structure data.

• Services that allow users to access, search and retrieve structural data from a single web access point.

Data Processing at PDBe

[Flow diagram: Depositor → AutoDep4.0 → “Raw” PDB file → Automated + Manual Curation → “Annotated” PDB file → Depositor’s comments → Structure release]

Data Deposition at the PDBe using AutoDep4.0

• Structure deposition and archival tool developed at the PDBe (EBI).

• Based on Java/XML technology.

• Available freely under license for academic and industry users.

• Easy to install and use for in-house archiving before deposition to the PDB via the PDBe interface.

http://www.ebi.ac.uk/pdbe-xdep/autodep

The Curation Process

• Raw information obtained from the depositor:
  a) atomic coordinates (proteins, nucleic acids, ligands, solvents)
  b) source of the macromolecule
  c) number of protein chains present in the asymmetric unit
  d) experimental data (structure factor file)

• Three phases of curation: 1) Automated Curation 2) Manual Curation 3) Final Checks.

Automated Curation

• Consists of a series of programs written in Fortran and Perl

• Annotators contribute ideas and programs in order to improve the curation process

• We work in a Unix command-line interface

• This is the first step: a big wrapper

The Wrapper

• Automatically generates:
  • Chain ID for every HETATM and HOH (gets the chain ID of the closest polypeptide chain)
  • Quaternary structure, according to PISA (REMARK 300 and 350)
  • Structure validation: close contacts (REMARK 500) and chirality checks
  • Solvent molecules that lie farther than expected from the protein (REMARK 525)
  • HELIX, SHEET, SSBOND, CISPEP records
  • Residue-by-residue mapping against the Uniprot database
  • DOHLC output
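
The chain-ID step listed above (each HETATM or HOH takes the chain ID of the closest polypeptide chain) can be illustrated with a small sketch. This is not the PDBe wrapper itself, which is written in Fortran and Perl; it is a minimal Python illustration that assumes standard fixed-column PDB coordinate lines and uses a brute-force nearest-atom search. The helper `atom_line` and the demo coordinates are hypothetical and exist only to build test input.

```python
import math

def atom_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: format one fixed-column PDB coordinate line."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}")

def assign_chain_ids(pdb_lines):
    """Give each HETATM the chain ID of the nearest ATOM (polymer) record."""
    polymer, hetero = [], []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if record == "ATOM":
            polymer.append((line[21], xyz))
        else:
            hetero.append((line[17:20].strip(), int(line[22:26]), xyz))
    assigned = []
    for resname, resseq, xyz in hetero:
        # Brute-force search: fine for a sketch, slow for large structures.
        chain, _ = min(polymer, key=lambda atom: math.dist(xyz, atom[1]))
        assigned.append((resname, resseq, chain))
    return assigned

# Made-up coordinates: chain A near the origin, chain B 10 Å away along x.
demo = [
    atom_line("ATOM", 1, "CA", "GLY", "A", 1, 0, 0, 0),
    atom_line("ATOM", 2, "CA", "GLY", "B", 1, 10, 0, 0),
    atom_line("HETATM", 3, "O", "HOH", " ", 101, 8, 0, 0),
    atom_line("HETATM", 4, "ZN", "ZN", " ", 201, 1, 1, 0),
]
result = assign_chain_ids(demo)
print(result)
```

In this demo the water lies closer to chain B and the zinc ion closer to chain A. The same distances could also be used to flag solvent that lies unusually far from the protein, although the cutoff used for REMARK 525 is not given in the slides.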

Contents of a Curated PDB file

Sequence related information:

1) Sequences (SEQRES): all macromolecules present during crystallization, including expression tags and residues missing from the coordinates due to disorder.

2) Sequence database reference (DBREF): provides a mapping (FASTA alignment) of the sequence (SEQRES) to the Uniprot database.

Macromolecular Structure Database http://www.ebi.ac.uk/msd/

09.10.07

Checks made …

• Is the Uniprot accession number correct? The sequence similarity between the Uniprot sequence and the target sequence should be at least ~95%.

• Identification of N- and C-terminal cross-references with the Uniprot entry, and addition of fragment information (if any) to the COMPND record.

• Merging of data from the Uniprot entry into COMPND (molecule name), SOURCE (scientific name of the organism) and KEYWDS.

• Addition of the EC number, if available.
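
The ~95% check above can be sketched as follows. This is an illustration only: it assumes the two sequences have already been aligned (for example by FASTA) into equal-length strings with "-" for gaps, and it measures percent identity over the aligned, ungapped columns as a stand-in for the "similarity" named on the slide. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def accession_looks_correct(aligned_seqres, aligned_uniprot, threshold=95.0):
    """True if the aligned sequences meet the identity threshold (assumed 95%)."""
    return percent_identity(aligned_seqres, aligned_uniprot) >= threshold

# Made-up 20-residue example with a single substitution (19/20 identical).
uniprot = "MKTAYIAKQRQISFVKSHFS"
seqres = "MKTAYIAKQRQISFVKSHFA"
print(percent_identity(seqres, uniprot), accession_looks_correct(seqres, uniprot))
```

Gapped columns are skipped so that expression tags present in SEQRES but absent from the Uniprot sequence do not lower the score; whether the real procedure treats tags this way is an assumption.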

Curation procedures continued…

1) If no sequence database reference is available, the sequence is self-referenced (i.e. the database reference will be the PDB entry itself).

2) Additional details regarding the sequence (gaps, cloning artifacts, structural disorder) are provided in REMARK 999.

3) Disagreements between the Uniprot sequence and the sequence present in the PDB file (SEQADV) are marked as a) engineered mutation, b) conflict or c) microheterogeneity.

4) Residues missing from the coordinates – listed in REMARK 465

5) Non-hydrogen atoms missing from the coordinates- listed in REMARK 470

6) Zero-occupancy residues - REMARK 475

7) Zero-occupancy atoms - REMARK 480

8) Related PDB entries (same Uniprot Accession numbers) are listed in REMARK 900

9) Backbone discrepancies
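
Items 6 and 7 above separate residues whose atoms all have zero occupancy (REMARK 475) from individual zero-occupancy atoms in otherwise occupied residues (REMARK 480). A minimal sketch of that split, assuming standard fixed-column ATOM records with occupancy in columns 55-60, might look like this. The helper `atom_line` and the demo residues are hypothetical, and the real PDBe programs may apply further rules.

```python
from collections import defaultdict

def atom_line(serial, name, resname, chain, resseq, occupancy):
    """Hypothetical helper: format one fixed-column ATOM line at the origin."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occupancy:6.2f}{20.0:6.2f}")

def zero_occupancy_report(pdb_lines):
    """Return (zero-occupancy residues, zero-occupancy atoms) from ATOM records."""
    residues = defaultdict(list)  # (chain, resseq, resname) -> [(atom, occupancy)]
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            residues[key].append((line[12:16].strip(), float(line[54:60])))
    zero_residues, zero_atoms = [], []
    for key, atoms in residues.items():
        zero = [name for name, occ in atoms if occ == 0.0]
        if len(zero) == len(atoms):
            zero_residues.append(key)                        # REMARK 475 candidate
        else:
            zero_atoms.extend((key, name) for name in zero)  # REMARK 480 candidates
    return zero_residues, zero_atoms

demo = [
    atom_line(1, "N", "GLY", "A", 1, 1.0),
    atom_line(2, "CA", "GLY", "A", 1, 0.0),
    atom_line(3, "N", "SER", "A", 2, 0.0),
    atom_line(4, "CA", "SER", "A", 2, 0.0),
]
zero_residues, zero_atoms = zero_occupancy_report(demo)
print(zero_residues, zero_atoms)
```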

Ligand Curation

• Ligands interacting with a protein/DNA chain: substrate, product, inhibitor (drug molecule), metal ion, modified amino acid or nucleotide.

• MODRES token added for modified amino acids and nucleotides that are part of the polymer (i.e. protein/DNA) chain.

• Specialized software, DOHLC (Do Het Link and Connect records), is used to get the bond type, stereochemistry and IUPAC-compliant name for each ligand in the structure.

• DOHLC is a graph-based structure comparison algorithm: it checks each ligand/HET against its dictionary definition and renames residues and atoms.

• Generates REMARK 620 (metal coordination), LINK and CONECT records.

• DOHLC fails on bad geometry, an incomplete ligand or a new HETGROUP.

• If no match is found for a HETGROUP, a new ligand is created.

• HETGROUP with missing atoms: REMARK 610

• HETGROUP with zero-occupancy atoms: REMARK 615
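
The missing-atom check behind REMARK 610 can be illustrated in a much simpler form than DOHLC's graph comparison. The sketch below compares atom names only, using set difference against a small stand-in dictionary; it does not reproduce DOHLC, and `REFERENCE_ATOMS` is a hypothetical two-entry dictionary of non-hydrogen atom names created for this example.

```python
# Hypothetical stand-in for a ligand dictionary: residue name -> expected
# non-hydrogen atom names. Real dictionaries also carry bonds and chirality.
REFERENCE_ATOMS = {
    "SO4": {"S", "O1", "O2", "O3", "O4"},
    "HOH": {"O"},
}

def missing_het_atoms(resname, observed_atom_names, dictionary=REFERENCE_ATOMS):
    """Return sorted names of expected atoms absent from the deposited group.

    Returns None when the residue name has no dictionary entry, which in the
    workflow above would mean a candidate for a new ligand definition.
    """
    if resname not in dictionary:
        return None
    return sorted(dictionary[resname] - set(observed_atom_names))

print(missing_het_atoms("SO4", ["S", "O1", "O2", "O3"]))
```

A name-based comparison only works once atoms have been renamed to match the dictionary, which is one reason a graph-based match on connectivity comes first in the real process.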

Generating Assembly Information

[Flow diagram: ASU contents → expand crystal symmetry → analyze surface and contacts → possible assemblies. Callouts: “Best !!”; “Loss of accessible surface area >10% of total surface. True complexes also look good!”]

• Biological unit: the biologically relevant form of the molecule

• Quaternary structure: the way protein chains tend to associate with one another

• The matrices forming the quaternary structure are reported as BIOMT records in REMARK 350
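
As an illustration of how the BIOMT records in REMARK 350 are used, the sketch below reads the three rows of each operator (a 3×3 rotation plus a translation column) and applies x' = R·x + t to a coordinate. It assumes whitespace-separated BIOMT lines in the usual layout; the second operator in the demo is made up for the example and does not come from any real entry.

```python
def parse_biomt(remark_lines):
    """Collect BIOMT operators as {serial: 3 rows of [r1, r2, r3, t]}."""
    ops = {}
    for line in remark_lines:
        parts = line.split()
        if (len(parts) == 8 and parts[:2] == ["REMARK", "350"]
                and parts[2].startswith("BIOMT")):
            row = int(parts[2][5]) - 1   # BIOMT1/2/3 -> row 0/1/2
            op = ops.setdefault(int(parts[3]), [[0.0] * 4 for _ in range(3)])
            op[row] = [float(v) for v in parts[4:8]]
    return ops

def apply_op(op, xyz):
    """Apply one operator: rotate by the 3x3 part, then add the translation."""
    return tuple(sum(op[i][j] * xyz[j] for j in range(3)) + op[i][3]
                 for i in range(3))

demo = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
ops = parse_biomt(demo)
print(apply_op(ops[2], (1.0, 2.0, 3.0)))
```

Applying every operator to every atom of the listed chains yields the coordinates of the full assembly; operator 1 is normally the identity, so the deposited coordinates are part of the assembly as they stand.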

PISA assemblies

[Figure: 1E94 PISA assembly]

Structure validation

Final Checks:

• Programs check for PDB format accuracy and internal consistency

• Manual check by another annotator

• Automatic generation of the letter to the depositor, plus manual addition of special comments