Data Management in the

http://www.pdb.org/ • [email protected]

History of the PDB

1970s
Community discussions about how to establish a PDB
Cold Spring Harbor meeting in protein crystallography
PDB established at Brookhaven (October 1971; 7 structures)

1980s
Number of structures increases as technology improves
Community discussions about requiring depositions
IUCr guidelines established
Number of structures deposited increases
Independent biological databases established, e.g., the NDB

1990s
mmCIF project completed
Structural genomics begins
PDB moves to RCSB

2000s
RCSB PDB renewed
wwPDB established

PDB Mission

To provide the most accurate, well-annotated data about macromolecular structure in the most timely and efficient way possible to facilitate new discoveries and advances in science

[Figure: number of released entries by year]

The Data Pipeline

Structure Determination Pipeline (X-ray)

Hypothesis Driven Target Selection

Crystallomics

Isolation, Expression, Purification, Crystallization

Data Collection

Structure Determination

Data Deposition

Data Release

Publication

Data Processing Data Flow

System for Data Collection and Archiving

Depositor

Reports

Final Files

Database Loader

Data Views

MAXIT

Validation

Metadata Dictionaries

ADIT (AutoDep Input Tool)

Data

Data Processing System Features

Different dictionaries without software changes

Simple customization of both functionality and content

Automatically scales with changes in content

Can be distributed to multiple deposition sites

Reference data and standard nomenclature (ERFs)
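The first feature above, supporting different dictionaries without software changes, can be illustrated with a minimal sketch. This is not the ADIT implementation; the `ItemRule` class, the item names, and both example dictionaries are hypothetical, chosen only to show how validation rules can live in data rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRule:
    """One dictionary entry: expected Python type and whether the item is mandatory."""
    value_type: type
    mandatory: bool = False


# Hypothetical dictionaries; the item names are illustrative, not real mmCIF items.
XRAY_DICTIONARY = {
    "resolution": ItemRule(float, mandatory=True),
    "space_group": ItemRule(str, mandatory=True),
    "remarks": ItemRule(str),
}

NMR_DICTIONARY = {
    "spectrometer_field_mhz": ItemRule(float, mandatory=True),
    "remarks": ItemRule(str),
}


def validate(entry: dict, dictionary: dict) -> list:
    """Check an entry against a dictionary and return a list of problems."""
    problems = []
    for name, rule in dictionary.items():
        if name not in entry:
            if rule.mandatory:
                problems.append(f"missing mandatory item: {name}")
            continue
        if not isinstance(entry[name], rule.value_type):
            problems.append(
                f"wrong type for {name}: expected {rule.value_type.__name__}"
            )
    # Items the dictionary does not define are reported, not silently accepted.
    for name in entry:
        if name not in dictionary:
            problems.append(f"unknown item: {name}")
    return problems


if __name__ == "__main__":
    print(validate({"resolution": 1.8, "space_group": "P 21 21 21"}, XRAY_DICTIONARY))
    print(validate({"resolution": "1.8"}, XRAY_DICTIONARY))
    print(validate({"spectrometer_field_mhz": 600.0}, NMR_DICTIONARY))
```

The same `validate` function handles both dictionaries, so adding an experimental method means adding a dictionary, not changing the validator.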

Data Content of Each PDB Entry

1970s
Name, source, reference, resolution, sequence, secondary structure, crystal data, coordinates, unstructured remarks

1990s
Name, source, reference, resolution, refinement details, data collection and processing details, symmetry details, biological unit information, missing residues, related entries, sequence, ligand and ions, secondary structure, crystal data, coordinates, few unstructured remarks

Annotation and Validation

ADIT: Reviewing, adding, correcting entry information

Maxit: File format conversions

Blast Automation Tool results

Validation Server Reports

Ligand Depot, ChemDraw

RasMol for Visualization

PubMed, Citation Tracker, Citation Tool

Extending Data Dictionaries for Deposition

X-ray
Structure determination data items
http://deposit.pdb.org/mmcif/sg-data/xstal.html

NMR
Structure determination data items
http://deposit.pdb.org/mmcif/sg-data/nmr.html

Protein Production
http://deposit.pdb.org/mmcif/sg-data/protprod.html

Growth of Molecular Complexity

Deposition of X-ray, NMR & EM structures by year

[Figure: structures deposited per year, 1969 to 2003; vertical axis 0 to 2500; series: X-ray, NMR, EM]

Cryo-EM Dictionary Proposal

Biochemical Preparation

em_sample_support

em_sample_preparation

em_solution_composition

em_array_formation

EM Specimen Preparation

em_vitrification

em_stain

em_cryo_stain

em_embedding_agent

EM Data Collection

em_microscope

em_imaging

em_detector

em_electron_diffraction

em_image_scans

em_micrographs

em_electron_diffraction_phase

em_electron_diffraction_pattern

Structure Analysis

em_3d_fitting

em_3d_fitting_list

em_classes

em_refinement

em_fsc_curve

Image Processing

em_3d_reconstruction

em_particle_picking

em_singleparticle_selection

em_particle_picking_list

em_filament_selection

em_filament_reconstruction

Sample Description

em_assembly

em_entity_assembly

em_entity_assembly_list

em_icos_virus_shells

em_virus_entity

em_filaments

em_single_particle

em_2d_crystal

March 2005

New categories recommended at the Oct 2004 workshop are in pink

Target Registration Database
TargetDB • http://targetdb.pdb.org/

All targets downloadable in XML (~51,000 Targets)
Targets downloaded from 18 centers weekly
Target search by: sequence (FASTA), project target ID, project site, status (selected, cloned, expressed, … in PDB), update date, protein name, source organism
Report output in HTML, FASTA, and XML
Integrates PDB entry sequences (~55,600 sequences)
Includes PDB pre-release sequence data
Provides links to related sequence databases
Open to all Structural Genomics projects
Summary reports of target or project progress
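As a rough illustration of searching targets by site or status and reporting them in FASTA, the sketch below parses a small XML document. The XML layout (`targets`, `target`, `status`, `sequence`), the target IDs, the site names, and the sequences are all invented for this example and are not the TargetDB schema or data; only the status values come from the list above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout and made-up records, for illustration only.
SAMPLE_XML = """
<targets>
  <target id="T0001" site="CenterA">
    <status>cloned</status>
    <sequence>MKTAYIAKQR</sequence>
  </target>
  <target id="T0002" site="CenterB">
    <status>in PDB</status>
    <sequence>GSHMLEDPAR</sequence>
  </target>
  <target id="T0003" site="CenterA">
    <status>expressed</status>
    <sequence>MSEQNNTEMT</sequence>
  </target>
</targets>
"""


def load_targets(xml_text):
    """Parse the XML text into a list of plain dictionaries, one per target."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": node.get("id"),
            "site": node.get("site"),
            "status": node.findtext("status"),
            "sequence": node.findtext("sequence"),
        }
        for node in root.findall("target")
    ]


def search(targets, status=None, site=None):
    """Return targets matching every criterion that was supplied."""
    return [
        t for t in targets
        if (status is None or t["status"] == status)
        and (site is None or t["site"] == site)
    ]


def to_fasta(targets):
    """Render targets as FASTA: a '>' header line followed by the sequence."""
    return "\n".join(
        f">{t['id']} {t['site']} {t['status']}\n{t['sequence']}" for t in targets
    )


if __name__ == "__main__":
    all_targets = load_targets(SAMPLE_XML)
    print(to_fasta(search(all_targets, site="CenterA")))
```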

Protein Expression Purification and Crystallization Database (PepcDB)

Extends content of TargetDB

All protocols for cloning, expression, purification are stored and are searchable

Reports provide links to status history, related protocols, project, sequence and domain databases

Incremental Assembly

PepcDB

TargetDB

Target and Protocol Tracking

Target Tracking

PDB

Merging and integration

Target Selection

Sample preparation

Data Collection

Data Processing

Structure Solution

Refinement

Protocols

Tracking, Assembling and Archiving Data

Current Query System

Reengineered Web Site
pdbbeta.rcsb.org

Built on curated data

Three-tier architecture
Database tier
Middle tier
Presentation tier

Feedback from users
Help desk
Usability engineering
Focus groups

Went into public beta testing in July 2004
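The three-tier split can be sketched in a few lines. This is a generic illustration of the pattern, not the RCSB site's code: the `entry` table, its columns, and the placeholder rows are assumptions, and an in-memory SQLite database stands in for the real database tier.

```python
import sqlite3
from html import escape


# Database tier: storage only. Table layout and rows are placeholders.
def build_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (id TEXT PRIMARY KEY, title TEXT, method TEXT)")
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [
            ("demo1", "Example X-ray structure", "X-ray"),
            ("demo2", "Example NMR structure", "NMR"),
            ("demo3", "Another <X-ray> structure", "X-ray"),
        ],
    )
    return conn


# Middle tier: query logic that returns plain data, with no SQL or HTML leaking out.
def find_by_method(conn, method):
    rows = conn.execute(
        "SELECT id, title FROM entry WHERE method = ? ORDER BY id", (method,)
    ).fetchall()
    return [{"id": row[0], "title": row[1]} for row in rows]


# Presentation tier: turns plain data into HTML and knows nothing about storage.
def render(entries):
    items = "".join(
        f"<li>{escape(e['id'])}: {escape(e['title'])}</li>" for e in entries
    )
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    conn = build_database()
    print(render(find_by_method(conn, "X-ray")))
```

Because each tier talks only to its neighbor, the presentation could be redesigned, or the database swapped, without touching the other layers.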

Navigation and Query

Persistent Navigation Bar

Site Search

Hierarchical Menu Items

Persistent Search Box

Integrated Help (Context-sensitive)

Getting Started

Worldwide PDB (wwPDB)

RCSB (Research Collaboratory for Structural Bioinformatics)
PDBj (Osaka University)
Macromolecular Structure Database (EBI)

To ensure that PDB files remain in a single archive to best serve the worldwide community of depositors and users

http://www.wwpdb.org/

AcknowledgementsAcknowledgements

Operated by the Research Collaboratory for Structural Bioinformatics

Supported by:

NIGMS

RCSB PDB Team: Ken Addess, Helen M. Berman, Wolfgang F. Bluhm, Phil Bourne, Kyle Burkhardt, Li Chen, Sharon Cousin, Jim Croker, Nita Deshpande, Shuchismita Dutta, Zukang Feng, Lew-Christiane Fernandez, Judith L. Flippen-Anderson, Gary Gilliland, Rachel Kramer Green, Vladimir Guranovic, Shri Jain, Ann Kagehiro, Charlie Knezevich, Andrei Kouranov, Kevin Lwinmoe, Jeff Merino-Ott, Irina Persikova, Suzanne Richman, Melcoir Rosas, Kathryn Rosecrans, Bohdan Schneider, Wayne Townsend-Merino, Susan Van Arnum, Elizabeth Walker, John Westbrook, Alice Xenachis, Huanwang Yang, Jasmin Yang, Christine Zardecki, Cindy Zhang
