Transcript
Page 1: From crystals to pdb: building a high throughput crystallography pipeline for structural genomics

From crystals to PDB: building a high throughput crystallography pipeline for structural genomics

Chiu HJ1, Wolf G1, West W2, van den Bedem H1, Miller MD1, Zhang Z1, Morse A2, Wang X2, Xu Q1, Levin I1, von Delft F3, Elsliger MA3, Godzik A2, Grzechnik SK2 and Deacon AM1

1 Stanford Synchrotron Radiation Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025. 2 University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093. 3 The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037.

The Structure Determination Core (SDC) of the Joint Center for Structural Genomics (JCSG) is dedicated to developing technologies that streamline all the steps in the structure determination process, from crystals to PDB-ready atomic coordinates. Over the last year the JCSG production capacity has increased dramatically. The SDC has screened more than 7000 crystals from 192 protein targets. A total of 232 datasets from 106 targets have been collected and 90 structures have been solved. To handle the rapidly growing flow of experimental data, we have developed a set of crystallographic and database tools to both track and streamline our workflow.

Crystal cassettes are shipped to SDC from the Crystallomics Core. All relevant crystal information is captured in the central JCSG database and is downloaded in a “Beamline Report”. Crystals are screened automatically using the Stanford Auto-Mounter and Blu-Ice software. The visual and diffraction properties of each crystal are recorded. A computer program, DISTIL, is under development to automatically analyze diffraction images and provide an objective screening evaluation for each crystal. The best crystals for each target are flagged for data collection.

A computer program, Xsolve, is used for automatic crystallographic data processing and structure solution. A model building tool providing crystallographers with the best possible initial model for refinement is under development. The results of the analysis are uploaded to a Structure Solution Tracking System. A Refinement Tracking System requests weekly updates and collects all the data necessary for a peer-review Quality Control step, before the coordinates are deposited to the Protein Data Bank.

The Joint Center for Structural Genomics

Mission: To establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale, cost-effective production center for structural genomics.

Structural Genomics of Thermotoga maritima

T. maritima genome

A system to test the pipeline

• Small bacterial genome: 1877 gene products

• Proteins should express well in E. coli

• Proteins from a thermophile may be more stable

• Process entire genome

• Establish trends in process, e.g. crystallization

Category                      Number      %
Nucleic acid binding             170    9.2
DNA binding                      109    5.9
DNA repair                        11    0.5
DNA replication factor             3    0.1
Transcription factor              37    1.9
RNA binding                       43    2.3
Structural Ribosomal protein      52    2.8
Translation factor                12    0.6
Motor                              5    0.2
Enzyme                           600   32.4
Peptidase                         27    1.5
Protein Kinase                    17    0.9
Protein Phosphatase                8    0.4
Signal transducer                 32    1.7
Cell adhesion                      1    0.0
Structural Protein                61    3.3
Transporter                      202   10.9
Ion channel                        3    0.2
Ligand Binding or carrier        255   13.8
Electron transporter              52    2.8
Unknown or unclassified          713   38.5
Total                           1877    100

HT Structure Determination: 2nd Generation

HT Data Collection: 1st Generation Prototype, 3rd Generation Software

Target Selection

HT Imaging: 1st Generation Hardware, 6th Generation Software

Structure Validation & Deposition: Autosubmission of electronic publication

Data flow parallels the experimental pipeline, harvesting ~300 parameters from 19 stages

HT Crystallization

HT Purification

HT Expression

PDB

HT Pipeline Processes, Bottlenecks and Leaks

Pipeline stages: target selection, cloning, expression, purification, crystallization, imaging, harvesting, beamline crystal mounting, crystal screening, data collection, phasing, tracing, structure refinement, structure validation, annotation, publication

All relevant crystal information is captured in the central JCSG database in the form of a Beamline Report

Beamline Report columns: Target ID, Beamline, Crystallization condition, Visual properties, Diffraction properties, Resolution, Spot quality, Diffraction strength
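The screening step, in which the best crystals for each target are flagged for data collection, can be illustrated with a minimal sketch. This is not the JCSG database schema or the DISTIL scoring method: the record fields mirror the report columns, while the 0-5 score scales, the ranking rule (best resolution first, then spot quality, then diffraction strength) and the target and crystal IDs are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScreeningRecord:
    """One screened crystal (hypothetical layout, not the JCSG schema)."""
    target_id: str
    crystal_id: str
    resolution: float          # in Angstrom; a lower value is better
    spot_quality: int          # assumed 0-5 scale, higher is better
    diffraction_strength: int  # assumed 0-5 scale, higher is better


def best_per_target(records):
    """Return the top-ranked crystal for each target ID."""
    best = {}
    for rec in records:
        # Smaller key ranks first: resolution, then the two scores negated.
        key = (rec.resolution, -rec.spot_quality, -rec.diffraction_strength)
        current = best.get(rec.target_id)
        if current is None or key < current[0]:
            best[rec.target_id] = (key, rec)
    return {target: rec for target, (_, rec) in best.items()}


# Made-up example data; IDs and values are purely illustrative.
records = [
    ScreeningRecord("TM0001", "c1", 3.2, 3, 2),
    ScreeningRecord("TM0001", "c2", 2.1, 4, 4),
    ScreeningRecord("TM0002", "c3", 2.8, 2, 3),
]
flagged = best_per_target(records)
for target, rec in sorted(flagged.items()):
    print(target, rec.crystal_id, rec.resolution)
```

A tuple key keeps the ranking rule in one place, so a different objective score could be swapped in without touching the selection loop.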

Robust and automated crystal screening

Initial design to production

• Large-scale capacity

• Shipping, storage and screening

• Used by JCSG since June 2002

• Implemented on all SSRL beamlines

• Cassette kits distributed to PX user groups

Integration with BLU-ICE

• Automated sample mounting

• Automated sample alignment

• Automated diffraction images

Increased screening capacity during SSRL shutdown

• Leverage existing infrastructure

• X-ray MicroMax-002 generator installed June 2003

• SSRL automated screening system used

• >4200 crystals screened in 9 months

All data uploaded to JCSG DB

• Screening, collection and structure solution

• Work closely with BIC on implementation and debugging

• Still more features needed to handle expanding production

Structure solution tracking

Local SDC “dataset” database

Active crystal report

Xsolve: automation of structure determination

2004 developments

• Improve success rate: better autoindexing, determine optimal resolution for scaling sweeps

• More general: handle crystallographic details: re-indexing screw axes, merging sweeps

• More robust operation: catch timeouts, core dumps, infinite loops, etc.

• Implement parallelization: develop tools to monitor and control processing on a Linux cluster

• New program support: HKL2000, SHARP, SHELXD (not completely tested)
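The "more robust operation" goal (catching timeouts, core dumps and infinite loops) can be sketched with the Python standard library. This is an illustration only and not how Xsolve is implemented: the function name `run_step` and the outcome labels are hypothetical, and the commands in the demo are stand-ins for real crystallographic programs.

```python
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run one external processing step and classify how it ended."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run once the timeout expires,
        # which also covers a job stuck in an infinite loop.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a
        # signal, as it would after a segmentation fault.
        return "crashed"
    if proc.returncode != 0:
        return "failed"
    return "ok"


# Stand-in commands; a real pipeline would call its own programs here.
ok_status = run_step([sys.executable, "-c", "pass"], 30)
failed_status = run_step([sys.executable, "-c", "import sys; sys.exit(3)"], 30)
hung_status = run_step([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
print(ok_status, failed_status, hung_status)
```

Returning a status label instead of raising lets a driver script record the outcome and move on to the next dataset.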

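Processing steps, in order, and the program used for each:

• Autoindex: Mosflm

• Integrate: Mosflm

• Scale: Scala

• Solve: Solve

• Trace: Resolve

Solve jobs shown in the diagram:

• Solve: P422, 1 mol 2 ...

• Solve: P422, 2 mols 3...

• Solve: P4122, 1 mol 2 ...

• Solve: P4122, 2 mol 3...

• Solve: P4222, 1 mol 2 ...

• Solve: P4222, 2 mols 3...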

Main goals

• Handle the majority of cases

• Organize data and workflow

• Ease information flow to JCSG DB

• Allow integration of new programs.

• Use parallel execution of jobs
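The parallel-execution goal, applied to trying several space groups and molecule counts, can be sketched as follows. This is a hedged illustration and not Xsolve code: the space groups and molecule counts are those named in the diagram, while `run_all` and `fake_solve` are hypothetical names and `fake_solve` is a placeholder for launching a real structure solution job.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SPACE_GROUPS = ["P422", "P4122", "P4222"]
MOLECULE_COUNTS = [1, 2]


def run_all(jobs, run_one, max_workers=4):
    """Run run_one(space_group, n_mol) for every job, several at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input jobs.
        return list(pool.map(lambda job: run_one(*job), jobs))


def fake_solve(space_group, n_mol):
    """Placeholder job; a real one would start an external program."""
    return {"space_group": space_group, "n_mol": n_mol, "status": "done"}


# Every combination of candidate space group and molecule count.
jobs = list(product(SPACE_GROUPS, MOLECULE_COUNTS))
results = run_all(jobs, fake_solve)
for res in results:
    print(res["space_group"], res["n_mol"], res["status"])
```

Threads are sufficient here on the assumption that each job mostly waits on an external process; on a cluster the same job list would be handed to a batch scheduler instead.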

Refinement Tracking System

Automation of protein model completion: an inverse kinematics approach

Automatically Build Backbone Fragments:

• Build candidate closing conformations using IK techniques (robotics)

• Rank according to electron density fit and conformational likelihood

• Subject top-ranking candidates to real-space, torsion angle SA refinement

Results:

• Closed missing fragments of up to 12 residues in length to within 0.6 Å all-atom RMSD in a 2.8 Å model
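The all-atom RMSD quoted in this result is the root-mean-square distance between matching atoms of the built fragment and the reference. A minimal sketch of that calculation follows, assuming both coordinate sets are already in the same frame and list atoms in the same order; the coordinates in the demo are made up and are not from the poster.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of matching atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Angstrom: every atom is shifted by 0.6 along z.
built = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
reference = [(0.0, 0.0, 0.6), (1.5, 0.0, 0.6), (1.5, 1.5, 0.6)]
value = rmsd(built, reference)
print(round(value, 3))
```

No superposition step is included, since a fragment built into a crystal structure already shares the reference coordinate frame.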

Manually Finalizing Model:

• Labor intensive, time consuming

• Existing aids are highly interactive

Lotan et al., submitted; van den Bedem et al., in preparation

Total crystals screened at SDC: 10778

Unique targets represented: 356

TM/non-TM targets: 299/57

Datasets collected: 394 (288 TM, 106 non-TM)

Unique targets represented: 194

TM/non-TM targets: 146/48

Structures solved: 155 (94 MAD; 51 MR; 3 SAD; 7 NMR) (125 TM; 30 non-TM)

JCSG production statistics (August 10, 2004)

Can be searched by: Shipment ID, Dewar, Target ID, Cassette/puck

Installation of a Microsource X-ray generator at 9-2

JCSG production statistics (August 10, 2004)

More to come…

• 22 targets: data collected, not yet solved

• 92 targets: diffraction better than 3.5 Å, not yet solved

Growing reliance on the JCSG DB

• 500 crystals and 8 structures per month

• 20 cassettes (2000 crystals) inventory

• 30-40 structures in refinement

• 2.0 TB of diffraction images

• 0.5 TB of processing files

• >100,000 diffraction images

• Average resolution of structures in PDB: 2.0 Å

• Average protein chain length: 260 aa

• Average number of residues in asu: 480 aa

TSRI Administrative Core: Ian Wilson, Peter Kuhn, Marc Elsliger, Frank von Delft, Tina Montgomery, Gye Won Han, Rong Chen, Angela Walker

UCSD Bioinformatics Core: John Wooley, Adam Godzik, Susan Taylor, Slawomir Grzechnik, Bill West, Andrew Morse, Jie Quyang, Xianhong Wang, Jaume Canaves, Lukasz Jaroszewski, Robert Schwarzenbacher, Marc Robinson Rechavi, Chris Edwards, Olga Kirillova, Ray Bean, Josie Alaoen

Stanford/SSRL Structure Determination Core: Keith Hodgson, Ashley Deacon, Britt Hedman, Guenter Wolf, Mitch Miller, Henry van den Bedem, Qingping Xu, Herbert Axelrod, Christopher Rife, Inna Levin, R. Paul Phizackerley, Amanda Prado, John Kovarik, Ross Floyd, Irimpan Mathews, Michael Solits, Aina Cohen, Paul Ellis

GNF & TSRI Crystallomics Core: Ray Stevens, Scott Lesley, Rebbeca Page, Carina Grittini, Glen Spraggon, Andreas Kreusch, Michael DiDonato, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Tanya Biorac, Joanna C. Hale, Justin Haugen, Mike Hornsby, Eric Koesema, Edward Nigoghossian, Kevin Quijano, Megan Wemmer, Aprilfawn White, Juli Vincent, Jeff Velasquez, Kin Moy, Vandana Sridhar, Bernard Collins, Thomas Clayton

Scientific Advisory Board

Carl-Ivar Brändén, Karolinska Inst., Stockholm (retired 2003)

Elbert Branscomb, DOE Joint Genome Inst., Walnut Creek

Stephen Cusack, EMBL – Outstation Grenoble

Leroy Hood, Inst. for Systems Biology, Seattle

John Kuriyan, U.C. Berkeley

Erkki Ruoslahti, The Burnham Institute

James Wells, Sunesis Pharmaceuticals, Inc.

Charles Cantor, Sequenom, Inc.

Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics

James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute

Exploratory Projects: Kurt Wüthrich (NMR), Linda Columbus, Touraj Etezady-Esfarjani, Wolfgang Peti, Virgil Woods (DXMS)

Acknowledgements

NIH Protein Structure Initiative Grant P50 GM62411
