Quality of Protein Crystal Structures in the PDB

Eric N. Brown, Lokesh Gakhar and S. Ramaswamy



Between objectivity and subjectivity

Carl-Ivar Brändén & T. Alwyn Jones

Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S-751 24 Uppsala, Sweden.

Protein crystallography is an exacting trade, and the results may contain errors that are difficult to identify. It is the crystallographer's responsibility to make sure that incorrect protein structures do not reach the literature.

Nature 343, 687 - 689 (22 February 1990)


Amplitudes and Phases - Bias.

Animal stories - by Kevin Cowtan


Amplitudes and Phases - Bias.

More animal stories.


Stolen from Bernhard Rupp's website without permission


How much of what we think?


Stolen from --- James Holton, Berkeley, without permission.


STRUCTURE VALIDATION

Validation based on geometry:
• WHAT IF
• PROCHECK
• MolProbity
• Ramachandran plot

Validation based on fit to data:
• R-factor / R-free
• Real-space fit, etc.
• Problem: data-to-parameter ratio. Add geometric restraints, or chemical knowledge.

Composite validation:
• ASTRAL - SPACI: http://astral.Berkeley.edu/spaci.html


WHY MORE?

DON’T WE HAVE ENOUGH VALIDATION TOOLS?

WHAT IS COMMON TO ALL EXISTING VALIDATION TECHNIQUES?

THERE IS AN ABSOLUTE CORRECT ANSWER

WE KNOW THERE IS NO CORRECT ANSWER


THINK DIFFERENTLY

• All crystallographers want to deposit the correct structure.

• There is subjectivity and bias - all of which is random

AVERAGE IS BEST !!


QUALITY & AVERAGE

• How different you are from the average is a measure of quality

HOW DO YOU DESCRIBE THE AVERAGE?


Quality of Model

Independent Variables

Date submitted to PDB

Maximum resolution

X-Ray Source

Number of atoms

Similarity Index

Cross Terms

Dependent Variables

R-factor

R-free

Real-space R-value

Real-space CC

Outliers

Ramachandran Violations


Predictive Models

Example: how to determine the weight of a 5’7” male . . .

. . . make up an equation . . .

. . . choose a group of males . . .

. . . fit the equation to their weight . . .

. . . evaluate equation.
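
The four steps above can be sketched in a few lines of Python. This is only an illustration: the sample of heights and weights is invented, and a straight line fitted by least squares is an assumed choice of equation, not something the slide specifies.

```python
# Hypothetical sample: heights in inches and weights in pounds for seven males.
# These numbers are invented for illustration only.
heights = [64, 66, 68, 69, 70, 72, 74]
weights = [140, 150, 158, 165, 168, 180, 190]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "make up an equation": weight = slope * height + intercept
# "fit the equation to their weight"
slope, intercept = fit_line(heights, weights)

# "evaluate equation" for a 5'7" (67 inch) male
predicted = slope * 67 + intercept

# Spread of the observed weights around the fitted line (residual std. dev.)
residuals = [w - (slope * h + intercept) for h, w in zip(heights, weights)]
spread = (sum(r * r for r in residuals) / (len(heights) - 2)) ** 0.5

print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 67 in: {predicted:.1f} lb (spread {spread:.1f} lb)")
```

The residual spread is what turns the prediction into a yardstick: an individual far from the fitted line, measured in units of that spread, is far from the average.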


Open problems

• What independent variables?
  Quality = f(resolution)
  Quality = f(resolution, date, x-ray source)

• What equation?
  Quality = a × resolution + b × date + c
  Quality = a × res + log b2(date) + c

• How to fit it to observations?
  - Least squares vs. maximum likelihood
  - Outliers


PICK ALL AVAILABLE METRICS (R-factor, R-free, etc.) and FOR EACH METRIC:

Choose model based on LL (log-likelihood)

• Start with Metric = a × resolution + C
• Add or remove terms iteratively to improve the LL
• Use BIC to decide whether a new parameter contributes a significant improvement in LL or not

RESULT: An equation that predicts a given metric…

Data: all structures in the PDB that have all independent and dependent variables (16,609)
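
A minimal sketch of this kind of term selection, under stated assumptions: the data are synthetic, the variable names (`resolution`, `log_atoms`, `junk`) are invented, the fit is ordinary least squares with a Gaussian log-likelihood, and terms are only added, never removed. It is not the procedure actually run on the PDB data, only an illustration of using BIC to accept or reject a new term.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_bic(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; returns (coefs, BIC)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n
    ll = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    bic = (k + 1) * math.log(n) - 2 * ll  # k coefficients + 1 variance
    return beta, bic


def select_terms(candidates, y, start):
    """Greedy forward selection: add the term that lowers BIC most, else stop."""
    chosen = list(start)
    _, best = fit_bic([candidates[t] for t in chosen], y)
    while True:
        trials = []
        for name in candidates:
            if name not in chosen:
                cols = [candidates[t] for t in chosen + [name]]
                trials.append((fit_bic(cols, y)[1], name))
        if not trials or min(trials)[0] >= best:
            return chosen, best
        best, name = min(trials)
        chosen.append(name)


# Synthetic data: the metric depends on resolution and log_atoms, not on junk.
random.seed(0)
n = 400
resolution = [random.uniform(1.0, 3.5) for _ in range(n)]
log_atoms = [random.uniform(3.0, 5.0) for _ in range(n)]
junk = [random.gauss(0.0, 1.0) for _ in range(n)]
r_factor = [0.10 + 0.04 * r + 0.02 * a + random.gauss(0.0, 0.02)
            for r, a in zip(resolution, log_atoms)]

candidates = {"resolution": resolution, "log_atoms": log_atoms, "junk": junk}
_, base_bic = fit_bic([resolution], r_factor)  # Metric = a × resolution + C
chosen, final_bic = select_terms(candidates, r_factor, start=["resolution"])
print(chosen, round(base_bic, 1), round(final_bic, 1))
```

BIC is used here as k·ln(n) − 2·LL, so a new term is kept only if it raises the log-likelihood by more than the penalty for the extra parameter.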


R-factor = C + r_high + S + N + I + r_high × (S + N) + N × I

R_real-space = C + r_high + S + D + N + r_high × (S + D + I + N) + D × (I + S) + I × N + r_high × (S × D + I × N)


EQUATIONS FOR METRICS!


INFORMATION INHERENT IN THE MODEL

The model can tell us immediately which independent variables affect which metrics (dependent variables), and by how much.

Examples: R-factor vs. time; R-factor vs. source and resolution


UNEXPLORED QUESTIONS IN THE MODEL?

Unexplored independent variables:
• R-sym and redundancy
• Space group and volume of unit cell?
• Refinement protocol
• Solvent modeling and B-factor modeling
• Temperature of data collection
• Complexity - as a function of number of chains of macromolecules


Nine metrics to ONE: Principal component analysis

• We took the nine metrics and combined them into one metric, accounting for correlations and redundancy. This single metric is what we call the Quality value (Q-value).

• By CONSTRUCTION, the Q-value of the average is zero. Negative numbers mean better than average; positive numbers mean worse than average. The standard deviation is one.
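
One way to picture such a combined score is sketched below. The assumptions are loud ones: the slides do not say how the principal components are combined, so this sketch keeps only the first component; the three input metrics are synthetic stand-ins (each meant as a structure's deviation from its predicted value), and the names are invented. The only properties taken from the slide are that the result has mean zero, standard deviation one, and that positive means worse.

```python
import math
import random


def zscores(values):
    """Rescale a list to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def first_component(columns, iterations=500):
    """Dominant eigenvector of the correlation matrix, by power iteration."""
    k, n = len(columns), len(columns[0])
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(k)] for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def q_values(metrics):
    """Combine several metrics (one list per metric) into one score per structure."""
    columns = [zscores(m) for m in metrics]
    loadings = first_component(columns)
    # Orient the axis so that a larger first metric gives a larger (worse) Q.
    if loadings[0] < 0:
        loadings = [-x for x in loadings]
    scores = [sum(l * col[i] for l, col in zip(loadings, columns))
              for i in range(len(columns[0]))]
    return zscores(scores)


# Synthetic deviations for 300 hypothetical structures, driven by one latent factor.
random.seed(1)
n = 300
badness = [random.gauss(0.0, 1.0) for _ in range(n)]
r_factor_dev = [0.020 * b + random.gauss(0.0, 0.01) for b in badness]
r_free_dev = [0.025 * b + random.gauss(0.0, 0.01) for b in badness]
rama_dev = [1.5 * b + random.gauss(0.0, 1.0) for b in badness]

q = q_values([r_factor_dev, r_free_dev, rama_dev])
print(f"mean Q = {sum(q) / n:.3f}, worst Q = {max(q):.2f}, best Q = {min(q):.2f}")
```

Because the metrics are standardized before they are combined, strongly correlated metrics (such as R-factor and R-free) do not get counted twice at full weight.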


USE OF THE MODEL

• COMPARE STRUCTURES WITH THE AVERAGE - INDIVIDUALLY AND AS A GROUP.

The Q-value is now independent of all the independent variables used to make the model (resolution, number of atoms, date of data collection, novelty of structure, etc.).

It is a better indicator of quality than any one of the dependent variables.
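
A group can be compared with the average by asking what share of its structures have a negative Q-value. The sketch below assumes that summary; the group labels and Q-values are made up.

```python
from collections import defaultdict


def percent_better_than_average(records):
    """records: iterable of (group, q_value). Returns {group: percent with Q < 0}."""
    totals = defaultdict(int)
    better = defaultdict(int)
    for group, q in records:
        totals[group] += 1
        if q < 0:  # negative Q means better than the global average
            better[group] += 1
    return {g: 100.0 * better[g] / totals[g] for g in totals}


# Hypothetical Q-values for two invented groups of structures.
records = [
    ("group A", -0.8), ("group A", -0.1), ("group A", 0.4), ("group A", -1.2),
    ("group B", 0.9), ("group B", -0.3), ("group B", 0.2), ("group B", 1.5),
]
summary = percent_better_than_average(records)
for group, pct in sorted(summary.items()):
    print(f"{group}: {pct:.0f}% better than the global average")
```

A group indistinguishable from the PDB as a whole would sit near 50% if Q-values are distributed symmetrically about zero.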


STRUCTURAL GENOMICS (updated - Jan 2008)


MCSG over Time!


MORE-SG groups!


Quality Vs. Journals


[Figure: bar chart of the percentage of structures better than the global average (axis 0 to 100), one bar per journal of publication, with smaller journals pooled as OTHER.]


WHAT CAN WE DO?

• Beam lines
• Best practices
• Protocols and methodologies
• Countries
• Institutions
• Funding mechanisms
• Investigators


Is this the best we can do?


WE CAN DO BETTER

We can improve the quality of structures through better design of experiments and refinement protocols, if we know which independent variables affect which dependent variables, and how.

BEFORE WE DO THIS - FIX PROBLEMS THAT WE FOUND.

• Too much dependence on external databases!

• Problems with unknown atoms.

• Develop methods for missing data correction.


OTHER DATABASES - NMR

Some thoughts on independent variables.
• Spectrometers
• Samples - size, tags, buffers, etc.
• Completeness of assignments - percentage of backbone assigned, etc.
• Actual data used in structure calculations - NOE distance restraints, hydrogen bond distance restraints (experimental vs. inferred), torsion angle restraints, dipolar coupling restraints, paramagnetic restraints
• Structural statistics
• Date of structure determination
• Relaxation measurements?


OTHER DATABASES - NMR

DEPENDENT VARIABLES.
• RMS deviation of ensemble
• Packing (MolProbity score?)
• Ramachandran violations
• Recall, Precision, F-measure (Huang, Powers and Montelione)
• Agreement with high-resolution X-ray structures
• Other??


AFTER TODAY'S LECTURES

HOW ABOUT THE MODEL DATABASE?

I am sure our modeling experts can think of the dependent and independent variables…


THANK YOU

ACKNOWLEDGEMENT

X-ray work - Eric N Brown and Lokesh Gakhar

The R statistical package!

NMR work - Liping Yu and Andrew Fowler

Thanks to Brian Fox for inviting me - though I am not a member of any SG initiative.


Questions and Accusations.