Quality of Protein Crystal Structures in the PDB
Eric N. Brown, Lokesh Gakhar and S. Ramaswamy
Between objectivity and subjectivity
Carl-Ivar Brändén & T. Alwyn Jones
Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S-751 24 Uppsala, Sweden.
Protein crystallography is an exacting trade, and the results may contain errors that are difficult to identify. It is the crystallographer's responsibility to make sure that incorrect protein structures do not reach the literature.
Nature 343, 687 - 689 (22 February 1990)
Amplitudes and Phases - Bias.
Animal stories - by Kevin Cowtan
Amplitudes and Phases - Bias.
More animal stories.
Stolen from Bernhard Rupp website without permission
How much of what we think?
Stolen from --- James Holton, Berkeley, without permission.
VALIDATION Based on Geometry
• WHATIF
• PROCHECK
• MOLPROBITY
• RAMACHANDRAN PLOT
STRUCTURE VALIDATION
Validation based on fit to DATA
• R-factor/R-free
• Real space fit, etc.
Problem: Data to parameter ratio.
ADD Geometric Restraints - or Chemical Knowledge
COMPOSITE VALIDATION:
ASTRAL - SPACI
http://astral.Berkeley.edu/spaci.html
WHY MORE?
DON’T WE HAVE ENOUGH VALIDATION TOOLS?
WHAT IS COMMON BETWEEN ALL EXISTING VALIDATION TECHNIQUES?
THERE IS AN ABSOLUTE CORRECT ANSWER
WE KNOW THERE IS NO CORRECT ANSWER
THINK DIFFERENTLY
• All crystallographers want to deposit the correct structure.
• There is subjectivity and bias, all of which is random.
AVERAGE IS BEST !!
QUALITY & AVERAGE
• How different are you from the average is a measure of quality
HOW DO YOU DESCRIBE THE AVERAGE?
Quality of Model
Independent Variables
Date submitted to PDB
Maximum resolution
X-Ray Source
Number of atoms
Similarity Index
Cross Terms
Dependent Variables
R-factor
R-free
Real-space R-value
Real-space CC
Outliers
Ramachandran Violations
Predictive Models
Example: How To determine weight for 5’7” male . . .
. . . make up an equation . . .
. . . choose a group of males . . .
. . . fit the equation to their weight . . .
. . . evaluate equation.
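The four steps above can be sketched in a few lines of Python. This is only an illustration: the heights and weights are made-up numbers, the straight-line form `weight = a * height + c` is an assumed choice of equation, and `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + c (the 'made up' equation)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = y_bar - a * x_bar
    return a, c

# Made-up group of males: heights in inches, weights in pounds (illustrative only).
heights = [64, 66, 67, 68, 70, 72, 74]
weights = [138, 152, 154, 163, 168, 183, 188]

# Fit the equation to the group.
a, c = fit_line(heights, weights)

# Evaluate the equation for a 5'7" (67 inch) male.
predicted = a * 67 + c

# How far each observed male is from the prediction for his height.
residuals = [w - (a * h + c) for h, w in zip(heights, weights)]
print(f"predicted weight at 67 in: {predicted:.1f} lb")
```

The residuals are the quantity of interest later on: how far an individual sits from what the fitted equation predicts for someone like him.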
Open problems
• What independent variables?
Quality = f(resolution)
Quality = f(resolution, date, x-ray source)
• What equation?
Quality = a × resolution + b × date + c
Quality = a × res + log b2(date) + c
• How to fit it to observations?
- Least squares vs. maximum likelihood
- Outliers
Choose model based on LL (log-likelihood)
Start with Metric = a × resolution + C
Add or remove terms iteratively to improve the LL
Use BIC to decide whether a new parameter contributes a significant improvement in LL
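A minimal sketch of this kind of search, assuming Gaussian errors so that the log-likelihood follows from the least-squares residuals, and using BIC = k ln(n) - 2 LL (lower is better). The data are synthetic, and the term names, `solve`, `bic` and `stepwise` are hypothetical helpers for illustration, not the procedure actually run on the PDB.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def bic(X, y):
    """Least-squares fit of y on the columns of X, scored by BIC."""
    n, p = len(y), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)  # Gaussian LL at the MLE
    return (p + 1) * math.log(n) - 2 * log_lik  # +1 counts the noise variance

def stepwise(terms, rows, y, start):
    """Each round, move to the best add-one or drop-one model if it lowers BIC."""
    def score(names):
        return bic([[terms[t](r) for t in names] for r in rows], y)
    chosen, best, improved = list(start), score(start), True
    while improved:
        improved = False
        candidates = [chosen + [t] for t in terms if t not in chosen]
        candidates += [[t for t in chosen if t != d] for d in chosen if d != "const"]
        for cand in candidates:
            s = score(cand)
            if s < best - 1e-9:
                best, chosen, improved = s, cand, True
    return chosen, best

# Synthetic data: the metric depends on resolution only; date is a distractor.
random.seed(1)
rows = [{"res": random.uniform(1.0, 3.0), "date": random.uniform(0, 18)} for _ in range(200)]
y = [0.10 + 0.05 * r["res"] + random.gauss(0, 0.01) for r in rows]
terms = {
    "const": lambda r: 1.0,
    "res": lambda r: r["res"],
    "date": lambda r: r["date"],
    "res*date": lambda r: r["res"] * r["date"],  # cross term
}
chosen, best = stepwise(terms, rows, y, start=["const", "res"])
print(chosen, round(best, 1))
```

The ln(n) penalty is what stops the search from adding every cross term: a new parameter is kept only if it raises the log-likelihood by more than the penalty costs.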
RESULT: An equation that predicts a given metric…
Data is all structures in the PDB that have all independent and dependent variables (16,609)
PICK ALL AVAILABLE METRICS (R-factor/R-free etc.. )
and FOR EACH METRIC
R-factor = C + r_high + S + N + I + r_high × (S + N) + N × I
R_real-space = C + r_high + S + D + N + r_high × (S + D + I + N) + D × (I + S) + I × N + r_high × (S × D + I × N)
EQUATIONS FOR METRICS!
INFORMATION INHERENT IN THE MODEL
The model can tell us immediately which independent variables affect which metrics (dependent variables), and by how much.
Example:
R-factor vs. time
R-factor vs. source & resolution
UNEXPLORED QUESTIONS IN THE MODEL?
Unexplored Independent Variables:
• R-sym and redundancy
• Space group and volume of unit cell?
• Refinement protocol
• Solvent modeling and B-factor modeling
• Temperature of data collection
• Complexity, as a function of number of chains of macromolecules
Nine metrics to ONE
Principal component analysis
• We took the nine metrics and combined them into one metric, accounting for correlations and redundancy. This single metric is what we call the Quality value (Q-value).
• By construction, the Q-value of the average is zero. Negative numbers mean better than average; positive numbers mean worse than average. The standard deviation is one.
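A minimal sketch of how several correlated metrics could be collapsed into one standardized score. It assumes every metric is oriented so that larger means worse (a metric such as real-space CC would need its sign flipped first), uses three synthetic columns instead of nine, and finds the first principal component by power iteration; `standardize`, `first_component` and `q_values` are hypothetical names, and the metric labels are illustrative.

```python
import math
import random
from statistics import mean, pstdev

def standardize(col):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

def first_component(columns, iters=500):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return z, v

def q_values(columns):
    """Project onto the first component, then rescale to mean 0, sd 1."""
    z, v = first_component(columns)
    scores = [sum(v[j] * z[j][i] for j in range(len(z))) for i in range(len(z[0]))]
    return standardize(scores)

# Synthetic data: one hidden "badness" factor drives three made-up metrics.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(200)]
metrics = [
    [0.20 + 0.03 * b + random.gauss(0, 0.015) for b in latent],  # e.g. R-factor
    [0.25 + 0.03 * b + random.gauss(0, 0.020) for b in latent],  # e.g. R-free
    [2.00 + 1.00 * b + random.gauss(0, 0.500) for b in latent],  # e.g. outlier count
]
q = q_values(metrics)
print(f"mean = {mean(q):.3f}, sd = {pstdev(q):.3f}")
```

Standardizing the projected scores is what puts the average structure at zero with unit standard deviation, so negative values read as better than average.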
USE OF THE MODEL
• COMPARE STRUCTURES WITH THE AVERAGE - INDIVIDUALLY AND AS A GROUP.
The Q-value is now independent of all the independent variables used to build the model (resolution, number of atoms, date of data collection, novelty of structure, etc.).
Better indicator of quality than any one of the dependent variables.
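Comparing structures as a group can be sketched as counting, for each group, the share of structures with a Q-value below zero, that is, better than the global average. The group labels and Q-values here are made up, and `percent_better` is a hypothetical helper.

```python
from collections import defaultdict

def percent_better(records):
    """Percentage of structures per group with Q < 0 (better than the global average)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [better, total]
    for group, q in records:
        counts[group][0] += q < 0
        counts[group][1] += 1
    return {g: 100.0 * better / total for g, (better, total) in counts.items()}

# Made-up (group, Q-value) pairs; a group could be a journal, beam line or center.
records = [
    ("Group A", -0.8), ("Group A", 0.3), ("Group A", -0.1),
    ("Group B", 1.2), ("Group B", -0.4),
]
result = percent_better(records)
for group, pct in sorted(result.items()):
    print(f"{group}: {pct:.1f}% better than the global average")
```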
STRUCTURAL GENOMICS (updated - Jan 2008)
MCSG over Time!
MORE-SG groups!
Quality Vs. Journals
[Figure: bar chart with one bar per journal of publication; axis: percentage of structures better than the global average, 0 to 100.]
WHAT CAN WE DO?
• Beam lines.
• Best practices.
• Protocols and methodologies.
• Countries.
• Institutions.
• Funding mechanisms.
• Investigators.
Is this the best we can do?
WE CAN DO BETTER
We can improve the quality of structures through better design of experiments and refinement protocols, if we know which independent variables affect which dependent variables, and how.
BEFORE WE DO THIS - FIX PROBLEMS THAT WE FOUND.
• Too much dependence on external databases!
• Problems with unknown atoms.
• Develop methods for missing data correction.
OTHER DATABASES - NMR
Some thoughts on independent variables.
• Spectrometers
• Samples - size, tags, buffers, etc.
• Completeness of assignments - percentage of backbone assigned, etc.
• Actual data used in structural calculations - NOE distance restraints, hydrogen bond distance restraints (experimental vs. inferred), torsion angle restraints, dipolar coupling restraints, paramagnetic restraints.
• Structural statistics
• Date of structure determination.
• Relaxation measurements?
OTHER DATABASES - NMR
DEPENDENT VARIABLES.
• RMS deviation of ensemble
• Packing (Molprobity score?)
• Ramachandran violations
• Recall, Precision, F-measure (Huang, Powers and Montelione).
• Agreement with high resolution X-ray structures
• Other??
AFTER Today's LECTURES
HOW ABOUT THE MODEL DATABASE?
I am sure our modeling experts can think of the dependent and independent variables….
THANK YOU
ACKNOWLEDGEMENT
X-ray work - Eric N Brown and Lokesh Gakhar
The R-statistical package!
NMR work - Liping Yu and Andrew Fowler
Thanks to Brian Fox for inviting me - though I am not a member of any SG initiative.
Questions and Accusations.