Protein Crystallography Part III
Tim Grüne
Dept. of Structural Chemistry
Prof. G. Sheldrick
University of Göttingen
http://shelx.uni-ac.gwdg.de
Overview
• The PDB file
• Model Building
• Refinement
• Restraints and Constraints
• Model Refinement
Molecular Biology 1 Protein Crystallography III
From Map to Model
An initial electron density (and also a final one) looks quite messy and is difficult to interpret. The final coordinate model contains more useful information. It is the target of model building and refinement.
Storing Structural Data — the PDB–file
The protein models that are stored e.g. in the Protein Data Bank, PDB, http://www.pdb.org, do not represent the experimental data themselves. From the experiment we get diffraction intensities and, after some work, the electron density ρ within the unit cell. The model is the best match (from the author's point of view) that explains the experimental data.
A typical PDB file contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group and unit cell dimensions, and a list of atoms. An atom entry contains the atom name, the residue type it belongs to, the coordinates, the occupancy, the B-factor, and the element type.
HEADER    LIGASE                                  28-APR-99   1CLI
TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
TITLE    2 SYNTHETASE (PURM), FROM THE E. COLI PURINE BIOSYNTHETIC
TITLE    3 PATHWAY, AT 2.5 A RESOLUTION
AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
...
CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
...
ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
...
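ATOM records are fixed-width, so the fields are read by column position rather than by splitting on spaces. The following is a minimal sketch of how one such line could be read. The class name `Atom` and the function `parse_atom_line` are made up for this illustration; the column ranges follow the standard PDB layout for ATOM records.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM record (PDB column numbers are 1-based)."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
        element=line[76:78].strip(),   # columns 77-78
    )


# First atom of the 1CLI excerpt, laid out in the standard columns.
EXAMPLE = "ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N"
atom = parse_atom_line(EXAMPLE)
print(atom.name, atom.res_name, atom.res_seq, atom.occupancy, atom.b_factor)
```

For real work an established parser is the safer choice, since actual files contain further record types and special cases that this sketch ignores.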
Data Visualisation
[Figure: one model in six representations: Cα trace (smooth), ball-and-stick, CPK (space filling), Cα trace coloured by B-factor, ball-and-stick coloured by B-factor, and ribbons]
Occupancy and B-factor of an Atom
A typical crystal consists of a large number (> 10^13) of unit cells, and the resulting model is therefore only an average over all these cells. Some atoms, especially those of large side chains (Arginine, Phenylalanine, ...), can be partially disordered; others can have several distinct, fixed orientations. An occupancy lower than 1 indicates that an atom occupies this position in only a fraction of all unit cells.
Even though data are most often collected at 100 K, atoms are not immobile but vibrate (thermal motion). The temperature factor or B-factor describes the vibration as a sphere within which the atom oscillates. At high resolution, the B-factor can be replaced by a symmetric 3x3 matrix that describes anisotropic thermal motion in three dimensions.
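To get a feeling for the size of the sphere, the isotropic B-factor can be converted into a displacement using the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along one direction. The relation is not stated on the slide; the short sketch below (the function name `rms_displacement` is made up) simply applies it.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """Root-mean-square displacement (Å) for an isotropic B-factor (Å^2)."""
    # B = 8 * pi^2 * <u^2>  =>  sqrt(<u^2>) = sqrt(B / (8 * pi^2))
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# 20.99 is the B-factor of the first atom in the 1CLI excerpt;
# the other two values are arbitrary examples.
for b in (10.0, 20.99, 60.0):
    print(f"B = {b:5.2f} Å^2 -> rms displacement = {rms_displacement(b):.2f} Å")
```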
Multiple Conformation
Model Building and Refinement
Creating a model from X-ray data is an iterative process consisting of model building and refinement.
Refinement means global improvement of the model with respect to the experimental data. The coordinates of all atoms, together with their temperature factors (and sometimes, at very high resolution, even the occupancies), are adjusted in order to minimise the difference between the measured intensities and the ones calculated from the model.
Model Building means local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved in order to ensure that
1. the model makes sense biochemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)
2. the model fits the calculated electron density (e.g. check for multiple conformations)
Data to Parameter Ratio
No measurement is exact; each is only an approximation to the true value. It is therefore important to have enough data to support the deduced model.
In protein crystallography we want to determine at least the coordinates of every atom of the structure. If more data are available, we add the isotropic B-value, and at best we can even determine an anisotropic B-value. The amount of data is determined by the resolution, the solvent content, and the unit cell dimensions.
Res. [Å]   parameters                            data/parameters
3.0        x,y,z                                 0.9:1
2.3        x,y,z; B                              1.5:1
1.8        x,y,z; B                              3.1:1
1.5        x,y,z; B                              5.4:1
1.5        x,y,z; U11 U12 U13 U23 U22 U33        2.4:1
1.1        x,y,z; U11 U12 U13 U23 U22 U33        6.1:1
0.8        x,y,z; U11 U12 U13 U23 U22 U33        16:1
G. Sheldrick
For the lower-resolution entries (up to about 1.8 Å), these ratios would be much too low to allow building of a proper model. The effective number of data is increased by incorporating additional information, e.g. (bio)chemical knowledge.
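The order of magnitude of these ratios can be estimated with a few lines of code. The sketch below is not the calculation behind the table; it rests on two assumptions that do not come from the slides: each non-hydrogen protein atom occupies about 17.4 Å^3, and half of the crystal volume is solvent. The number of unique reflections (Friedel pairs merged) to resolution d for an asymmetric-unit volume V is approximated as (2π/3)·V/d^3.

```python
import math

# Assumptions for illustration only (not taken from the slides):
VOLUME_PER_ATOM = 17.4   # Å^3 occupied by one non-hydrogen protein atom
SOLVENT_FRACTION = 0.5   # fraction of the crystal volume filled with solvent


def data_to_parameter_ratio(d_min: float, params_per_atom: int) -> float:
    """Approximate unique reflections per refined parameter at resolution d_min (Å)."""
    # Asymmetric-unit volume per atom, including the atom's share of solvent.
    asu_volume_per_atom = VOLUME_PER_ATOM / (1.0 - SOLVENT_FRACTION)
    # Unique reflections per atom: (2*pi/3) * V / d^3.
    reflections_per_atom = (2.0 * math.pi / 3.0) * asu_volume_per_atom / d_min ** 3
    return reflections_per_atom / params_per_atom


# 3 parameters: x,y,z; 4: x,y,z and B; 9: x,y,z and six anisotropic U values.
for d_min, n_par in [(3.0, 3), (2.3, 4), (1.8, 4), (1.5, 4), (1.5, 9), (1.1, 9), (0.8, 9)]:
    ratio = data_to_parameter_ratio(d_min, n_par)
    print(f"{d_min:.1f} Å, {n_par} parameters per atom: {ratio:.1f} : 1")
```

With these assumed values the estimates come out close to the tabulated ratios. A higher solvent content gives more data per atom, because the unit cell is larger for the same number of atoms.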
Fitting of Data
Parameters used to be, and on some occasions still are, fitted to the data by a least-squares fit.
The line (its parameters are slope and y-intercept) is to be fitted to the (data) points. The least-squares fit yields the line with the smallest sum of squared distances to the data points.
More data do not necessarily give a different line, but they reduce the error of the line, i.e. they increase the confidence with which we can trust our result.
That is why the data to parameter ratio is an important figure to indicate the quality of a model. Refinement and building strategies differ depending on that ratio.
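The straight-line case can be written out directly. The sketch below uses the textbook closed-form solution for the slope and intercept that minimise the sum of squared vertical distances; the function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations in x and of x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data points scattered around the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Here two parameters are determined from five data points, a data to parameter ratio of 2.5:1.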
Local Minima and Traps
Refinement can only find the minimum of its target function that is nearest to the starting point.
Depending on the starting point (red crosses), this might result in a good or a bad model.
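The effect can be reproduced with a toy example. The sketch below runs plain gradient descent on a made-up one-dimensional function with two minima; it stands in for a refinement target only in the loosest sense, and real refinement programs use more elaborate minimisers.

```python
def target(x):
    """Made-up one-dimensional target function with two minima."""
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x):
    """Derivative of target(x)."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def descend(x, step=0.01, n_steps=5000):
    """Plain gradient descent: always walk downhill from the starting point."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


for start in (-2.0, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> minimum at x = {x_min:+.3f}, target = {target(x_min):.3f}")
```

The two starting points end in different minima, and only one of them is the lowest point of the function.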
Refinement — the R–value
Refinement programs aim to minimise the R-value, which describes the agreement between the measured amplitudes |F_obs(hkl)| and those calculated from the model, |F_calc(hkl)|:
\[ R = \frac{\sum_{hkl} \bigl|\, |F_{obs}| - |F_{calc}| \,\bigr|}{\sum_{hkl} |F_{obs}|} \]
|Fobs| are given by the reflection data (observations); |Fcalc| are calculated from the coordinates (x,y,z) and B-values of the atoms of the model.
For small molecules, R-values between 2% and 5% are normal; for macromolecules, the range is approximately 20%–30%.
As a rule of thumb one can expect an R-value of about 1/10 of the resolution in Å: a 2.5 Å structure should have an R-value of about 25%.
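Written as code, the R-value is a ratio of two sums over all reflections. The sketch below assumes that the calculated amplitudes have already been put on the same scale as the observed ones; the function name `r_value` and the amplitudes are made up for illustration.

```python
def r_value(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Made-up amplitudes for four reflections (arbitrary units).
f_obs = [100.0, 50.0, 30.0, 20.0]
f_calc = [90.0, 55.0, 36.0, 19.0]
print(f"R = {100 * r_value(f_obs, f_calc):.1f} %")
```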
Refinement and Overfitting
Since the amplitudes lack some information (their phase) and are not ideal (for protein structures, the errors are fairly large), the difference between observed and calculated amplitudes can be reduced almost arbitrarily by adding more and more atoms that are not really present in the crystal structure, or by allowing positions that make little chemical sense (stereochemical clashes). This is called overfitting of the data. It is therefore important to impose restraints and constraints.
One safeguard against overfitting is the Rfree-value. About 5%–10% of the reflections are excluded from the minimisation of the R-value. They remain unused and act like an “independent judge”: after refinement, the Rfree-value is calculated like the R-value, but using only the excluded reflections. The two values must not differ too much.
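The bookkeeping behind Rfree can be sketched as follows. The example only shows how a random 5% of the reflections are set aside and how the two values are computed; it performs no refinement, and all names and amplitudes are made up.

```python
import random


def r_value(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return (work, free) index lists; the free set is picked at random."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free = sorted(rng.sample(range(n_reflections), n_free))
    free_set = set(free)
    work = [i for i in range(n_reflections) if i not in free_set]
    return work, free


# Made-up amplitudes: Fcalc deviates from Fobs by a random few percent.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]

work, free = split_free_set(len(f_obs))
# In a real refinement only the work reflections would enter the minimisation.
r_work = r_value([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_value([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R = {100 * r_work:.1f} %, Rfree = {100 * r_free:.1f} % ({len(free)} free reflections)")
```

The free set has to be chosen once, before refinement starts, and kept unchanged; otherwise it is no longer independent of the model.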
Constraints and Restraints
The reflection data alone would not be sufficient to create a trustworthy model. There are too many parameters. Therefore it is necessary to incorporate additional information. This is done by using restraints and constraints.
Constraints are fixed conditions and cannot be changed (e.g. occupancy of atoms).
Restraints allow variation within certain limits around ideal values.
These ideal values are derived from high resolution structures, which showed that certain geometric properties of macromolecules do not vary a lot. Examples are
• bond lengths (e.g. C − C = 1.54Å)
• planarity of aromatic rings (Phe, Tyr,...)
• anti-bumping (non-bonded atoms cannot get too close)
Most models of macromolecules can only be built because of this extra information. It improves the data to parameter ratio.
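A common way to implement a restraint is as a penalty term that grows with the squared deviation from the ideal value and is added to the refinement target. The sketch below does this for a single C–C bond. The ideal length of 1.54 Å is the value quoted above; the tolerance `SIGMA`, the function name, and the coordinates are assumptions made for illustration.

```python
import math

IDEAL_CC = 1.54   # Å, ideal C-C single-bond length quoted above
SIGMA = 0.02      # Å, assumed tolerance of the restraint (illustrative value)


def bond_restraint_penalty(atom_a, atom_b, ideal=IDEAL_CC, sigma=SIGMA):
    """Squared deviation of a bond length from its ideal value, in units of sigma."""
    distance = math.dist(atom_a, atom_b)
    return ((distance - ideal) / sigma) ** 2


# Made-up coordinates (Å) of two bonded carbon atoms.
c1 = (0.0, 0.0, 0.0)
for c2 in [(1.54, 0.0, 0.0), (1.58, 0.0, 0.0), (1.70, 0.0, 0.0)]:
    penalty = bond_restraint_penalty(c1, c2)
    print(f"d = {math.dist(c1, c2):.2f} Å -> penalty = {penalty:.1f}")
```

A constraint, in contrast, would not appear as a penalty at all: the quantity is simply held fixed and removed from the list of refined parameters.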
Maximum Likelihood
A more modern approach than least-squares is the maximum likelihood method. It applies statistical assumptions and makes it possible to include more data and information, e.g. experimental phases. For macromolecules, maximum likelihood is more stable and leads to better results overall, often with reduced model bias.
Maximum likelihood takes the errors of the data into account and prevents a model from being built with higher accuracy than the data permit.
Getting Started
The first steps in building the model consist of finding larger groups of residues with special features.
In proteins this is the (Cα) main chain, in nucleic acids the phosphate backbone. α-helices are particularly easy to locate, even at medium to low resolution (2.5–4 Å).
Directionality
From the main chain (Cα chain) alone one cannot determine its direction, nor which part of the sequence it covers. One gets help from the so-called Christmas tree: the side chains of an α-helix point towards the N-terminal end of the protein chain.
Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence to the map. Disulphide bridges or metals bound to an active centre can also be helpful.
β–strands
β-strands are also striking but more difficult to build. Especially the direction of the peptide chain can be difficult to find.
Manual Building I
At high resolution (d < 2 Å), building is greatly facilitated by programs like Arp/Warp (A. Perrakis, V. Lamzin) or Resolve (T. Terwilliger), which automatically build large parts of the structure. These programs can even overcome local minima.
Refinement programs (either least-squares or maximum likelihood) cannot cross this barrier: they would get stuck in the local minimum and could not move the Phenylalanine into the right position.
Manual Building II
Computer programs do not know about biology, and certainly not about a specific molecule or structure. Such knowledge has to be supplied by the person building the model, for example:
• presence of ligands and/or metal ions (from crystallisation or protein preparation)
• special interaction for complexes
• exceptions from standard values used in refinement
• correct placement of solvent (water) molecules
Even this sort of information increases the data to parameter ratio and hence improves the quality of the model. This becomes especially important at medium or low resolution (2.5 Å and worse).
What about Hydrogen Atoms?
X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron. They cannot be detected by X-ray diffraction (except with very high resolution data, about 1 Å). This is different for neutron diffraction, which makes that technique very valuable for studies of enzymes and their active centres.
During refinement, hydrogens are treated as riding atoms, that is, in a fixed position relative to the groups they belong to (like the carbons of a phenylalanine ring).
Compared with ignoring hydrogens completely, this method improves the quality of the model and also helps to keep the correct distances to neighbouring groups. Because of the fixed position, riding atoms do not increase the number of parameters.
Empty Space? — The solvent region
The Solvent Model
Protein crystals are not very tightly packed. The space between the molecules is filled with solvent, 50–70% of the total volume on average. Because the solvent is disordered, it contributes mostly to the low-resolution reflections, those with d > 6 Å.
Possible ways to treat the solvent are:
1. ignore the solvent — results in high R-value
2. ignore data with d>6Å — better R-value but worse maps
3. consider the solvent region as a flat lake of electron density
Example Refinement
This chart illustrates the steps of model building and their impact on the model quality (here measured only by the R-value and Rfree-value; their meaning is explained tomorrow):
No.  Action taken             N(param.)   R %    Rfree %
1    MR: pdb 7rxn             1           22.9   23.4
2    + 60 waters              1822        15.7   18.7
3    Fe, S anisotropic        1857        14.8   17.7
4    all H-atoms              1857        14.0   16.8
5    all C,N,O anisotropic    4097        8.8    11.3
6    + 28 waters, occ.        4556        7.5    10.3
7    6 disord. side chains    4698        6.9    9.7