Protein Crystallography Part III Tim Grüne Dept. of Structural Chemistry Prof. G. Sheldrick University of Göttingen http://shelx.uni-ac.gwdg.de [email protected]

Protein Crystallography - GWDGshelx.uni-ac.gwdg.de/~tg/teaching/molbio/2004/day3.pdf · In protein crystallography we want to determine at least the coordinates for every atom of


Protein Crystallography
Part III

Tim Grüne
Dept. of Structural Chemistry

Prof. G. Sheldrick
University of Göttingen

http://shelx.uni-ac.gwdg.de

[email protected]


Overview

• The PDB file

• Model Building

• Refinement

• Restraints and Constraints

• Model Refinement

Molecular Biology 1 Protein Crystallography III


From Map to Model

An initial electron density map (and even a final one) looks quite messy and is difficult to interpret. The final coordinate model contains more useful information. It is the target of model building and refinement.


Storing Structural Data — the PDB–file

The protein models stored in databases such as the Protein Data Bank (PDB, http://www.pdb.org) do not represent the raw experimental data. From the experiment we get diffraction intensities and, after some work, the electron density ρ within the unit cell. The model is the best match (from the author's point of view) that explains the experimental data.

A typical PDB file contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group and unit cell dimensions, and a list of atoms. Each atom entry contains the atom type, the atom name, the type of residue it belongs to, the coordinates, the occupancy, and the B-factor.

HEADER    LIGASE                                  28-APR-99   1CLI
TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
TITLE    2 SYNTHETASE (PURM), FROM THE E. COLI PURINE BIOSYNTHETIC
TITLE    3 PATHWAY, AT 2.5 A RESOLUTION
AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
...
CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
...
ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
...
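As an illustration of how the atom entries in the excerpt above are organised, here is a minimal Python sketch that cuts one ATOM record into its fields. It assumes the standard fixed-column layout of the PDB format; the function name `parse_atom_record` and the dictionary keys are made up for this example and do not belong to any particular program.

```python
def parse_atom_record(line):
    """Split one fixed-width PDB ATOM record into named fields."""
    return {
        "serial": int(line[6:11]),          # atom serial number
        "name": line[12:16].strip(),        # atom name, e.g. CA
        "res_name": line[17:20].strip(),    # residue type, e.g. THR
        "chain": line[21],                  # chain identifier
        "res_seq": int(line[22:26]),        # residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),     # atom type (chemical element)
    }


# First atom of the excerpt, written field by field so that the fixed
# column widths stay visible (Python joins adjacent string literals).
EXAMPLE = (
    "ATOM  " "    1" " " " N  " " " "THR" " " "A" "   5" "    "
    "  15.163" "  80.897" "  61.279" "  1.00" " 20.99" "          " " N"
)

atom = parse_atom_record(EXAMPLE)
print(atom["name"], atom["res_name"], atom["xyz"], atom["b_factor"])
```

Slicing by column rather than splitting on whitespace is deliberate: neighbouring fields in a PDB file may touch without a separating blank.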


Data Visualisation

[Figure: the same structure in six representations: Cα trace (smooth), ball-and-stick, CPK (space filling), Cα trace (coloured by B-factor), ball-and-stick (coloured by B-factor), and ribbons]


Occupancy and B-factor of an Atom

A typical crystal consists of a large number (> 10^13) of unit cells, so the resulting model is only an average over all these cells. Some atoms, especially those of large side chains (arginine, phenylalanine, ...), can be partially disordered; others can adopt several distinct, fixed orientations. An occupancy lower than 1 indicates that an atom occupies this position in only a fraction of all unit cells.

Even though data are most often collected at 100 K, atoms are not immobile but vibrate (thermal motion). The temperature factor, or B-factor, describes this vibration as a sphere within which the atom oscillates. At high resolution, the B-factor can be replaced by a symmetric 3×3 matrix that describes anisotropic thermal motion in three dimensions.
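To get a feeling for the size of the "sphere", the isotropic B-factor can be converted into a displacement with the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along a given direction. This relation is not stated on the slide; the short Python sketch below simply applies it, and the function name is chosen for this example only.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement in Å for an isotropic B-factor in Å^2."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# 20.99 is the B-factor of the first atom in the PDB excerpt shown earlier.
for b in (10.0, 20.99, 40.0):
    print(f"B = {b:5.2f} Å^2  ->  rms displacement = {rms_displacement(b):.2f} Å")
```

A B-factor of about 20 Å^2 thus corresponds to a displacement of roughly half an ångström.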


Multiple Conformation


Model Building and Refinement

Creating a model from X-ray data is an iterative process consisting of model building and refinement.

Refinement means global improvement of the model with respect to the experimental data. The coordinates of all atoms, together with their temperature factors (and sometimes, at very high resolution, even their occupancies), are adjusted in order to minimise the difference between the measured intensities and the ones calculated from the model.

Model Building means local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved in order to ensure that

1. the model makes sense biochemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)

2. the model fits the calculated electron density (e.g. check for multiple conformations)


Data to Parameter Ratio

No measurement is exact; each is only an approximation to the true value. It is therefore important to have enough data to support the deduced model.

In protein crystallography we want to determine at least the coordinates of every atom of the structure. If more data are available, we add the isotropic B-value, and at best we can even determine an anisotropic B-value. The amount of data is determined by the resolution, the solvent content, and the unit cell dimensions.

Res. [Å]   parameters                          data/parameters
3.0        x,y,z                               0.9:1
2.3        x,y,z; B                            1.5:1
1.8        x,y,z; B                            3.1:1
1.5        x,y,z; B                            5.4:1
1.5        x,y,z; U11 U12 U13 U23 U22 U33      2.4:1
1.1        x,y,z; U11 U12 U13 U23 U22 U33      6.1:1
0.8        x,y,z; U11 U12 U13 U23 U22 U33      16:1

G. Sheldrick

For resolutions of about 1.8 Å and worse, these ratios would be much too low to allow a proper model to be built. The effective number of data is increased by incorporating additional information, e.g. (bio)chemical knowledge.
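The numbers in the table follow a simple pattern. Assuming that the number of unique reflections grows with 1/d³ for a complete data set, and taking the 3.0 Å row (0.9 data per parameter with x, y, z only) as the reference point, the other rows can be reproduced in a few lines of Python. The function and constant names are invented for this sketch; the scaling law is an assumption that happens to match the table and is not stated on the slide.

```python
REF_RESOLUTION = 3.0  # Å, first row of the table
REF_RATIO = 0.9       # data per parameter at 3.0 Å with x, y, z only
REF_PARAMS = 3        # x, y, z


def data_to_parameter_ratio(resolution, params_per_atom):
    """Scale the 3.0 Å reference ratio to another resolution and model type."""
    more_data = (REF_RESOLUTION / resolution) ** 3  # reflections grow as 1/d^3
    return REF_RATIO * more_data * REF_PARAMS / params_per_atom


rows = [(3.0, 3), (2.3, 4), (1.8, 4), (1.5, 4), (1.5, 9), (1.1, 9), (0.8, 9)]
for d, n_params in rows:
    ratio = data_to_parameter_ratio(d, n_params)
    print(f"{d:.1f} Å, {n_params} parameters per atom: {ratio:4.1f}:1")
```

Here 4 parameters per atom stand for x, y, z and an isotropic B, and 9 for x, y, z and the six anisotropic U values.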


Fitting of Data

Parameters used to be, and on some occasions still are, fitted to the data by a least-squares fit.

The line (its parameters are slope and y-intercept) is to be fitted to the (data) points. The least-squares fit yields the line with the smallest sum of squared distances to the data points.

More data do not necessarily give a different line, but they reduce the error of the line, i.e. they increase the confidence with which we can trust our result.

That is why the data to parameter ratio is an important figure for indicating the quality of a model. Refinement and building strategies differ depending on that ratio.
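The effect of more data on the error of the line can be tried out directly. The following Python sketch fits a straight line by least squares using the textbook closed-form expressions and reports the standard error of the slope. The test data (points on y = 2x + 1 with an alternating error of ±0.5) are invented purely for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares line through the points: slope, intercept, slope error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared vertical distances between the points and the line.
    residual_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    slope_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_error


def noisy_line(n):
    """n points on y = 2x + 1 with an alternating +/-0.5 'measurement error'."""
    xs = list(range(n))
    ys = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
    return xs, ys


for n in (4, 16, 64):
    slope, intercept, err = fit_line(*noisy_line(n))
    print(f"{n:3d} points: slope = {slope:.3f} +/- {err:.3f}, intercept = {intercept:.3f}")
```

The fitted slope stays close to 2 in all three cases, while its error shrinks as points are added.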


Local Minima and Traps

Refinement can only find the nearest minimum of its target function.

Depending on the starting point (red crosses), this might result in a good or a bad model.
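The dependence on the starting point is easy to reproduce with a toy example. The Python sketch below runs plain gradient descent on an invented one-dimensional target function with two minima; neither the function nor the step size has anything to do with a real refinement program, they only illustrate the principle.

```python
def target(x):
    """Toy target function with a deep minimum near -1 and a shallow one near +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy target function."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimise(x, step=0.01, cycles=5000):
    """Follow the gradient downhill; this can never climb over a barrier."""
    for _ in range(cycles):
        x -= step * gradient(x)
    return x


for start in (-2.0, 2.0):
    end = minimise(start)
    print(f"start {start:+.1f} -> minimum at {end:+.3f}, target = {target(end):+.3f}")
```

Starting on the left reaches the deeper minimum (the good model); starting on the right gets stuck in the shallower one, although a better solution exists.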


Refinement — the R–value

Refinement programs aim to minimise the R–value, which describes the agreement between the measured amplitudes $|F_{\mathrm{obs}}(hkl)|$ and those calculated from the model, $|F_{\mathrm{calc}}(hkl)|$.

$$
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \,\bigr|}{\sum_{hkl} |F_{\mathrm{obs}}|}
$$

The $|F_{\mathrm{obs}}|$ come from the reflection data (observations); the $|F_{\mathrm{calc}}|$ are calculated from the coordinates (x,y,z) and B-values of the atoms of the model.

For small molecules, R–values between 2% and 5% are normal; for macromolecules, the range is approximately 20%–30%.

As a rule of thumb, one can expect an R–value of about 1/10 of the resolution in Å: a 2.5 Å structure should have an R–value of about 0.25, i.e. 25%.
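The definition of the R–value translates directly into code. The Python sketch below evaluates it for three invented reflections; note that it assumes the calculated amplitudes are already on the same scale as the observed ones, a step that real refinement programs take care of themselves.

```python
def r_value(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over all reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Three made-up reflections: the differences are 1, 2 and 0, the sum of |Fobs| is 60.
f_obs = [10.0, 20.0, 30.0]
f_calc = [9.0, 22.0, 30.0]
print(f"R = {r_value(f_obs, f_calc):.1%}")  # prints R = 5.0%
```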


Refinement and Overfitting

Since the amplitudes lack some information (their phases) and are not ideal (for protein structures, the errors are fairly large), the difference between observed and calculated amplitudes can be reduced almost arbitrarily by adding more and more atoms that are not really present in the crystal structure, or by allowing positions that make little chemical sense (stereochemical clashes). This is called overfitting of the data. It is therefore important to impose restraints and constraints.

One safeguard against overfitting is the Rfree–value. About 5%–10% of the reflections are excluded from the minimisation of the R–value. They are never used in refinement and act as an "independent judge": after refinement, the Rfree–value is calculated like the R–value, but from the excluded reflections only. The two values must not differ too much.
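The bookkeeping behind Rfree can be sketched as follows: a random subset of the reflections is flagged once, before refinement starts, and the R–value is then computed separately for the working set and the flagged set. The Python example below is a minimal sketch; the function names, the 5% default, and the fixed random seed are choices made for this illustration, not the conventions of a specific program.

```python
import random


def r_value(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return the sorted indices of the reflections set aside for Rfree."""
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(random.Random(seed).sample(range(n_reflections), n_free))


def r_work_and_free(f_obs, f_calc, free_indices):
    """Compute the R-value separately for working and free reflections."""
    free = set(free_indices)
    work = [i for i in range(len(f_obs)) if i not in free]
    held_out = sorted(free)
    r_work = r_value([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_value([f_obs[i] for i in held_out], [f_calc[i] for i in held_out])
    return r_work, r_free


free_flags = split_free_set(200)
print(f"{len(free_flags)} of 200 reflections are reserved for Rfree")
```

The important point is that the flags are assigned only once: if the free reflections changed between refinement cycles, they would no longer be independent of the model.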


Constraints and Restraints

The reflection data alone would not be sufficient to create a trustworthy model. There are too many parameters. Therefore it is necessary to incorporate additional information. This is done by using restraints and constraints.

Constraints are fixed conditions that cannot change during refinement (e.g. the occupancy of atoms).

Restraints allow variation within certain limits around ideal values.

These ideal values are derived from high-resolution structures, which showed that certain geometric properties of macromolecules do not vary much. Examples are

• bond lengths (e.g. C–C = 1.54 Å)

• planarity of aromatic rings (Phe, Tyr, ...)

• anti-bumping (non-bonded atoms cannot get too close)

Most models of macromolecules can only be built because of this extra information. It improves the data to parameter ratio.
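One common way to picture a restraint is as an extra penalty term that grows when a geometric quantity drifts away from its ideal value. The Python sketch below does this for the C–C bond length from the list above. The tolerance of 0.02 Å and the quadratic form of the penalty are assumptions made for this example; actual programs define their own weights and target values.

```python
import math


def bond_restraint(atom_a, atom_b, ideal=1.54, sigma=0.02):
    """Penalty for a bond length deviating from its ideal value (both in Å).

    sigma is a hypothetical tolerance: a deviation of one sigma costs 1.0.
    """
    distance = math.dist(atom_a, atom_b)
    return ((distance - ideal) / sigma) ** 2


c1 = (0.0, 0.0, 0.0)
for x in (1.54, 1.56, 1.64):
    penalty = bond_restraint(c1, (x, 0.0, 0.0))
    print(f"C-C distance {x:.2f} Å -> penalty {penalty:6.2f}")
```

A constraint, by contrast, would not appear as a penalty at all: the constrained quantity is simply not a free parameter.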


Maximum Likelihood

A more modern approach than least-squares is the maximum likelihood method. It applies statistical assumptions and allows more data and information to be included, e.g. experimental phases. For macromolecules, maximum likelihood is more stable and leads to better results overall, often with reduced model bias.

Maximum likelihood takes the errors of the data into account and prevents a model from being built with higher accuracy than the data permit.


Getting Started

The first steps in building the model consist of finding larger groups of residues with special features.

In proteins this is the (Cα) main chain, in nucleic acids the phosphate backbone. α–helices are particularly easy to locate, even at medium to low resolution (2.5–4 Å).


Directionality

From the main chain (Cα chain) alone one can determine neither the direction nor which part of the sequence it covers. One gets help from the so-called Christmas tree: the side chains of an α–helix point towards the N–terminal end of the protein chain.

Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence to the map. Disulphide bridges or metals bound to an active centre can also be helpful.


β–strands

β–strands are also striking but more difficult to build. The direction of the peptide chain in particular can be difficult to find.


Manual Building I

At high resolution (d < 2 Å), building is greatly facilitated by programs like Arp/Warp (A. Perrakis, V. Lamzin) or Resolve (T. Terwilliger), which automatically build large parts of the structure. These programs can even overcome local minima.

Refinement programs (either least-squares or maximum likelihood) cannot cross this barrier: they would get stuck in the local minimum and could not move the phenylalanine into the right position.


Manual Building II

Computer programs know nothing about biology, and certainly nothing about a specific molecule or structure. The person building the model has to take care of:

• presence of ligands and/or metal ions (from crystallisation or protein preparation)

• special interaction for complexes

• exceptions from standard values used in refinement

• correct placement of solvent (water) molecules

Even this sort of information increases the data to parameter ratio and hence improves the quality of the model. This becomes especially important at medium or low resolution (2.5 Å and worse).


What about hydrogen Atoms?

X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron, so they cannot be detected by X-ray diffraction (except with very high resolution data, around 1 Å). This is different for neutron diffraction, which makes that technique very valuable for studies of enzymes and their active centres.

During refinement, hydrogens are treated as riding atoms, that is, they are kept in a fixed position relative to the groups they belong to (like the carbons of a phenylalanine ring).

Compared with ignoring hydrogens completely, this method improves the quality of the model and also helps to keep the correct distances to neighbouring groups. Because of their fixed positions, riding atoms do not increase the number of parameters.


Empty Space? — The solvent region


The Solvent Model

Protein crystals are not very tightly packed. The space between the molecules is filled with solvent, 50–70% of the total volume on average. Because the solvent is disordered, it contributes mostly to the low-resolution reflections (d > 6 Å).

Possible ways to treat the solvent are:

1. ignore the solvent: results in a high R-value

2. ignore data with d > 6 Å: gives a better R-value but worse maps

3. consider the solvent region as a flat lake of electron density


Example Refinement

This chart illustrates the steps of model building and their impact on the model quality (here measured only by the R–value and the Rfree–value; meaning explained tomorrow):

No.  Action taken              N(param.)   R %    Rfree %
1    MR: pdb 7rxn                      1   22.9   23.4
2    + 60 waters                    1822   15.7   18.7
3    Fe, S anisotropic              1857   14.8   17.7
4    all H-atoms                    1857   14.0   16.8
5    all C,N,O anisotropic          4097    8.8   11.3
6    + 28 waters, occ.              4556    7.5   10.3
7    6 disord. side chains          4698    6.9    9.7
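One way to read this chart is to follow the gap between Rfree and R from step to step, since a widening gap would be a warning sign of overfitting. The short Python sketch below only re-uses the numbers from the chart; the variable names are chosen for this example.

```python
# (action, number of parameters, R in %, Rfree in %), copied from the chart.
steps = [
    ("MR: pdb 7rxn", 1, 22.9, 23.4),
    ("+ 60 waters", 1822, 15.7, 18.7),
    ("Fe, S anisotropic", 1857, 14.8, 17.7),
    ("all H-atoms", 1857, 14.0, 16.8),
    ("all C,N,O anisotropic", 4097, 8.8, 11.3),
    ("+ 28 waters, occ.", 4556, 7.5, 10.3),
    ("6 disord. side chains", 4698, 6.9, 9.7),
]

# Difference Rfree - R in percentage points for every step.
gaps = [round(r_free - r, 1) for _, _, r, r_free in steps]

for (action, n_param, r, r_free), gap in zip(steps, gaps):
    print(f"{action:<22} N = {n_param:4d}  R = {r:4.1f}  Rfree = {r_free:4.1f}  gap = {gap:.1f}")
```

In this example both values drop at every step, and from step 2 onwards the gap stays between 2.5 and 3.0 percentage points, even though the number of parameters more than doubles.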
