Absolute Structure Determination
and Light-Atom Structures
Simon Parsons
The University of Edinburgh
What This is About
• X-ray crystallography is so popular because it directly produces images of molecules. No other technique does this.
• One problem with the technique is that characterisation of the absolute configuration of a molecule is difficult.
• This information is often vitally important though.
Absolute Structure & The Flack
Parameter
$$I_{\mathrm{model}}(\mathbf{h}) = (1 - x)\,|F_{\mathrm{single}}(\mathbf{h})|^2 + x\,|F_{\mathrm{single}}(-\mathbf{h})|^2$$
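A minimal Python sketch of the model above may help fix ideas. The function names are illustrative only and do not come from any crystallographic package; the two |F_single|² values are those of the 6 2 1 / -6 2 1 pair quoted in the Bayesian example later in these slides. It shows that the Bijvoet difference of the model is the single-crystal difference scaled by (1 - 2x).

```python
def model_intensity(f2_h, f2_minus_h, x):
    """Flack model: I_model(h) = (1 - x)|F_single(h)|^2 + x|F_single(-h)|^2."""
    return (1.0 - x) * f2_h + x * f2_minus_h


def model_difference(f2_h, f2_minus_h, x):
    """Bijvoet difference I_model(h) - I_model(-h) for Flack parameter x."""
    return (model_intensity(f2_h, f2_minus_h, x)
            - model_intensity(f2_minus_h, f2_h, x))


# |F_single|^2 for h = 6 2 1 and -h = -6 2 1
f2_h, f2_mh = 67.61, 68.55
for x in (0.0, 0.5, 1.0):
    print(x, model_intensity(f2_h, f2_mh, x), model_difference(f2_h, f2_mh, x))
```

At x = 0.5 (a racemic twin) the difference vanishes, and at x = 1 it changes sign.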
Precision of x
Suppose I refine a structure and get x(u) = 0.00(8).
Notice that x may be negative.
This is unphysical, but it can happen statistically.
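[Figure: distribution of x about the refined value, with 3u = 0.24 marked. The standard uncertainty u is also written σ, esd, sd or su.]
[Figure: x axis from x = 0 to x = 1, showing the physical range of x and the wider statistical range of x.]
How wide should the distributions be?
[Figure: distributions at x = 0 and x = 1, with regions labelled model right, model wrong and don't know.]
Flack & Bernardinelli J. Appl. Cryst. (2000), 33, 1143
[Figure: the interval from x = 0 to x = 1 divided into widths 3u, 6u and 3u.]
3u + 6u + 3u = 12u = 1
u = 1/12 = 0.08 ~ 0.1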
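The value(su) notation and the u < 0.1 criterion can be checked mechanically. The sketch below is only an illustration: `parse_su` and `precise_enough` are hypothetical helper names, and the 0.1 limit is the rounded value derived above.

```python
import re


def parse_su(text):
    """Convert e.g. '-0.04(27)' to (-0.04, 0.27).

    The digits in brackets are the standard uncertainty expressed in
    units of the last quoted decimal place of the value.
    """
    m = re.fullmatch(r"\s*([+-]?\d*\.?\d+)\((\d+)\)\s*", text)
    if m is None:
        raise ValueError(f"not in value(su) form: {text!r}")
    value_str, su_digits = m.groups()
    decimals = len(value_str.split(".")[1]) if "." in value_str else 0
    return float(value_str), int(su_digits) * 10.0 ** (-decimals)


def precise_enough(text, limit=0.1):
    """True if the standard uncertainty is below the chosen limit."""
    _, u = parse_su(text)
    return u < limit


for result in ("0.00(8)", "-0.04(27)"):
    print(result, parse_su(result), precise_enough(result))
```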
Refinement of x in SHELXL
TWIN -1 0 0 0 -1 0 0 0 -1
BASF 0.1
N    value      esd       shift/esd   parameter
1    4.86497    0.02065   0.000       OSF
2   -0.03716    0.26951   0.000       BASF 1
3    0.00881    0.00260   0.000       EXTI
The Problem
• The requirement that u is less than 0.1 is actually quite difficult to attain for light-atom structures (C, H, N, O compounds).
• A precise determination of x requires large values of f″ relative to f0 + f′.
• But even with Cu radiation, f″(O) etc. are small.
• Alanine C3H7NO2 x = -0.04(27) [100 K data]
The Problem
• If Friedel’s law held exactly, absolute structure determination by X-ray crystallography would be impossible.
• But anomalous scattering introduces deviations which carry absolute structure information.
• Magnitudes depend on
– Elements present
– Wavelength of X-rays used.
f″ values   Mo      Cu
C           0.002   0.009
N           0.003   0.018
O           0.006   0.032
S           0.124   0.558
Friedif (Friedifstat)
Flack & Shmueli. Acta Cryst. (2007) A63, 257
$$\mathrm{Friedif} = 10^4\,\chi$$
Friedif
Flack & Bernardinelli. Acta Cryst. (2008) A64, 484
Test Data Sets
• Good crystals – 23 data sets
• Mostly 100 K data collection. High redundancy.
• Complete or nearly complete Friedel data
• Cu-Kα radiation. CCD instruments
• Multiscan absorption correction.
• Merged in SORTAV (defaults)
• Refine against F² with all data. SHELX weights.
• Spherical scattering factors – Dittrich et al. Acta Cryst (2006) A62, 217
• Refine H positions and Uiso freely.
• Data collected by Trixie Wagner (Novartis), Olly Presly (Agilent) or me
Parsons, Flack & Wagner Acta Cryst (2013) B69, 249
Test Data Sets
Formula Space Group Redundancy R1(F>4u(F))
C3H7NO2 P212121 25 2.19
C25H31NO5 P212121 15 2.48
C19H26N6O* P212121 14 4.25
C16H18N2 P32 13 2.83
C27H48 P21 18 2.88
*Some disorder
Friedif and refined x
Code Formula Friedif x (Normal Ref’t)
L-Alanine C3H7NO2 34 -0.04(27)
GKO02 C25H31NO5 32 0.01(15)
R-CYCLO C19H26N6O 21 -0.02(27)
TWA16A C16H18N2 13 0.00(69)
Cholestane C27H48 9 -0.01(77)
Expected distribution of 23
values of x/u
Refinement of x
Code x (Normal Ref’t)
L-Alanine -0.04(27)
GKO02 0.01(15)
R-CYCLO -0.02(27)
TWA16A 0.00(69)
Cholestane -0.01(77)
Distribution of x/u compared
to a unit Gaussian for 23
Structures
Chi2 = 0.03
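As a rough illustration of why the standard uncertainties look too large, the sketch below computes the mean of (x/u)² for the five structures tabulated above. It assumes that the Chi2 quoted here is this mean-square ratio with a true value of x = 0; that definition is an assumption, and only five of the 23 structures are included, so the number will not reproduce the 0.03 exactly.

```python
# (x, u) from conventional refinement for the five tabulated structures
results = {
    "L-Alanine": (-0.04, 0.27),
    "GKO02": (0.01, 0.15),
    "R-CYCLO": (-0.02, 0.27),
    "TWA16A": (0.00, 0.69),
    "Cholestane": (-0.01, 0.77),
}


def mean_square_ratio(pairs, expected=0.0):
    """Mean of ((x - expected) / u)^2; close to 1 if the u values are realistic."""
    pairs = list(pairs)
    return sum(((x - expected) / u) ** 2 for x, u in pairs) / len(pairs)


chi2 = mean_square_ratio(results.values())
print(f"mean square of x/u = {chi2:.4f}")
```

A value far below 1 means the deviations of x from zero are much smaller than the quoted u values would suggest.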
Precise Absolute Structure
Determination
• Is there a way to get lower (more realistic) standard uncertainties?
• Post refinement methods
– Probability the model hand is correct
– Estimate of x (2 methods)
• Obtaining x during refinement
– Restraints
Right or Wrong?
Bayesian Methods
(PLATON/BIJVOET)
y = Hooft parameter = x calculated by Bayesian methods
Hooft, Straver & Spek. J. Appl. Cryst. (2008), 41, 96.
Bayesian Methods
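Calculate structure factors F²single = Fc² with x = 0. Writing ΔF²(h) = F²(h) - F²(-h) for a Bijvoet difference:
$$z_{\mathbf{h}} = \frac{\Delta F^2_{\mathrm{single}}(\mathbf{h}) - \Delta F^2_o(\mathbf{h})}{u(\Delta F^2_o(\mathbf{h}))}$$
$$p(z_{\mathbf{h}}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_{\mathbf{h}}^2}{2}\right)$$
$$p(\text{observations} \mid x = 0) \propto \prod_{\mathbf{h}} p(z_{\mathbf{h}})$$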
Bayesian Methods
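Calculate structure factors with x = 0.
h k l      Fc2     Fo2     Sigma(Fo2)
-6 2 1     68.55   65.70   0.71
 6 2 1     67.61   64.50   0.71
$$z = \frac{(67.61 - 68.55) - (64.50 - 65.70)}{\sqrt{0.71^2 + 0.71^2}}$$
Assumes measurement errors are distributed with a Gaussian pdf.
$$p(z_{\mathbf{h}}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_{\mathbf{h}}^2}{2}\right)$$
$$z_{\mathbf{h}} = \frac{\Delta F^2_{\mathrm{single}}(\mathbf{h}) - \Delta F^2_o(\mathbf{h})}{u(\Delta F^2_o(\mathbf{h}))}$$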
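The worked example for the 6 2 1 pair can be evaluated with a few lines of Python. This is a sketch only: the function names are illustrative, and the uncertainty of the observed difference is taken as the two Sigma(Fo2) values added in quadrature, which assumes the two measurements are independent.

```python
import math


def z_score(fc2_h, fc2_mh, fo2_h, fo2_mh, u_h, u_mh):
    """Normalised residual of one Bijvoet difference for a model with x = 0."""
    d_calc = fc2_h - fc2_mh          # calculated difference
    d_obs = fo2_h - fo2_mh           # observed difference
    u_d = math.hypot(u_h, u_mh)      # sqrt(u_h^2 + u_mh^2)
    return (d_calc - d_obs) / u_d


def gaussian_pdf(z):
    """Unit Gaussian probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


z = z_score(67.61, 68.55, 64.50, 65.70, 0.71, 0.71)
print(f"z = {z:.3f}, p(z) = {gaussian_pdf(z):.3f}")
```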
Bayesian Methods
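For the inverted model (x = 1) the calculated Bijvoet difference changes sign:
$$q_{\mathbf{h}} = \frac{-\Delta F^2_{\mathrm{single}}(\mathbf{h}) - \Delta F^2_o(\mathbf{h})}{u(\Delta F^2_o(\mathbf{h}))}$$
$$p(q_{\mathbf{h}}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{q_{\mathbf{h}}^2}{2}\right)$$
$$p(\text{observations} \mid x = 1) \propto \prod_{\mathbf{h}} p(q_{\mathbf{h}})$$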
Bayesian Methods:
Right or Wrong Structure?
$$p(x = 0 \mid \text{obs}) = \frac{p(\text{obs} \mid x = 0)\,p(x = 0)}{p(\text{obs} \mid x = 0)\,p(x = 0) + p(\text{obs} \mid x = 1)\,p(x = 1)}$$
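A sketch of the two-hypothesis calculation follows. It is not the PLATON implementation: the function names are hypothetical, Gaussian errors and equal prior probabilities are assumed, and the three Bijvoet differences in the example are illustrative numbers (only the first comes from the 6 2 1 pair above). Log-likelihoods are used so that products over many reflections do not underflow.

```python
import math


def log_likelihood(residuals):
    """Sum of log unit-Gaussian densities of normalised residuals."""
    return sum(-0.5 * r * r - 0.5 * math.log(2.0 * math.pi) for r in residuals)


def p2_true(d_calc, d_obs, u_d, prior_x0=0.5):
    """p(x = 0 | obs) when only x = 0 and x = 1 are considered.

    d_calc: calculated Bijvoet differences for the x = 0 model
    d_obs, u_d: observed differences and their standard uncertainties
    """
    z = [(dc - do) / u for dc, do, u in zip(d_calc, d_obs, u_d)]    # x = 0
    q = [(-dc - do) / u for dc, do, u in zip(d_calc, d_obs, u_d)]   # x = 1
    log0 = log_likelihood(z) + math.log(prior_x0)
    log1 = log_likelihood(q) + math.log(1.0 - prior_x0)
    top = max(log0, log1)                    # subtract before exponentiating
    w0, w1 = math.exp(log0 - top), math.exp(log1 - top)
    return w0 / (w0 + w1)


d_calc = [-0.94, 1.10, 0.35]   # illustrative values
d_obs = [-1.20, 0.60, 0.80]
u_d = [1.0, 1.0, 1.0]
print(f"P2(true) = {p2_true(d_calc, d_obs, u_d):.3f}")
```

Reversing the sign of every observed difference swaps the two hypotheses, so the probability becomes one minus its previous value.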
Bayesian Methods
Code Friedif x (Normal Ref’t) P2(true)
L-Alanine 34 -0.04(27) 1.000
GKO02 32 0.01(15) 1.000
R-CYCLO 21 -0.02(27) 1.000
TWA16A 13 0.00(69) 1.000
Cholestane 9 -0.01(77) 1.000
Calculation of x(u) by Bayesian Methods
• This analysis can be extended to obtain a value of x (aka y).
• Instead of calculating probabilities at y = 0 and 1, calculate for the range 0-1, and build up a distribution.
• Equations expressed in terms of γ (gamma) = 1 - 2y
Hooft, Straver & Spek. J. Appl. Cryst. (2008), 41, 96.
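The idea of building up a distribution can be sketched on a grid. This is a simplified illustration, not the published algorithm: it assumes Gaussian errors, a flat prior, and that the calculated difference scales as (1 - 2y) times the single-crystal difference. The grid runs over a wider interval than 0 to 1 (an assumption made here so that slightly negative y can be returned), and the data are synthetic, generated with y = 0.1.

```python
import math


def hooft_y(d_single, d_obs, u_d, y_min=-1.0, y_max=2.0, steps=3001):
    """Posterior mean and standard deviation of y on a uniform grid."""
    ys = [y_min + i * (y_max - y_min) / (steps - 1) for i in range(steps)]
    logp = []
    for y in ys:
        gamma = 1.0 - 2.0 * y
        logp.append(sum(-0.5 * ((gamma * ds - do) / u) ** 2
                        for ds, do, u in zip(d_single, d_obs, u_d)))
    top = max(logp)
    weights = [math.exp(lp - top) for lp in logp]   # unnormalised posterior
    norm = sum(weights)
    mean = sum(y * w for y, w in zip(ys, weights)) / norm
    var = sum((y - mean) ** 2 * w for y, w in zip(ys, weights)) / norm
    return mean, math.sqrt(var)


# Synthetic, noise-free differences for a crystal with y = 0.1
d_single = [1.0, -2.0, 0.5, 1.5, -1.0]
d_obs = [(1.0 - 2.0 * 0.1) * d for d in d_single]
u_d = [0.2] * len(d_single)
y, u_y = hooft_y(d_single, d_obs, u_d)
print(f"y = {y:.3f}, u(y) = {u_y:.3f}")
```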
Hooft Parameter
Code Friedif x (Normal Ref’t) y
L-Alanine 34 -0.04(27) 0.01(4)
GKO02 32 0.01(15) 0.03(3)
R-CYCLO 21 -0.02(27) -0.02(4)
TWA16A 13 0.00(69) 0.02(7)
Cholestane 9 -0.01(77) -0.04(9)
Chi2 for 23 structures = 0.83 (~1)
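When Errors are Non-Gaussian (‘Poor Data’)
With ΔF²(h) = F²(h) - F²(-h):
$$z_{\mathbf{h}} = \frac{\Delta F^2_{\mathrm{single}}(\mathbf{h}) - \Delta F^2_o(\mathbf{h})}{u(\Delta F^2_o(\mathbf{h}))}$$
$$p(\text{observations} \mid x = 0) \propto \prod_{\mathbf{h}} p(z_{\mathbf{h}})$$
For Gaussian errors:
$$p(z_{\mathbf{h}}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_{\mathbf{h}}^2}{2}\right)$$
For poor data, Student-t, ν = 5:
$$p(z, \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{z^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$
Hooft, Straver & Spek. J. Appl. Cryst. (2010), 43, 665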
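To see what the Student-t choice does in practice, the sketch below compares the standard Student-t density with ν = 5 against the unit Gaussian. The function names are illustrative. The heavier tails mean that a single large residual is penalised far less severely than under the Gaussian assumption.

```python
import math


def student_t_pdf(z, nu=5.0):
    """Standard Student-t probability density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))


def gaussian_pdf(z):
    """Unit Gaussian probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


for z in (0.0, 2.0, 5.0):
    print(z, student_t_pdf(z), gaussian_pdf(z))
```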
Example
Reibenspies & Bhuvanesh Acta Cryst. (2013), B69, 288
‘Quotient’ Methods
• Systematic errors like absorption may drown out anomalous differences.
• Measure Friedel opposites in such a way that absorption errors are the same for both.
• Stoe: measure I(h) at 2θ, ω, χ and φ, and I(-h) at -2θ, -ω, χ and φ.
• The quotient I(h)/I(-h) is free from absorption and extinction errors. Also scale-free.
Le Page, Gabe & Gainsford. J. Appl. Cryst. (1990), 23, 406
Quotients
Parsons, Flack & Wagner Acta Cryst (2013) B69, 249
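$$\frac{I(\mathbf{h}) - I(-\mathbf{h})}{I(\mathbf{h}) + I(-\mathbf{h})} = \frac{F_o^2(\mathbf{h}) - F_o^2(-\mathbf{h})}{F_o^2(\mathbf{h}) + F_o^2(-\mathbf{h})} = (1 - 2x)\,\frac{|F_{\mathrm{single}}(\mathbf{h})|^2 - |F_{\mathrm{single}}(-\mathbf{h})|^2}{|F_{\mathrm{single}}(\mathbf{h})|^2 + |F_{\mathrm{single}}(-\mathbf{h})|^2}$$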
Quotients
$$\frac{F_o^2(\mathbf{h}) - F_o^2(-\mathbf{h})}{F_o^2(\mathbf{h}) + F_o^2(-\mathbf{h})} = (1 - 2x)\,\frac{|F_{\mathrm{single}}(\mathbf{h})|^2 - |F_{\mathrm{single}}(-\mathbf{h})|^2}{|F_{\mathrm{single}}(\mathbf{h})|^2 + |F_{\mathrm{single}}(-\mathbf{h})|^2}$$
Left-hand side: this can be calculated from your data set.
Quotient on the right-hand side: this can be calculated (Fc2 for a model refined with x = 0).
$$Q_o(\mathbf{h}) = (1 - 2x)\,Q_{\mathrm{single}}(\mathbf{h})$$
y = mx
Fit a graph of Qo(h) against Qsingle(h) to the straight line y = mx.
Equate the gradient m to (1 - 2x).
Solve for Flack:
Gradient = 0.893(50) = 1 - 2x
Flack = 0.05(3)
Implemented in XPREP and SHELXL-2013.
No TWIN/BASF instructions!
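The fit and the conversion from gradient to Flack parameter can be sketched as below. This is an illustration, not the XPREP or SHELXL code: the function names are hypothetical, the line is forced through the origin with weights 1/u², and the uncertainty of the gradient assumes the supplied u values are realistic (no rescaling by goodness of fit).

```python
import math


def fit_through_origin(q_single, q_obs, u_obs):
    """Weighted least-squares gradient m of q_obs = m * q_single, and u(m)."""
    sxx = sum((qs / u) ** 2 for qs, u in zip(q_single, u_obs))
    sxy = sum(qs * qo / u ** 2 for qs, qo, u in zip(q_single, q_obs, u_obs))
    return sxy / sxx, math.sqrt(1.0 / sxx)


def flack_from_gradient(m, u_m):
    """Invert m = 1 - 2x; the uncertainty halves along with the value."""
    return (1.0 - m) / 2.0, u_m / 2.0


# Gradient quoted on the slide
x, u_x = flack_from_gradient(0.893, 0.050)
print(f"x = {x:.4f}, u(x) = {u_x:.4f}")
```

For the quoted gradient this gives x = 0.0535 with u = 0.025, which rounds to the 0.05(3) above.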
$$Q_o(\mathbf{h}) = (1 - 2x)\,Q_{\mathrm{single}}(\mathbf{h})$$
Code Friedif x (Normal Ref’t) x(QUOT)
L-Alanine 34 -0.04(27) 0.01(4)
GKO02 32 0.01(15) 0.02(3)
R-CYCLO 21 -0.02(27) 0.00(4)
TWA16A 13 0.00(69) 0.18(8)
Cholestane 9 -0.01(77) -0.01(13)
Differences
$$F_o^2(\mathbf{h}) - F_o^2(-\mathbf{h}) = (1 - 2x)\left(|F_{\mathrm{single}}(\mathbf{h})|^2 - |F_{\mathrm{single}}(-\mathbf{h})|^2\right)$$
$$D_o(\mathbf{h}) = (1 - 2x)\,D_{\mathrm{single}}(\mathbf{h})$$
Parsons, Flack & Wagner Acta Cryst (2013) B69, 249
The Post-Refinement Problem
A potential problem with post-refinement methods is
that any correlations involving x are lost.
But…
These quantities can also be applied as restraints.
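F²(h) etc. need to be coded in terms of the atomic coordinates (x’s), displacement parameters (U’s), occupancies and so on.
$$\frac{F_o^2(\mathbf{h}) - F_o^2(-\mathbf{h})}{F_o^2(\mathbf{h}) + F_o^2(-\mathbf{h})} = (1 - 2x)\,\frac{|F_{\mathrm{single}}(\mathbf{h})|^2 - |F_{\mathrm{single}}(-\mathbf{h})|^2}{|F_{\mathrm{single}}(\mathbf{h})|^2 + |F_{\mathrm{single}}(-\mathbf{h})|^2}$$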
Facilities in Topas Academic 5
• A symbolic equation can be written for the quotient.
• Like a function or subroutine in Fortran
fn FPC(h, k, l)
{
return
2.31000*Exp( -20.84390*s2(h,k,l)) +
1.02000*Exp( -10.20750*s2(h,k,l)) +
1.58860*Exp( -0.56870*s2(h,k,l)) +
0.86500*Exp( -51.65120*s2(h,k,l)) +
( 0.23370);
}
prm !FPPC 0.00910
fn X1O1(h, k, l) = tpi*(h*xO1 + k*yO1 + l*zO1);
...
fn QUOT(h, k, l) = (1-2*ENANTIO)*( (
U1O1(h, k ,l)*(FPO(h,k,l)*Cos(X1O1(h, k, l)) - FPPO*Sin(X1O1(h, k, l)))
+
U2O1(h, k ,l)*(FPO(h,k,l)*Cos(X2O1(h, k, l)) - FPPO*Sin(X2O1(h, k, l)))
+
...
restraint = ( 0.01642 - QUOT( 5, 2, 3 ))/ 0.00978;
restraint = ( 0.02984 - QUOT( 4, 3, 2 ))/ 0.00968;
restraint = ( -0.02283 - QUOT( 3, 1, 9 ))/ 0.00924;
restraint = ( 0.04175 - QUOT( 1, 2, 8 ))/ 0.02252;
Refine with intensity data merged in the centrosymmetric Laue
group and Q (or D) applied as restraints.
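The observed quotients and their uncertainties that appear in restraint lines like those above could be prepared with a short script. The sketch below is hypothetical: it is not how the original input was generated, the helper names are invented, and u(Q) comes from first-order error propagation assuming independent errors on the two intensities. The output format simply mimics the restraint lines shown.

```python
import math


def quotient(fo2_h, fo2_mh, u_h, u_mh):
    """Observed quotient Q = (a - b)/(a + b) and its standard uncertainty."""
    total = fo2_h + fo2_mh
    q = (fo2_h - fo2_mh) / total
    # dQ/da = 2b/(a+b)^2 and dQ/db = -2a/(a+b)^2
    u_q = 2.0 * math.hypot(fo2_mh * u_h, fo2_h * u_mh) / total ** 2
    return q, u_q


def restraint_line(hkl, q, u_q):
    """Format one restraint in the style of the lines above."""
    h, k, l = hkl
    return f"restraint = ( {q:.5f} - QUOT( {h}, {k}, {l} ))/ {u_q:.5f};"


# Fo2 values of the 6 2 1 / -6 2 1 pair from the Bayesian example
q, u_q = quotient(64.50, 65.70, 0.71, 0.71)
print(restraint_line((6, 2, 1), q, u_q))
```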
Code         Friedif   x(QUOT) Post Refine   x(QUOT) Refine
L-Alanine    34        0.01(4)               0.01(3)
GKO02        32        0.02(3)               0.03(3)
R-CYCLO      21        0.00(4)               -0.02(4)
TWA16A       13        0.18(8)               0.14(8)
Cholestane   9         -0.01(13)             0.00(11)
Summary
Code         x(TWIN)     y(HOOFT)   x(QUOT) Post Refine   x(QUOT) Refine
L-Alanine    -0.04(27)   0.01(4)    0.01(4)               0.01(3)
GKO02        0.01(15)    0.03(3)    0.02(3)               0.03(3)
R-CYCLO      -0.02(27)   -0.02(4)   0.00(4)               -0.02(4)
TWA16A       0.00(69)    0.02(7)    0.18(8)               0.14(8)
Cholestane   -0.01(77)   -0.04(9)   -0.01(13)             0.00(11)
Validation
[Figures: validation plots for Alanine (Friedif 34), Cholestane (Friedif 9) and TWA16A.]
TWA16A, outlier omission: 0.18(8) becomes 0.08(8).
Outlier Detection
• Remove Bijvoet pairs if Do(h) > 2Dc,max.
• For quotient calculations, remove data where Fo²(h)/u(Fo²(h)) and Fc²(h)/u(Fo²(h)) are < 3.
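The two criteria can be written as simple filters. This sketch makes two assumptions that the slide does not spell out: differences are compared by absolute value, and a reflection is dropped from the quotient calculation only when both ratios are below 3. The function names are illustrative.

```python
def keep_pair_for_differences(d_obs, d_calc_all, factor=2.0):
    """Keep a Bijvoet pair unless |Do| exceeds factor * largest |Dc|.

    Use of absolute values is an assumption of this sketch.
    """
    d_calc_max = max(abs(d) for d in d_calc_all)
    return abs(d_obs) <= factor * d_calc_max


def keep_for_quotients(fo2, fc2, u_fo2, threshold=3.0):
    """Keep a reflection unless both Fo2/u(Fo2) and Fc2/u(Fo2) are < threshold.

    Reading 'and' as 'both' is an assumption of this sketch.
    """
    weak = (fo2 / u_fo2 < threshold) and (fc2 / u_fo2 < threshold)
    return not weak


d_calc_all = [0.10, -0.20, 0.05]
print(keep_pair_for_differences(0.50, d_calc_all))   # exceeds 2 * 0.20
print(keep_for_quotients(2.0, 1.0, 1.0))             # both ratios below 3
```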
Normal Probability Plots
Validation
Mandelic acid
Why might this work?
Conclusions
• Several methods for obtaining precise absolute structure determination for light-atom structures.
• Post refinement calculations are OK. But still work to do on the effects of completeness.
• Validation is important.
• Still work to do on automatic detection of outliers.
• Transformation into sensitive and insensitive components is important. Still work to do on the reasons the methods work - or why L.S. doesn’t.
Acknowledgements
• Howard Flack (Geneva)
• Trixie Wagner (Novartis)
• Alan Coelho (Topas)
• Richard Cooper & David Watkin (Oxford)
• George Sheldrick (Göttingen)