Direct Use of Phase Information in Refmac Abingdon, 18.3.2008 University of Leiden P. Skubák

Page 1

Direct Use of Phase Information in Refmac

Abingdon, 18.3.2008

University of Leiden

P. Skubák

Page 2

REFINEMENT WITHOUT PRIOR PHASE INFORMATION

SAD EXPERIMENT

|F+|, |F-|

PHASING and DENSITY MODIFICATION

|F|

REFINEMENT and MODEL BUILDING

$|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$

Page 3

REFINEMENT WITH INDIRECT PRIOR PHASE INFORMATION

SAD EXPERIMENT

|F+|, |F-|

PHASING and DENSITY MODIFICATION

φ, Pe(φ), |F|

REFINEMENT and MODEL BUILDING

$P_e(\varphi) = e^{A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi}$

$|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$
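As an illustration of the phase prior Pe(φ) on this slide, the following minimal Python sketch evaluates it on a phase grid from four coefficients A, B, C, D, normalises it numerically, and derives the centroid phase and figure of merit. The function names, the grid size, and the explicit numerical normalisation are assumptions made for this example; they are not taken from Refmac.

```python
import numpy as np

def hl_phase_distribution(a, b, c, d, n_points=360):
    """Return a phase grid (radians) and the normalised Pe(phi) sampled on it."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    log_p = (a * np.cos(phi) + b * np.sin(phi)
             + c * np.cos(2.0 * phi) + d * np.sin(2.0 * phi))
    p = np.exp(log_p - log_p.max())           # subtract the maximum to avoid overflow
    p /= p.sum() * (2.0 * np.pi / n_points)   # make the integral over phi equal to 1
    return phi, p

def centroid_phase_and_fom(phi, p):
    """Centroid phase and figure of merit from the first moment of Pe(phi)."""
    step = phi[1] - phi[0]
    moment = np.sum(p * np.exp(1j * phi)) * step
    return float(np.angle(moment)), float(abs(moment))

phi, p = hl_phase_distribution(2.0, 0.0, 0.0, 0.0)
best_phase, fom = centroid_phase_and_fom(phi, p)
```

A flat prior (all four coefficients zero) gives a figure of merit close to zero, which corresponds to the situation with no prior phase information.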

Page 4

REFINEMENT WITH DIRECT PRIOR PHASE INFORMATION

SAD EXPERIMENT

|F+|, |F-|

PHASING and DENSITY MODIFICATION

φ, heavy atom model

|F+|, |F-|

REFINEMENT and MODEL BUILDING

$|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$

Page 5

Rice refinement target

$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$

integration over all $\varphi_o$:

$P(|F_o|, |F_c|, \varphi_c) = \int_0^{2\pi} P(|F_o|, \varphi_o, |F_c|, \varphi_c)\, d\varphi_o$

division by $P(|F_c|, \varphi_c)$:

conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$

maximum likelihood refinement target with no prior phase information
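The integration step on this slide can be checked numerically. The sketch below assumes an acentric reflection and a complex Gaussian error model around D·Fc with variance parameter Σ; the slide does not state the functional form, so the names `luzzati_d` and `sigma` and the model itself are assumptions for illustration. Under that assumption, integrating the phase out numerically reproduces the well-known closed-form Rice distribution with the modified Bessel function I0.

```python
import numpy as np

def conditional_density(fo_amp, phi_o, fc_amp, phi_c, luzzati_d, sigma):
    """P(|Fo|, phi_o ; |Fc|, phi_c) under the assumed acentric Gaussian model."""
    fo = fo_amp * np.exp(1j * phi_o)
    fc = fc_amp * np.exp(1j * phi_c)
    # The factor fo_amp is the Jacobian of the change to polar coordinates.
    return fo_amp / (np.pi * sigma) * np.exp(-np.abs(fo - luzzati_d * fc) ** 2 / sigma)

def rice_by_integration(fo_amp, fc_amp, phi_c, luzzati_d, sigma, n_points=720):
    """Integrate the unknown observed phase out on a uniform grid."""
    phi_o = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    density = conditional_density(fo_amp, phi_o, fc_amp, phi_c, luzzati_d, sigma)
    return float(density.sum() * (2.0 * np.pi / n_points))

def rice_closed_form(fo_amp, fc_amp, luzzati_d, sigma):
    """Closed-form acentric Rice distribution P(|Fo| ; |Fc|)."""
    x = 2.0 * fo_amp * luzzati_d * fc_amp / sigma
    return float(2.0 * fo_amp / sigma
                 * np.exp(-(fo_amp ** 2 + (luzzati_d * fc_amp) ** 2) / sigma)
                 * np.i0(x))

numeric = rice_by_integration(100.0, 90.0, 0.7, 0.9, 2000.0)
analytic = rice_closed_form(100.0, 90.0, 0.9, 2000.0)
```

The result does not depend on the calculated phase, which is why this target carries no phase information; a refinement program would typically minimise the negative logarithm of this quantity summed over reflections.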

Page 6

MLHL refinement target

$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$

weighted integration over all $\varphi_o$:

$P(|F_o|, |F_c|, \varphi_c) = \int_0^{2\pi} P(|F_o|, \varphi_o, |F_c|, \varphi_c) \cdot P_e(\varphi_o)\, d\varphi_o$

division by $P(|F_c|, \varphi_c)$:

conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$

maximum likelihood refinement target indirectly incorporating prior phase information
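The weighted integration on this slide can be sketched in the same way: the phase integral is taken with the experimental phase prior Pe(φo) as a weight. The acentric Gaussian model with parameters D and Σ, the function name, and the numerical normalisation of the prior are assumptions made for this example, not a description of the Refmac code.

```python
import numpy as np

def mlhl_likelihood(fo_amp, fc_amp, phi_c, luzzati_d, sigma, hl_coeffs, n_points=720):
    """Phase integral of the assumed Gaussian density weighted by Pe(phi_o)."""
    a, b, c, d = hl_coeffs
    phi_o = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    step = 2.0 * np.pi / n_points

    # Phase prior from the four coefficients, normalised on the grid.
    log_prior = (a * np.cos(phi_o) + b * np.sin(phi_o)
                 + c * np.cos(2.0 * phi_o) + d * np.sin(2.0 * phi_o))
    prior = np.exp(log_prior - log_prior.max())
    prior /= prior.sum() * step

    # Assumed acentric Gaussian density in polar coordinates.
    fo = fo_amp * np.exp(1j * phi_o)
    fc = fc_amp * np.exp(1j * phi_c)
    density = fo_amp / (np.pi * sigma) * np.exp(-np.abs(fo - luzzati_d * fc) ** 2 / sigma)

    return float((density * prior).sum() * step)

args = (100.0, 90.0, 0.7, 0.9, 2000.0)
flat = mlhl_likelihood(*args, (0.0, 0.0, 0.0, 0.0))
agreeing = mlhl_likelihood(*args, (3.0 * np.cos(0.7), 3.0 * np.sin(0.7), 0.0, 0.0))
```

With a flat prior the value equals the Rice result divided by 2π, a constant factor that does not affect refinement. A prior peaked near the calculated phase raises the likelihood, and a prior peaked at the opposite phase lowers it.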

Page 7

SAD refinement target

$P(|F_o^-|, \varphi_o^-, |F_o^+|, \varphi_o^+, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$

integration over all $\varphi_o^-$, $\varphi_o^+$:

$P(|F_o^-|, |F_o^+|, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+) = \int_0^{2\pi}\!\int_0^{2\pi} P(|F_o^-|, \varphi_o^-, |F_o^+|, \varphi_o^+, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)\, d\varphi_o^-\, d\varphi_o^+$

division by $P(|F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$:

conditional probability distribution $P(|F_o^-|, |F_o^+| ; |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$

maximum likelihood refinement target directly incorporating prior phase information
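The double phase integral on this slide can be illustrated with a brute-force numerical sketch. It assumes that, given the calculated structure factors, the two observed complex structure factors follow a bivariate complex Gaussian with a conditional mean vector `mean` and a 2x2 Hermitian covariance matrix `cov`. Both inputs, the function name, and the grid integration are assumptions for this example; how the mean and covariance follow from the model and the heavy atom substructure is not specified on the slide and is not attempted here.

```python
import numpy as np

def sad_likelihood(fo_plus, fo_minus, mean, cov, n_points=180):
    """Integrate both unknown observed phases out of a bivariate complex Gaussian.

    fo_plus, fo_minus : observed amplitudes |Fo+| and |Fo-|
    mean              : complex array of length 2 (assumed conditional mean)
    cov               : 2x2 Hermitian positive-definite matrix (assumed covariance)
    """
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    step = 2.0 * np.pi / n_points
    p1, p2 = np.meshgrid(phi, phi, indexing="ij")

    # Deviation of the trial complex observations from the conditional mean.
    delta = np.stack([fo_plus * np.exp(1j * p1) - mean[0],
                      fo_minus * np.exp(1j * p2) - mean[1]], axis=-1)

    # Quadratic form delta^H C^-1 delta, real for a Hermitian covariance matrix.
    inv_cov = np.linalg.inv(cov)
    quad = np.einsum("...i,ij,...j->...", delta.conj(), inv_cov, delta).real

    norm = np.pi ** 2 * np.linalg.det(cov).real
    # fo_plus * fo_minus is the Jacobian of the change to polar coordinates.
    density = fo_plus * fo_minus / norm * np.exp(-quad)
    return float(density.sum() * step ** 2)

mean = np.array([8.0 * np.exp(0.3j), 7.5 * np.exp(-0.2j)])
uncorrelated = sad_likelihood(10.0, 9.0, mean, np.diag([20.0, 25.0]).astype(complex))
```

When the off-diagonal covariance terms are zero, the result factorises into a product of two Rice distributions, which makes a convenient sanity check. The phase information enters through the off-diagonal terms that couple the two observations.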

Page 8

SAD distribution P( |Fo+|, |Fo-| ; Ac, Bc, AHc, BHc ) (strong prior phase information)

Rice distribution P( |Fo| ; Ac, Bc ) (no prior phase information)

Page 9

SAD distribution (weak prior phase information)

Page 10

SAD REFINEMENT TARGET USE IN REFMAC

iterated automated model building with SAD function refinement

substructure refinement and scaling

refinement of models in the final stages

Page 11

SAD REFINEMENT TARGET USE IN REFMAC

iterated automated model building with SAD function refinement

substructure refinement and scaling

refinement of models in the final stages

Page 12

MODEL BUILDING WITH SAD REFINEMENT

automated model building programs do not (yet) support the SAD target, so workarounds are needed for testing:

the heavy atom parameters file is supplied separately to the model building program by a script that also calls Refmac with the extra keywords needed for SAD refinement

this workaround is used in CRANK for the ARP/wARP + Refmac_sad implementation

better integration of ARP/wARP with Refmac SAD is on the way

Page 13

Figure: fraction of ARP/wARP-built residues relative to the total number of residues

Panels: resolution lower than 2.4 Å; resolution higher than 2.4 Å

Page 14

SAD REFINEMENT TARGET USE IN REFMAC

iterated automated model building with SAD function refinement

substructure refinement and scaling

refinement of models in the final stages

Page 15

SAD SUBSTRUCTURE REFINEMENT & SCALING IN REFMAC

(VERY PRELIMINARY RESULTS)

being tested on ~200 JCSG datasets using the CRANK package with the pipeline: Refmac5_sad for scaling, Solomon for DM and Refmac5_sad for model building

average phase error after Refmac phasing: 75.4 deg

70 runs finished, of which 22 with successful model building

similar results (67 runs finished, of which 25 with successful model building) achieved with the same pipeline using BP3 instead of Refmac5_sad for phasing

Page 16

SAD REFINEMENT TARGET USE IN REFMAC

iterated automated model building with SAD function refinement

substructure refinement and scaling

refinement of models in the final stages

Page 17

SAD REFINEMENT – CLOSE TO FINAL MODEL

Structure (resolution)   Target   R (%)   R-free (%)   ph. err. (deg)
Thionein (1.7 Å)         Rice     22.79   26.37        18.68
                         SAD      23.09   26.94        16.2
Transhydr. (2.4 Å)       Rice     24.91   31.98        29.19
                         SAD      25.94   31.24        27.99
Lysozyme (1.6 Å)         Rice     21.73   27.68        17.19
                         SAD      22.29   27.21        17.1
Thioester. (1.8 Å)       Rice     29.31   35.52        32.38
                         SAD      29.76   35.08        32.04
AEP (2.55 Å)             Rice     19.14   25.77        29.26
                         SAD      20.18   25.69        29.12
Ferredoxin (0.9 Å)       Rice     22.01   23.82        15.92
                         SAD      22.31   24.03        15.96

Page 18

SAD REFINEMENT – CLOSE TO FINAL MODEL

Structure (resolution)   Target   R (%)   R-free (%)   ph. err. (deg)
Thionein (1.7 Å)         Rice     22.79   26.37        18.68
                         SAD      23.09   26.94        16.2
                         MLHL     23.9    27.18        16.51
Transhydr. (2.4 Å)       Rice     24.91   31.98        29.19
                         SAD      25.94   31.24        27.99
                         MLHL     26.17   31.78        31.02
Lysozyme (1.6 Å)         Rice     21.73   27.68        17.19
                         SAD      22.29   27.21        17.1
                         MLHL     22.03   27.34        17.26
Thioester. (1.8 Å)       Rice     29.31   35.52        32.38
                         SAD      29.76   35.08        32.04
                         MLHL     29.53   34.99        32.07
AEP (2.55 Å)             Rice     19.14   25.77        29.26
                         SAD      20.18   25.69        29.12
                         MLHL     19.82   25.56        28.64
Ferredoxin (0.9 Å)       Rice     22.01   23.82        15.92
                         SAD      22.31   24.03        15.96
                         MLHL     22.18   23.81        16.31

Page 19

SIRAS EXPERIMENT: DIRECT USE OF PRIOR PHASES

SIRAS X-RAY EXPERIMENT

|FN|, |FD+|, |FD-|

PHASING and DENSITY MODIFICATION

substructure model

|FN|, |FD+|, |FD-|

REFINEMENT and MODEL BUILDING

$P(|F_{oN}|, |F_{oD}^-|, |F_{oD}^+| ; |F_{cN}|, \varphi_{cN}, |F_{cD}^-|, \varphi_{cD}^-, |F_{cD}^+|, \varphi_{cD}^+)$

Page 20

SIRAS IMPLEMENTATION REQUIREMENTS AND TODO

numerical approximations to the 3-dimensional SIRAS integral – done for the function and first derivatives evaluation

second derivatives of SIRAS function should be calculated and used in minimisation too

modelling of non-isomorphism:

more models in Refmac with restraints between them and their parts

Page 21

SIRAS VERY PRELIMINARY RESULTS

– number of protein residues correctly built:

               Rice   MLHL   SIRAS   total
GerE             31     12      76     384
Thioesterase    508    523     521     572
GCN5P           111    112     110     116
Elastase        238    235     236     245
Ribonuclease    175    183     185     192
Soxy             26     91     147     246

– results from Refmac5D: not modelling non-isomorphism (the protein part is shared by the native and derivative models), heavy atom refinement outside of Refmac, only first derivatives, etc.

Page 22

Plans for the coming months

run and analyze massive JCSG tests on both Refmac SAD substructure refinement and scaling and protein model building with iterative Refmac SAD refinement

analyze the SAD target improvements for close to final models

better integration of SAD with model building programs

anisotropic ATP's refinement for SAD target

simultaneous refinement of occupancies and ATP's for all targets

more models in Refmac (input, output, refinement etc)

geometry restraints between more models

SIRAS target implementation and testing for substructure refinement and scaling and protein model building

target for joint refinement of protein and ligand $P(|F_{oP}|, |F_{oPL}| ; |F_{cP}|, \varphi_{cP}, |F_{cPL}|, \varphi_{cPL})$

Page 23
Page 24
Page 25

Refmac5 code organisation

I. Original Refmac5 code files (Fortran)

II. Modified Refmac5 code files (Fortran)

III. Bridge code files – layer between Refmac5 and SAD function itself (C/C++)

IV. SAD function code files (C/C++)

Page 26

SAD/SIRAS function implementation

standalone C++ template class with double or single precision

general likelihood function for 1 or 2 observed structure factors and N model structure factors (includes, among others, SAD, SIR or Rice functions for both centric and acentric cases)

possibility to define arbitrary covariance matrices for different experiments/situations, with real or complex terms

calculation of the function value and its first and second derivatives with respect to the calculated structure factors and Luzzati D parameters

Gaussian integration over unknown observed phases

use of tabulated sin, cos, exp and modified Bessel I0, I1 functions to increase the evaluation speed

use of the LAPACK package for calculation of eigenvalues of covariance matrices
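The idea of tabulated functions mentioned on this slide can be sketched as a lookup table with linear interpolation on a uniform grid. The class name, the table sizes, and the choice of the exponentially scaled Bessel function exp(-x)·I0(x) (to keep table values bounded) are assumptions for this illustration; the slide does not say how the tables in the C++ class are organised.

```python
import numpy as np

class TabulatedFunction:
    """Lookup table with linear interpolation on a uniform grid (hypothetical helper)."""

    def __init__(self, func, x_min, x_max, n_entries):
        self.x_min = x_min
        self.n_entries = n_entries
        self.step = (x_max - x_min) / (n_entries - 1)
        # The expensive function is evaluated only once, when the table is built.
        self.values = func(np.linspace(x_min, x_max, n_entries))

    def __call__(self, x):
        # Position of x in units of the table spacing; x is assumed to lie in [x_min, x_max].
        t = (np.asarray(x, dtype=float) - self.x_min) / self.step
        i = np.clip(np.floor(t).astype(int), 0, self.n_entries - 2)
        frac = t - i
        return (1.0 - frac) * self.values[i] + frac * self.values[i + 1]

sin_table = TabulatedFunction(np.sin, 0.0, 2.0 * np.pi, 4097)
scaled_i0_table = TabulatedFunction(lambda x: np.exp(-x) * np.i0(x), 0.0, 50.0, 5001)
```

The interpolation error of such a table scales with the square of the grid spacing, so the table size sets the trade-off between memory and accuracy.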

Page 27

Refmac5 code organisation

I. Original Refmac5 code files (Fortran)

II. Modified Refmac5 code files (Fortran)

III. Bridge code files – layer between Refmac5 and SAD function itself (C/C++)

IV. SAD function code files (C/C++)

Page 28

Tasks performed by bridge layer

passing the calls and parameters between the Refmac5 part and the likelihood function part in both directions

place of instantiation and “life” of the likelihood class

transformation of derivatives with respect to structure factor amplitudes and phases (polar coordinates) to derivatives with respect to the real and imaginary parts of the structure factors (as used by Refmac5)

role in reading/writing of substructure files

checks of the plausibility and/or correctness of some input and output likelihood function parameters
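The derivative transformation listed among the bridge layer tasks is an application of the chain rule. For a structure factor F = A + iB = |F|·exp(iφ), the partial derivatives are ∂|F|/∂A = cos φ, ∂|F|/∂B = sin φ, ∂φ/∂A = -sin φ / |F| and ∂φ/∂B = cos φ / |F|. The Python sketch below applies them; the function name and argument names are hypothetical and do not correspond to the actual bridge code.

```python
import math

def polar_to_cartesian_gradient(amp, phi, dl_damp, dl_dphi):
    """Convert (dL/d|F|, dL/dphi) to (dL/dA, dL/dB) for F = A + iB.

    amp must be non-zero, because the phase is undefined for |F| = 0.
    """
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    dl_da = dl_damp * cos_phi - dl_dphi * sin_phi / amp
    dl_db = dl_damp * sin_phi + dl_dphi * cos_phi / amp
    return dl_da, dl_db

# Example with L = |F|^2 * sin(phi), evaluated at F = 3 + 4i.
amp, phi = 5.0, math.atan2(4.0, 3.0)
dl_da, dl_db = polar_to_cartesian_gradient(
    amp, phi,
    dl_damp=2.0 * amp * math.sin(phi),   # dL/d|F|
    dl_dphi=amp ** 2 * math.cos(phi),    # dL/dphi
)
```

In Cartesian form the example function is L = B·sqrt(A² + B²), whose derivatives at (3, 4) are AB/|F| = 2.4 and |F| + B²/|F| = 8.2, matching the transformed values.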

Page 29

Refmac5 code organisation

I. Original Refmac5 code files (Fortran)

II. Modified Refmac5 code files (Fortran)

III. Bridge code files – layer between Refmac5 and SAD function itself (C/C++)

IV. SAD function code files (C/C++)

Page 30

Tasks performed by modified Refmac5 files

input, output and availability in code of observed |F+|, |F-| columns (via standard CCP4 libraries to read and write mtz files)

input, output and availability in code of substructure parameters (standard pdb file format and new internally used refmac5 format for both input and output)

gathering and precomputation of all information required as input by SAD function

calling of the SAD function, passing all required input information (via bridge functions)

replacement and/or modification of all original Refmac5 subroutines requiring different treatment with SAD function

harvesting of input keywords specific for SAD refinement

all original tasks performed by these files