Probabilistic Methods for Interpreting Electron-Density Maps

Frank DiMaio

University of Wisconsin – Madison Computer Sciences Department

dimaio@cs.wisc.edu

3D Protein Structure

[Figure: protein structure with backbone, sidechain, and C-alpha atoms labeled]

[Figure: amino-acid sequence … ALA LEU PRO VAL … with 3D positions marked as unknown (??)]

High-Throughput Structure Determination

Protein-structure determination is important
  Understanding function of a protein
  Understanding mechanisms
  Targets for drug design

Some proteins produce poor density maps

Interpreting poor electron-density maps is very (human) laborious

I aim to automatically interpret poor-quality electron-density maps

Electron-Density Map Interpretation

GIVEN: 3D electron-density map, (linear) amino-acid sequence

FIND: All-atom protein model

My focus

Density Map Resolution

[Figure: example density maps at 1.0Å, 2.0Å, 3.0Å, and 4.0Å resolution, annotated with Morris et al. (2003), Ioerger et al. (2002), and Terwilliger (2003)]

Thesis Contributions

A probabilistic approach to protein-backbone tracing
  DiMaio et al., Intelligent Systems for Molecular Biology (2006)

Improved template matching in electron-density maps
  DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)

Creating all-atom protein models using particle filtering
  DiMaio et al. (under review)

Pictorial structures for atom-level molecular modeling
  DiMaio et al., Advances in Neural Information Processing Systems (2004)

Improving the efficiency of belief propagation
  DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)

Iterative phase improvement in ACMI

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov random field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories


5-mer Lookup

[Figure: each 5-mer in the sequence (…SAW C VKFEKPADKNGKTE…) is looked up in a protein database (ProteinDB)]

ACMI searches map for each template independently

Spherical-harmonic decomposition allows rapid search of all template rotations

Spherical-Harmonic Decomposition

f (θ,φ)
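For reference, the standard spherical-harmonic expansion of a function on the sphere is written out below. The coefficient symbol $a_{lm}$ and the band limit $B$ are notation assumed here for illustration; the slides do not name them.

$$f(\theta,\varphi) \approx \sum_{l=0}^{B} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta,\varphi), \qquad a_{lm} = \int_0^{2\pi}\!\!\int_0^{\pi} f(\theta,\varphi)\, \overline{Y_l^m(\theta,\varphi)}\, \sin\theta \, d\theta \, d\varphi$$

Rotating $f$ mixes only coefficients that share the same degree $l$, which is the property that makes a search over all rotations tractable in coefficient space.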

5-mer Fast Rotation Search

Template: pentapeptide fragment from PDB (the “template”)
  calculated (expected) density in 5Å sphere
  template density sampled in spherical shells
  template spherical-harmonic coefficients

Map: electron density map
  sampled region of density in 5Å sphere
  map region sampled in spherical shells
  map-region spherical-harmonic coefficients

Fast-rotation function (Navaza 2006, Risbo 1996): correlation coefficient as function of rotation

Convert Scores to Probabilities

scan density map for fragment

correlation coefficients over density map, $t_i(u_i)$

Bayes’ rule

probability distribution over density map, $P(\text{5-mer at } u_i \mid \mathrm{EDM})$
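The score-to-probability step can be sketched as follows. This is a minimal illustration, not ACMI's actual model: the Gaussian score distributions for matches and non-matches, the prior `prior_match`, and the final normalization over grid points are all assumptions made here, and every function name is hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def match_posterior(score, prior_match, match_mean, match_std, bg_mean, bg_std):
    """Bayes' rule: P(match | score) from two assumed score likelihoods."""
    like_match = gaussian_pdf(score, match_mean, match_std)
    like_background = gaussian_pdf(score, bg_mean, bg_std)
    numerator = like_match * prior_match
    return numerator / (numerator + like_background * (1.0 - prior_match))


def scores_to_distribution(scores, **model):
    """Turn one correlation score per grid point into a distribution over
    grid points (assumes the fragment occurs once in the map)."""
    posteriors = [match_posterior(s, **model) for s in scores]
    total = sum(posteriors)
    return [p / total for p in posteriors]


# Hypothetical parameters, chosen only for the example.
model = dict(prior_match=0.01, match_mean=0.6, match_std=0.15,
             bg_mean=0.2, bg_std=0.15)
distribution = scores_to_distribution([0.1, 0.3, 0.7], **model)
```

With equal standard deviations the likelihood ratio is monotone in the score, so a higher correlation coefficient always yields a higher posterior.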


Probabilistic Backbone Model

Trace assigns a position and orientation $u_i = \{x_i, q_i\}$ to each amino acid i

The probability of a trace $U = \{u_i\}$ is

$$P(U \mid \mathrm{EDM}) = P(u_1, \ldots, u_N \mid \mathrm{EDM})$$

This full joint probability is intractable to compute

Approximate using pairwise Markov field

Pairwise Markov-Field Model

Joint probabilities defined on a graph as product of vertex and edge potentials

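$$P(U \mid \mathrm{EDM}) \propto \prod_{\text{AAs } i} \psi_i(u_i \mid \mathrm{EDM}) \times \prod_{\text{AAs } i,j} \psi_{ij}(u_i, u_j)$$

[Figure: graph over amino acids ALA, GLY, LYS, LEU, SER]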

ACMI’s Backbone Model

Observational potentials tie the map to the model

[Figure: graph over amino acids ALA, GLY, LYS, LEU, SER]

Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation

Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space

Backbone Model Potential

$$p(U \mid \mathrm{EDM}) \propto \prod_{\substack{\text{AAs } i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\text{AAs } i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\text{AAs } i} \psi_i(u_i \mid \mathrm{EDM})$$

Constraints between adjacent amino acids

$$\psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j)$$

Constraints between all other amino acid pairs

$$\psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases}$$

Observational (“template-matching”) probabilities

$$\psi_i(u_i \mid \mathrm{EDM}) = \Pr(\text{5-mer} \ldots \text{at } u_i)$$
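The product of potentials can be evaluated for a candidate trace roughly as below. This is a sketch under stated assumptions: positions only (the orientation term of the adjacency potential is omitted), a Gaussian distance term with a made-up spread `CA_SIGMA`, and a made-up value for the occupancy threshold `MIN_SEPARATION`. Only the ~3.8Å spacing comes from the slides.

```python
import math

CA_DISTANCE = 3.8     # from the slides: adjacent amino acids are ~3.8 Å apart
CA_SIGMA = 0.2        # assumed spread of the distance term
MIN_SEPARATION = 3.0  # assumed value of the occupancy threshold K


def psi_adj(xi, xj):
    """Adjacency potential, distance term only (orientation omitted)."""
    d = math.dist(xi, xj)
    return math.exp(-0.5 * ((d - CA_DISTANCE) / CA_SIGMA) ** 2)


def psi_occ(xi, xj):
    """Occupancy potential: zero when two residues share the same space."""
    return 0.0 if math.dist(xi, xj) < MIN_SEPARATION else 1.0


def trace_score(positions, observations):
    """Unnormalized probability of a trace.

    positions[i] is the (x, y, z) of amino acid i; observations[i] is the
    observational potential already evaluated at that position.
    """
    score = 1.0
    n = len(positions)
    for i in range(n):
        score *= observations[i]
        for j in range(i + 1, n):
            if j - i == 1:
                score *= psi_adj(positions[i], positions[j])
            else:
                score *= psi_occ(positions[i], positions[j])
    return score


straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
clashing = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.5, 0.0, 0.0)]
good = trace_score(straight, [0.5, 0.4, 0.5])
bad = trace_score(clashing, [0.5, 0.4, 0.5])
```

The straight trace keeps the full product of its observational terms, while the clashing trace scores zero because residues 1 and 3 fall within the occupancy threshold.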

Inferring Backbone Locations

Want to find backbone layout that maximizes

$$\prod_{\substack{\text{AAs } i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\text{AAs } i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\text{AAs } i} \psi_i(u_i \mid \mathrm{EDM})$$

Exact methods are intractable

Use belief propagation (Pearl 1988) to approximate marginal distributions

$$p_i(u_i \mid \mathrm{EDM}) = \int_{u_k,\, k \neq i} p(U \mid \mathrm{EDM})$$

Belief Propagation Example

[Figure: nodes LYS31 and LEU32 with current estimates $\hat{p}_{LYS31}$ and $\hat{p}_{LEU32}$; message $m_{LYS31 \to LEU32}$ is passed, then message $m_{LEU32 \to LYS31}$]
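Message passing between neighbouring amino acids can be illustrated on a toy problem. The sketch below runs sum-product belief propagation on a short chain with a handful of discrete states and checks it against brute-force enumeration. The chain topology, the tiny state space, and all names are assumptions for illustration; ACMI's graph also has occupancy edges between non-adjacent residues, so there BP is only approximate.

```python
from itertools import product


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def chain_bp(obs, pair):
    """Sum-product on a chain.

    obs[i][a]  : vertex potential of node i in state a
    pair[a][b] : edge potential for (node i in state a, node i+1 in state b)
    Returns one normalized marginal per node.
    """
    n, g = len(obs), len(obs[0])
    fwd = [[1.0] * g for _ in range(n)]  # fwd[i]: message from i-1 into i
    bwd = [[1.0] * g for _ in range(n)]  # bwd[i]: message from i+1 into i
    for i in range(1, n):
        fwd[i] = normalize([
            sum(obs[i - 1][a] * fwd[i - 1][a] * pair[a][b] for a in range(g))
            for b in range(g)])
    for i in range(n - 2, -1, -1):
        bwd[i] = normalize([
            sum(obs[i + 1][b] * bwd[i + 1][b] * pair[a][b] for b in range(g))
            for a in range(g)])
    return [normalize([obs[i][a] * fwd[i][a] * bwd[i][a] for a in range(g)])
            for i in range(n)]


def brute_force_marginals(obs, pair):
    """Exact marginals by enumerating every joint assignment."""
    n, g = len(obs), len(obs[0])
    marginals = [[0.0] * g for _ in range(n)]
    for states in product(range(g), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= obs[i][s]
        for i in range(n - 1):
            weight *= pair[states[i]][states[i + 1]]
        for i, s in enumerate(states):
            marginals[i][s] += weight
    return [normalize(m) for m in marginals]


obs = [[0.9, 0.1, 0.5], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
pair = [[0.1, 1.0, 0.2], [1.0, 0.1, 1.0], [0.2, 1.0, 0.1]]
beliefs = chain_bp(obs, pair)
```

On a chain the two passes give exact marginals, which is why the brute-force check agrees.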

Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)

Naïve implementation: O(N²G²)
  N = the number of amino acids in the protein
  G = # of points in discretized density map

O(G²) computation for each message passed
  O(G log G) as Fourier-space multiplication

O(N²) messages computed & stored
  Approximate (N-3) occupancy messages with 1 message
  O(N) messages using a message accumulator

Improved implementation: O(NG log G)
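The Fourier-space speedup relies on a message being a convolution whenever the edge potential depends only on the offset between two positions. The sketch below shows the idea on a 1D periodic grid whose length is a power of two, both simplifying assumptions made here; the small recursive FFT is written out only to keep the example self-contained.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two).

    The inverse transform is left unscaled; callers divide by n.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def message_fft(belief, potential):
    """O(G log G) message: pointwise product in Fourier space."""
    n = len(belief)
    spectrum = [a * b for a, b in zip(fft(belief), fft(potential))]
    return [v.real / n for v in fft(spectrum, inverse=True)]


def message_direct(belief, potential):
    """O(G^2) message: explicit sum over every source grid point."""
    n = len(belief)
    return [sum(belief[k] * potential[(x - k) % n] for k in range(n))
            for x in range(n)]


belief = [0.1, 0.4, 0.2, 0.0, 0.0, 0.1, 0.1, 0.1]
potential = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
fast = message_fft(belief, potential)
slow = message_direct(belief, potential)
```

Both routines return the same message up to floating-point error; only the cost differs.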


Occupancy Message Approximation

To pass a message

$$m^{n}_{i \to j}(u_j) = \int \psi_{occ}(u_i, u_j) \times \frac{\hat{p}^{\,n-1}_i(u_i)}{m^{n-1}_{j \to i}(u_i)} \, du_i$$

where $\psi_{occ}$ is the occupancy edge potential and the fraction is the product of incoming msgs to i except from j

“Weak” potentials between nonadjacent amino acids let us approximate

$$m^{n}_{i}(u) = \int \psi_{occ}(u_i, u) \times \hat{p}^{\,n-1}_i(u_i) \, du_i$$

where $\hat{p}_i$ is the product of all incoming msgs to i

Send outgoing occupancy message product to a central accumulator

$$ACC(x) = \prod_i m_i(x)$$

Then, each node’s incoming message product is computed in constant time

$$\hat{p}_3(x_3) \propto \frac{ACC(x_3)}{m_2(x_3)\, m_3(x_3)\, m_4(x_3)} \times m_{2 \to 3}(x_3)\, m_{4 \to 3}(x_3)$$

[Figure: chain of nodes 1 to 6, illustrating the occupancy messages into node 3]
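The accumulator idea can be sketched as follows: store the product of every node's outgoing occupancy message once, then recover each node's incoming product by dividing out the few messages that do not apply to it. Working in log space and requiring strictly positive messages are choices made for this illustration, and the function names are hypothetical.

```python
import math


def accumulate(log_msgs):
    """Sum of all outgoing occupancy messages in log space.

    log_msgs[i][x] is the log of node i's single outgoing occupancy
    message at grid point x. Built once, in O(N * G).
    """
    g = len(log_msgs[0])
    return [sum(m[x] for m in log_msgs) for x in range(g)]


def incoming_occupancy(acc, log_msgs, i):
    """Log-product of occupancy messages into node i.

    Removes node i's own message and those of its chain neighbours,
    which interact through adjacency potentials instead.
    """
    n = len(log_msgs)
    excluded = [k for k in (i - 1, i, i + 1) if 0 <= k < n]
    return [acc[x] - sum(log_msgs[k][x] for k in excluded)
            for x in range(len(acc))]


msgs = [[0.9, 0.5, 0.8], [0.7, 0.6, 0.9], [0.4, 0.9, 0.95],
        [0.8, 0.8, 0.5], [0.6, 0.7, 0.9], [0.9, 0.3, 0.7]]
log_msgs = [[math.log(v) for v in m] for m in msgs]
acc = accumulate(log_msgs)
into_node_2 = incoming_occupancy(acc, log_msgs, 2)
```

Each call to `incoming_occupancy` touches at most three messages per grid point, regardless of how many amino acids the protein has.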

BP Output

After some number of iterations, BP gives probability distributions over Cα locations

[Figure: per-residue marginal distributions, e.g. $\hat{p}_{LEU}(x_{LEU})$ and $\hat{p}_{VAL}(x_{VAL})$, along the sequence … ALA LEU PRO VAL ARG …]

ACMI’s Backbone Trace

Independently choose Cα locations that maximize approximate marginal distribution

$$x_i^{*} = \arg\max_{x_i} \hat{p}_i(x_i)$$
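Choosing each Cα independently amounts to one arg-max per amino acid. A minimal sketch, assuming each marginal is stored as a dictionary from grid point to probability (a representation chosen here for illustration):

```python
def max_marginal_trace(marginals):
    """Pick, for each amino acid, the grid point with the highest marginal.

    marginals[i] maps a grid point (x, y, z) to the approximate marginal
    probability that amino acid i sits there.
    """
    return [max(m, key=m.get) for m in marginals]


marginals = [
    {(0, 0, 0): 0.2, (1, 0, 0): 0.7, (2, 0, 0): 0.1},
    {(0, 0, 0): 0.1, (1, 0, 0): 0.5, (2, 0, 0): 0.4},
]
trace = max_marginal_trace(marginals)
```

Because every residue is chosen on its own, nothing in this step stops two residues from selecting the same grid point, as happens in the toy input above.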

Example: 1XRI

3.3Å resolution density map, 39° mean phase error

0.9009Å RMSd, 93% complete

[Figure: predicted backbone colored by prob(AA at location); color scale labeled LOW 0.1]

Testset Density Maps (raw data)

[Figure: test-set density maps plotted by density-map resolution (Å), axis from 1.0 to 4.0]

Experimental Accuracy

[Figure: ACMI, ARP/wARP, Textal, and Resolve compared on % backbone correctly placed and % amino acids correctly identified]

Experimental Accuracy on a Per-Protein Basis

[Figure: three per-protein plots of % Cα’s located, against ARP/wARP, Resolve, and Textal; each axis from 0 to 100]


Problems with ACMI

Biologists want location of all atoms

All Cα’s lie on a discrete grid

Maximum-marginal backbone model may be physically unrealistic

Ignoring a lot of information
  Multiple models may better represent conformational variation within crystal

[Figure: three candidate structures with Probability=0.4, Probability=0.35, and Probability=0.25, next to the maximum-marginal structure]

ACMI with Particle Filtering (ACMI-PF)

Idea: Represent protein using a set of static 3D all-atom protein models

Particle Filtering Overview (Doucet et al. 2000)

Given some Markov process $x_{1:K} \in X$ with observations $y_{1:K} \in Y$

Particle Filtering approximates some posterior probability distribution over X using a set of N weighted point estimates

$$p(x_{1:K} \mid y_{1:K}) \approx \sum_{i=1}^{N} wt_K^{(i)} \, \delta\!\left(x_{1:K} - x_{1:K}^{(i)}\right)$$

Particle Filtering Overview

Markov process gives recursive formulation

$$p(x_{1:k} \mid y_{1:k}) \propto p(y_k \mid x_k)\, p(x_k \mid x_{k-1})\, p(x_{1:k-1} \mid y_{1:k-1})$$

Use importance fn. $q(x_k \mid x_{0:k-1}, y_k)$ to grow particles

Recursive weight update,

$$wt_k^{(i)} \propto wt_{k-1}^{(i)} \, \frac{p(y_k \mid x_k^{(i)})\, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{q(x_k^{(i)} \mid x_{0:k-1}^{(i)}, y_k)}$$
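The recursive weight update can be written as a few lines of arithmetic. The sketch below assumes the three densities have already been evaluated for every particle and are passed in as plain lists; the function name and argument layout are illustrative only.

```python
def update_weights(weights, likelihoods, transitions, proposals):
    """One sequential-importance-sampling weight update.

    For particle i:
      weights[i]     previous weight
      likelihoods[i] p(y_k | x_k)
      transitions[i] p(x_k | x_{k-1})
      proposals[i]   q(x_k | x_{0:k-1}, y_k), the importance function
    Returns weights renormalized to sum to one.
    """
    raw = [w * like * trans / q
           for w, like, trans, q
           in zip(weights, likelihoods, transitions, proposals)]
    total = sum(raw)
    return [r / total for r in raw]


# When the proposal equals the transition density, the update reduces to
# reweighting by the observation likelihood alone.
new_weights = update_weights(
    weights=[0.5, 0.5],
    likelihoods=[0.2, 0.6],
    transitions=[0.3, 0.1],
    proposals=[0.3, 0.1],
)
```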

Particle Filtering for Protein Structures

Particle refers to one specific 3D layout of some subsequence of the protein

At each iteration advance particle’s trajectory by placing an additional amino-acid’s atoms

Alternate extending chain left and right

An iteration alternately places
  Cα position $b_{k+1}$ given $b_k$
  All sidechain atoms $s_k$ given $b_{k-1:k+1}$

Key idea: Use the conditional distribution $p(b_k \mid b^i_{k-1}, \mathrm{Map})$ to advance particle trajectories

Construct this conditional distribution from BP’s marginal distributions

Algorithm

place “seeds” $b^i_k$ for each particle i = 1…N
while amino-acids remain
    place $b^i_{k+1}$ / $b^i_{j-1}$ given $b^i_{j:k}$ for each i = 1…N
    place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N
    optionally resample N particles
end while

Backbone Step (for particle i)

place $b^i_{k+1}$ given $b^i_k$ for each i = 1…N

(1) Sample L $b_{k+1}$’s from $b_{k-1}$–$b_k$–$b_{k+1}$ pseudoangle distribution

(2) Weight each sample by its ACMI-computed approximate marginal: $p_{k+1}(b^1_{k+1}),\ p_{k+1}(b^2_{k+1}),\ \ldots,\ p_{k+1}(b^L_{k+1})$

(3) Select $b_{k+1}$ with probability proportional to sample weight

(4) Update particle weight as sum of sample weights

$$wt_{k+1} = wt_k \sum_{l=1}^{L} p_{k+1}(b^l_{k+1})$$
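Steps (2) through (4) of the backbone step can be sketched as below. Step (1), drawing the L candidates from the pseudoangle distribution, is assumed to have been done by the caller; `marginal` stands in for a lookup into the ACMI-computed approximate marginal, and all names are hypothetical.

```python
import random


def backbone_step(candidates, marginal, particle_weight, rng=random):
    """Extend one particle by one C-alpha position.

    candidates      : L candidate positions for b_{k+1}
    marginal        : callable returning the approximate marginal
                      probability of a position
    particle_weight : the particle's weight before this step
    Returns (chosen position, updated particle weight).
    """
    sample_weights = [marginal(c) for c in candidates]       # step (2)
    total = sum(sample_weights)
    if total == 0.0:
        return None, 0.0  # no candidate is supported by the map
    chosen = rng.choices(candidates, weights=sample_weights, k=1)[0]  # (3)
    return chosen, particle_weight * total                   # step (4)


candidates = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
lookup = {candidates[0]: 0.1, candidates[1]: 0.5, candidates[2]: 0.2}
position, weight = backbone_step(candidates, lookup.get, 0.25,
                                 random.Random(0))
```

Multiplying by the sum of sample weights, rather than by the weight of the chosen sample, means a particle is rewarded for landing in a region where many extensions are plausible.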

Sidechain Step (for particle i)

place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N

(1) Sample $s_k$ from a database of sidechain conformations (Protein Data Bank)

(2) For each sidechain conformation, compute probability of density map given the sidechain: $p_k(\mathrm{EDM} \mid s^1_k),\ p_k(\mathrm{EDM} \mid s^2_k),\ p_k(\mathrm{EDM} \mid s^3_k),\ \ldots$

(3) Select sidechain conformation from this weighted distribution

(4) Update particle weight as sum of sample weights

$$wt_k \leftarrow wt_k \sum_{m} p_k(\mathrm{EDM} \mid s^m_k)$$

Particle Resampling

[Figure: particles with weights such as wt = 0.1, 0.4, 0.3, and 0.1, shown before and after resampling]
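Resampling replaces the weighted particle set with an equally weighted one drawn in proportion to the weights. The sketch below uses plain multinomial resampling; the slides do not say which scheme ACMI-PF uses, and the effective-sample-size helper is a common heuristic added here as an assumption for deciding when the optional resampling is worthwhile.

```python
import random


def effective_sample_size(weights):
    """1 / sum(w^2) for normalized weights; equals N when all are equal."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def resample(particles, weights, rng=random):
    """Multinomial resampling.

    Draws len(particles) particles with replacement, each with
    probability proportional to its weight, and resets the weights
    to uniform.
    """
    n = len(particles)
    survivors = rng.choices(particles, weights=weights, k=n)
    return survivors, [1.0 / n] * n


particles = ["model_a", "model_b", "model_c", "model_d"]
weights = [0.1, 0.4, 0.3, 0.2]
if effective_sample_size(weights) < len(particles) / 2:
    particles, weights = resample(particles, weights, random.Random(0))
```

High-weight particles tend to be duplicated and low-weight ones dropped, so later steps spend their effort on the more probable structures.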

Amino-Acid Sampling Order

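Begin at some amino acid k with probability

$$P(k) \propto \exp\!\big(-\mathrm{entropy}(\hat{p}_k(b_k))\big)$$

At each step, move left or right with probability

$$P(j-1) \propto \exp\!\big(-\mathrm{entropy}(\hat{p}_{j-1}(b_{j-1}))\big), \qquad P(k+1) \propto \exp\!\big(-\mathrm{entropy}(\hat{p}_{k+1}(b_{k+1}))\big)$$

where the chain placed so far spans amino acids j through k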
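An entropy-driven sampling order can be sketched as follows, assuming the exponent is the negative entropy so that amino acids with sharply peaked marginals are favoured. The list-of-probabilities representation and the function names are illustrative.

```python
import math


def entropy(distribution):
    """Shannon entropy (natural log) of a normalized distribution."""
    return -sum(p * math.log(p) for p in distribution if p > 0.0)


def start_probabilities(marginals):
    """P(k) proportional to exp(-entropy) of amino acid k's marginal."""
    scores = [math.exp(-entropy(m)) for m in marginals]
    total = sum(scores)
    return [s / total for s in scores]


def left_probability(left_marginal, right_marginal):
    """Probability of extending the chain leftward rather than rightward."""
    left = math.exp(-entropy(left_marginal))
    right = math.exp(-entropy(right_marginal))
    return left / (left + right)


peaked = [1.0, 0.0, 0.0, 0.0]       # entropy 0, score 1
flat = [0.25, 0.25, 0.25, 0.25]     # entropy ln 4, score 1/4
start = start_probabilities([peaked, flat])
go_left = left_probability(peaked, flat)
```

The confidently located residue is four times as likely to be chosen as the uncertain one, so particles tend to grow outward from well-determined regions.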

Experimental Methodology

Run ACMI-PF 10 times with 100 particles each
  Return highest-weight particle from each run
  Each run samples amino-acids in a different order
  Refine each structure for 10 iterations in Refmac5

Compare 10-structure model to others using Rfree

$$R_{free} = \frac{\sum \big|\, |F_{obs}| - |F_{calc}| \,\big|}{\sum |F_{obs}|}$$

ACMI-PF Versus ACMI-Naïve

[Figure: ACMI-PF versus ACMI-Naïve, plotted against number of ACMI-PF runs (1 to 10)]

Additionally, ACMI-PF’s models have …
  Fewer gaps (10 vs. 28)
  Lower sidechain RMS error (2.1Å vs. 2.3Å)

ACMI-PF Versus Others

[Figure: three plots against ARP/wARP Rfree, Resolve Rfree, and Textal Rfree; axes from 0.25 to 0.65]

ACMI-PF Example: 2A3Q

1.79Å RMSd, 92% complete

2.3Å resolution, 66° phase error

ACMI Overview

Phase 4: Iterative phase improvement
  Use particle-filtering models to improve density-map quality
  Rerun entire pipeline on improved density map
  Repeat until convergence

Phase Problem

f(I, Φ)

Intensities (I): measured by X-ray crystallography

Phases (Φ): experimentally estimated (e.g. MAD, MIR)

Density-Map Phasing

[Figure: density maps at 0°, 30°, 60°, and 75° mean phase error; phases labeled $\Phi_{calc}$ and $\Phi_{exp}$]

Iterative Phase Improvement

Initial density map

Predicted 3D model

Revised density map

ACMI-PF’s Phase Improvement

[Figure: ACMI-PF’s results plotted against error in initial phases (deg. mean phase error), axis from 0 to 75]

Two-Iteration ACMI

[Figure: % backbone located, Iteration 1 axis from 50 to 100]

Future Work: Many-iteration ACMI

[Figure: two plots against number of ACMI iterations (0 to 4 and 1 to 5)]

Conclusions

ACMI’s three steps construct a set of all-atom protein models from a density map

Novel message approximation allows inference on large, highly-connected models

Resulting protein models are more accurate than those produced by other methods

Ongoing and Future Work

Incorporate additional structural biology background knowledge

Incorporate more complex potential functions

Further work on iterative phase improvement

Generalize my algorithms to other 3D image data

Acknowledgements

Advisor: Jude Shavlik

Committee: George Phillips, Charles Dyer, David Page, Mark Craven

Collaborators: Ameet Soni, Dmitry Kondrashov, Eduard Bitto, Craig Bingman

6th floor MSCers

Center for Eukaryotic Structural Genomics

Funding: UW-Madison Graduate School, NLM 1T15 LM007359, NLM 1R01 LM008796
