Identifying structural templates using alignments of designed sequences
Stefan M. Larson, Pande Group, Biophysics Program
December, 2002
smlarson@stanford.edu


Page 1: Identifying structural templates using alignments of designed sequences Stefan M. Larson Pande Group Biophysics Program December, 2002 smlarson@stanford.edu


Page 2: Structure prediction & sequence space

[Figure: example sequences of different lengths, shown as placeholder letter strings]

Page 3: Multiple sequence alignments aid comparative protein modeling

• 1 in 3 sequences are recognizably related to at least one protein structure.

• A significant fraction of the remaining 2/3 have solved structural homologues, but these are not recognized by sequence similarity searching techniques.

• Marti-Renom et al. (2000)

• Multiple sequence alignments greatly improve the efficacy and accuracy of almost all phases of comparative modeling.

• Venclovas (2001)

Page 4: Computational protein design

[Diagram labels: Native structure; Iterative refinement; New sequence]

Page 5: Large scale sequence generation

“Reverse BLAST” study:

• Total structures: 264

• Total backbone variants: 26,400

• Total time of data collection: 80 days

• Processors available: 4,000

• Total sequences generated: 200,000

Page 6: “Reverse BLAST”: finding templates for comparative modeling

Larson SM, Garg A, Desjarlais JR, Pande VS. (2003) Proteins: Structure, Function, and Genetics

Page 8: Results: Sequence quality

[Figure: x-axis: Designed sequence profile (ranked by E-value), 0 to 225; y-axes: E-value of best PDB hit (log scale, 1E-17 to 10) and Average identity to native sequence (%), 0 to 30]
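The "average identity to native sequence" plotted on this slide can be illustrated with a minimal sketch. It assumes that each designed sequence has the same length as the native sequence, so positions can be compared directly without an alignment step. The slides do not state how identity was computed, and the function names and toy sequences below are hypothetical.

```python
from typing import Sequence


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def average_identity_to_native(native: str, designed: Sequence[str]) -> float:
    """Mean percent identity between the native sequence and each designed one."""
    if not designed:
        raise ValueError("at least one designed sequence is required")
    return sum(percent_identity(native, s) for s in designed) / len(designed)


# Toy data (hypothetical, not taken from the study).
native = "ACDEFGHIKL"
designed = ["ACDEFGHIKL", "ACDEFAAAAA", "AAAAAAAAAA"]
average = average_identity_to_native(native, designed)
```

Here the three designed sequences are 100%, 50%, and 10% identical to the native sequence, so `average` is their mean.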

Page 9: Method: “Reverse BLAST”

[Diagram: three columns labeled Designed Sequences, Hypothetical Proteins, and Structural Templates, with a BLAST search at cutoff E<0.01; the sequences shown are placeholder strings]
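The filtering step in this diagram can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes BLAST has already been run with designed sequences as queries against hypothetical proteins, and that each hit records which structure the designed sequence was generated for. The `Hit` record, its field names, and `assign_templates` are hypothetical. Only the E<0.01 cutoff comes from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    designed_id: str   # designed sequence used as the query (hypothetical field)
    template_id: str   # structure the designed sequence was generated for
    protein_id: str    # hypothetical protein that was hit
    evalue: float      # BLAST E-value of the hit


def assign_templates(
    hits: Iterable[Hit], evalue_cutoff: float = 0.01
) -> Dict[str, Dict[str, float]]:
    """Map each hypothetical protein to candidate templates and their best E-value."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for hit in hits:
        if hit.evalue >= evalue_cutoff:
            continue  # keep only hits with E < cutoff, as on the slide
        current = best[hit.protein_id].get(hit.template_id)
        if current is None or hit.evalue < current:
            best[hit.protein_id][hit.template_id] = hit.evalue
    return {protein: dict(templates) for protein, templates in best.items()}


# Toy hits (hypothetical identifiers and E-values).
hits = [
    Hit("d1", "T1", "hypA", 1e-5),
    Hit("d2", "T1", "hypA", 1e-8),
    Hit("d3", "T2", "hypA", 0.5),    # rejected: E >= 0.01
    Hit("d4", "T2", "hypB", 0.003),
]
templates = assign_templates(hits)
```

Several designed sequences for the same structure can hit the same protein, so the sketch keeps only the best E-value per protein and template pair.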

Page 10: Do the designed sequences help?

[Figure: Correctly identified structural templates. x-axis: E-value threshold (-log(E)), 2 to 10; y-axes: hits with sequence alignment : hits without (0 to 5) and Total unique hits (0 to 160); series: fold-increase in # of templates, fold-increase in # of genes, total hits]
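The ratio on this plot (hits with the designed-sequence alignment : hits without) can be sketched as a function of the E-value threshold. The sketch assumes that -log(E) is base 10 and that a hit counts when its E-value is strictly below the threshold. Neither detail is stated on the slide, and the function names and toy data are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

HitList = Iterable[Tuple[str, float]]  # (target identifier, E-value) pairs


def unique_hits_at_threshold(hits: HitList, neg_log_e: float) -> Set[str]:
    """Unique targets with an E-value below 10**(-neg_log_e)."""
    cutoff = 10.0 ** (-neg_log_e)
    return {target for target, evalue in hits if evalue < cutoff}


def fold_increase(
    hits_with: HitList, hits_without: HitList, neg_log_e: float
) -> Optional[float]:
    """Ratio of unique hits with the alignment to unique hits without it.

    Returns None when there are no hits without the alignment, because the
    ratio is undefined in that case.
    """
    with_alignment = unique_hits_at_threshold(hits_with, neg_log_e)
    without_alignment = unique_hits_at_threshold(hits_without, neg_log_e)
    if not without_alignment:
        return None
    return len(with_alignment) / len(without_alignment)


# Toy data (hypothetical identifiers and E-values).
hits_with = [("g1", 1e-6), ("g2", 1e-3), ("g3", 1e-4), ("g1", 1e-9)]
hits_without = [("g1", 1e-7), ("g2", 0.5)]
ratios = {t: fold_increase(hits_with, hits_without, t) for t in (2, 5, 8)}
```

With this toy data the ratio is 3.0 at threshold 2 and 1.0 at threshold 5. At threshold 8 it is undefined, because no hit without the alignment passes the cutoff.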

Page 11: Remote homology detection

[Figure: bar chart. x-axis: Genome searched; y-axis: Number of structural templates identified (0 to 35). Genomes shown: Pyrococcus horikoshii, Sulfolobus solfataricus, Thermoplasma acidophilum, Thermoplasma volcanium, Treponema pallidum, Helicobacter pylori 26695, Helicobacter pylori J99, Campylobacter jejuni, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis H37Rv, Rickettsia prowazekii, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Mycobacterium leprae, Chlamydia muridarum, Chlamydia trachomatis, Aquifex aeolicus, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Streptococcus pyogenes, Mesorhizobium loti, Methanococcus jannaschii, Borrelia burgdorferi, Deinococcus radiodurans, Ureaplasma urealyticum, Halobacterium sp, Caulobacter crescentus, Lactococcus lactis, Archaeoglobus fulgidus, Pyrococcus abyssi, Methanobacterium thermoautotrophicum, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Haemophilus influenzae, Xylella fastidiosa, Buchnera sp, Staphylococcus aureus Mu50, Staphylococcus aureus N315, Pasteurella multocida, Thermotoga maritima, Vibrio cholerae, Bacillus subtilis, Pseudomonas aeruginosa, Synechocystis PCC6803, Escherichia coli O157H7 EDL933, Escherichia coli O157H7, Escherichia coli K12]

Page 12: Optimizing structural diversity

[Figure: x-axis: RMSD of structural ensemble (Angstroms), 0 to 2; y-axes: (%) 0 to 80 and Sequence entropy, 0 to 6; series: sequence entropy, prediction accuracy, prediction coverage, mean pairwise %ID, mean native %ID]
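Two of the plotted quantities, sequence entropy and mean pairwise %ID, can be sketched for a set of equal-length designed sequences. The slide does not define sequence entropy. The sketch assumes Shannon entropy per alignment column, in bits, averaged over columns, which may differ from the definition used in the study. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the residue distribution in one column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_sequence_entropy(sequences: Sequence[str]) -> float:
    """Average per-column entropy over a set of equal-length sequences."""
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(column_entropy(col) for col in zip(*sequences)) / length


def mean_pairwise_identity(sequences: Sequence[str]) -> float:
    """Mean percent identity over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = sum(
        100.0 * sum(a == b for a, b in zip(x, y)) / len(x) for x, y in pairs
    )
    return total / len(pairs)


# Toy set: two fully conserved columns, two maximally varied columns.
sequences = ["AAAA", "AACC", "AAGG", "AATT"]
entropy = mean_sequence_entropy(sequences)
pairwise = mean_pairwise_identity(sequences)
```

In the toy set the conserved columns contribute 0 bits and the varied columns 2 bits each, so the mean entropy is 1 bit. Every pair agrees at exactly two of four positions, so the mean pairwise identity is 50%.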

Page 13: Future work

• Compare “reverse BLAST” to other remote homology detection approaches (3D-PSSM, HMMER, etc.).

• Retrodict CASP targets, especially those which were not successfully predicted by comparative modeling.

• Increase the coverage and accuracy of the designed sequence sets.

Page 14: Collaborators

Stanford University

• Amit Garg

• Dr. Vijay Pande

Harvard University

• Jeremy England

Xencor, Inc.

• Dr. John Desjarlais