Towards a model for -1 frameshift sites

Alain Denise1,2, Michaël Bekaert1, Laure Bidou1, Guillemette Duchateau-Nguyen1,

Jean-Paul Forest2, Christine Froidevaux2,

Isabelle Hatin1, Jean-Pierre Rousset1, Michel Termier1

1 IGM (Institut de Génétique et Microbiologie)

2 LRI (Laboratoire de Recherche en Informatique)

Université Paris-Sud, Orsay

Translation

5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’ (mRNA)

The ribosome reads the bases in triplets (codons), starting from a START codon.

The ribosome synthesizes one amino acid per codon.

Synthesis continues until a STOP codon is read.

1 mRNA gives 1 protein
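The reading process described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes the standard genetic code, starts at the first AUG unless a start position is given, and stops at the first in-frame STOP codon (UAA, UAG or UGA). The names `CODON_TABLE` and `translate` are hypothetical.

```python
from typing import Optional

# Standard genetic code; codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str, start: Optional[int] = None) -> str:
    """Read codons from `start` (default: first AUG) until a STOP codon."""
    if start is None:
        start = mrna.find("AUG")
        if start < 0:          # no START codon: no protein
            return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # STOP codon: synthesis ends
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CAUAUGGAUUACAUGGUCUAAGAU"))  # MDYMV
```

On the example mRNA, reading starts at AUG, decodes AUG GAU UAC AUG GUC and stops at UAA, giving one protein.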

Experimental fact

• Some mRNAs encode two distinct proteins that start at the same START codon (same 5’ end) and share their N-terminal part

Programmed -1 frameshifting

Non-deterministic event

[Figure: usual translation reads ORF1a in the 0 phase from START0 to STOP0; after a -1 frameshift, translation continues in the -1 phase through ORF1b up to STOP-1.]

1 mRNA gives 2 distinct proteins, in a precise ratio
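As a rough illustration of how a single FS rate fixes the ratio of the two proteins, the sketch below assumes that every ribosome reaching the frameshift site either stays in the 0 phase (and stops at STOP0) or shifts to the -1 phase (and stops at STOP-1), with no drop-off. That simplification and the function name `product_fractions` are assumptions of this sketch.

```python
from typing import Tuple


def product_fractions(fs_rate: float) -> Tuple[float, float]:
    """Split ribosomes reaching the FS site between the two products.

    fs_rate is the frameshift efficiency as a fraction in [0, 1].
    Returns (fraction stopping at STOP0, fraction stopping at STOP-1),
    assuming no ribosome drops off between the two outcomes.
    """
    if not 0.0 <= fs_rate <= 1.0:
        raise ValueError("fs_rate must be between 0 and 1")
    return 1.0 - fs_rate, fs_rate


zero_phase, minus_one_phase = product_fractions(0.22)
print(round(zero_phase / minus_one_phase, 2))  # 3.55
```

With the 22% FS rate given for IBV in the "Example of data" slides, this simplified model predicts roughly 3.5 ORF1a products per ORF1a/ORF1b fusion product.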

Typical -1 frameshift site [Brierley, 1989]

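[Figure, 5’ to 3’: start codon AUG, then the slippery sequence NNX XXY YYZ (0-phase codons), a spacer (SP), and a secondary structure made of stems S1 and S2 and loops L1, L2 and L’1.]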
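The slide gives the slippery sequence only as NNX XXY YYZ. A scan for candidate slippery heptamers can be sketched with a regular expression, assuming the commonly cited consensus in which X is any nucleotide, Y is A or U, and Z is A, C or U (not G). That consensus and the function name `slippery_sites` are assumptions of this sketch, not part of the slide; the frame check follows the NNX XXY YYZ layout shown above.

```python
import re
from typing import List, Optional, Tuple

# Candidate slippery heptamer X XXY YYZ (assumed consensus):
# X = any nucleotide, Y = A or U, Z = A, C or U.
# The lookahead lets overlapping candidates be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACU]))")


def slippery_sites(mrna: str, frame_start: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return (0-based position, heptamer) for each candidate slippery sequence.

    If frame_start (start of the 0-phase ORF) is given, keep only heptamers
    laid out as NNX XXY YYZ, i.e. starting on the third base of a 0-phase codon.
    """
    sites = []
    for match in SLIPPERY.finditer(mrna):
        pos = match.start()
        if frame_start is None or (pos - frame_start) % 3 == 2:
            sites.append((pos, match.group(1)))
    return sites


print(slippery_sites("UAUUUAAACGGGUAC", frame_start=0))  # [(2, 'UUUAAAC')]
```

A match is only a candidate: the spacer and the downstream secondary structure are what the rest of the model tries to characterize.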

IBV frameshift site

[Figure, 5’ to 3’: AUG … slippery sequence UAU UUA AAC, spacer GGGUAC, then a pseudoknot formed by stems S1 and S2. Sequence fragments drawn in the pseudoknot: UGACGAUGGGG, GCUG AUACCCC, AGGCUCG / UCCGAGC, G, UUGC, GAAA.]

Translation with frameshift

0 phase: 5’ … AUG … UAU UUA AAC GGG UAC … 3’, upstream of the IBV pseudoknot (stems S1 and S2)

-1 shift: on the slippery sequence, the ribosome slips back by one nucleotide toward the 5’ end

-1 phase: 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU 3’
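The two reading phases on these slides can be reproduced by cutting the same sequence into codons at two offsets. The sequence below is rebuilt by joining the -1 phase line of the slide (UA UUU AAA CGG GUA CGG GGU AGC AGU). Shifting back one nucleotide from a codon boundary at position 3k lands on 3k - 1, so the -1 phase corresponds to offset 2 of the 0 phase. `read_phase` is a hypothetical helper for this sketch.

```python
from typing import List, Tuple

# IBV site, rebuilt by joining the -1 phase line of the slide.
IBV = "UAUUUAAACGGGUACGGGGUAGCAGU"


def read_phase(seq: str, offset: int) -> Tuple[str, List[str], str]:
    """Split seq into (bases before the frame, codons, leftover bases)."""
    head, body = seq[:offset], seq[offset:]
    n = len(body) - len(body) % 3  # length covered by complete codons
    return head, [body[i:i + 3] for i in range(0, n, 3)], body[n:]


# Offset 0 is the 0 phase; offset 2 (= -1 modulo 3) is the -1 phase.
for label, offset in (("0 phase", 0), ("-1 phase", 2)):
    head, codons, tail = read_phase(IBV, offset)
    print(f"{label:>8}:", " ".join(part for part in [head, *codons, tail] if part))
```

The 0 phase reads UAU UUA AAC GGG UAC …, and the -1 phase reads UA UUU AAA CGG GUA CGG GGU AGC AGU, as on the slide.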

Goals

To improve the known model for viral frameshift sites

To identify new frameshift sites in viral and non-viral genomes

Our approach

Biological sequences

Formal models

Prediction tools

In silico and in vivo validation

Applications to other genomes

(represent, explain, predict)

IBV frameshift site: spacer

Spacer (SP): 5’ GGGUAC 3’

Spacer consensus

Virus            Spacer
HAST-1           UAC AAA
BEV              UGU UG
EAV              UGA GAG
HCV              GAG UC
IBV              GGG UAC
MHV              GGG UU
TGEV             GAG
RCNMV            UAG GC
BWYV             GGA GUG
PLRV             GGG CAA
BLV              UAA UAG A
FIV              UGG AAG GC
HIV-1            GGG AAG AU
HTLV-2           UCC UUA A
JSR              UGG GUG A
MMTV gag-pro     UUG UAA A
MMTV pro-pol     UGA U
RSV              UAG GGA
SRV-1            GGA CUG A
Consensus        UGG UAG AGAA GUA

Lab experiments

Test construct: pSV40 – lacZ – FS signal – luc (luc in the -1 phase)

Control construct: pSV40 – lacZ – FS signal N – luc (luc in the 0 phase)

lacZ: expression reporter; luc: FS reporter
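The slide shows the constructs but not the calculation. A common way to read such a dual-reporter assay, assumed here rather than taken from the slide, is to normalize luciferase (FS reporter) activity by β-galactosidase (lacZ, expression reporter) activity and to divide the test ratio by the control ratio. Relative FS rates, as in the spacer experiments, then set the wild type to 100. The function names and the activity values in the usage note are hypothetical.

```python
def fs_efficiency(test_luc: float, test_lacz: float,
                  ctrl_luc: float, ctrl_lacz: float) -> float:
    """FS rate (%) = (luc/lacZ of the test construct) / (luc/lacZ of the control)."""
    # Dividing by lacZ corrects for differences in overall expression;
    # the in-frame control construct defines 100%.
    return 100.0 * (test_luc / test_lacz) / (ctrl_luc / ctrl_lacz)


def relative_fs_rate(mutant_fs: float, wild_type_fs: float) -> float:
    """Mutant FS rate expressed relative to the wild-type site (wild type = 100)."""
    return 100.0 * mutant_fs / wild_type_fs
```

For example, hypothetical readings of luc = 11 and lacZ = 50 for the test construct, against 100 and 100 for the control, give `fs_efficiency(11, 50, 100, 100)` = 22 (percent).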

Spacer: lab experiments

Construct        Spacer     Relative FS rate
wild-type IBV    GGGUA      100
U mutant         UGGUA      100
A mutant         AGGUA      55
C mutant         CGGUA      32
CC mutant        CCGUA      70
CCU mutant       CCUUA      49
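To compare the spacer mutants position by position, the sketch below lists each substitution relative to the wild-type IBV spacer GGGUA, using the sequences and relative FS rates from the table, and prints the mutants sorted by relative FS rate. The helper `substitutions` and the G1U-style labels (base, 1-based position, new base) are conventions of this sketch.

```python
from typing import List, Tuple

WILD_TYPE = "GGGUA"
# (name, spacer, relative FS rate) as listed in the table.
MUTANTS = [
    ("U mutant", "UGGUA", 100),
    ("A mutant", "AGGUA", 55),
    ("C mutant", "CGGUA", 32),
    ("CC mutant", "CCGUA", 70),
    ("CCU mutant", "CCUUA", 49),
]


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


for name, spacer, rate in sorted(MUTANTS, key=lambda mutant: mutant[2]):
    changes = ", ".join(f"{ref}{pos}{var}" for pos, ref, var in substitutions(WILD_TYPE, spacer))
    print(f"{name:<11} {changes:<16} {rate}")
```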

Refining the model: Machine learning

• To identify relevant properties that characterize FS sites

• Disjunctive learning: not all sequences frameshift for the same reasons [Giedroc et al., 2000]

Annotating data: spacer

Spacer (SP): 5’ GGGUAC 3’

Example of data: SP

• SP = GGGUAC

– number of A = 1; C = 1; G = 3; U = 1

– % of A = 17; C = 17; G = 50; U = 17

– first = G

– last = C
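The annotation of a region into simple features can be sketched as follows. It computes the kind of attributes listed for SP (length, base counts, percentages, first and last base). The function name and feature keys are hypothetical, and the slides do not say how percentages were rounded; rounding to the nearest integer is assumed here.

```python
from collections import Counter
from typing import Dict, Union


def composition_features(seq: str) -> Dict[str, Union[int, str]]:
    """Hypothetical annotation of one region: length, base counts and percentages, first and last base."""
    if not seq:
        raise ValueError("region must not be empty")
    counts = Counter(seq)
    features: Dict[str, Union[int, str]] = {
        "length": len(seq),
        "first": seq[0],
        "last": seq[-1],
    }
    for base in "ACGU":
        features[f"number of {base}"] = counts[base]
        # Rounded to the nearest integer percentage (an assumption).
        features[f"% of {base}"] = round(100 * counts[base] / len(seq))
    return features


sp = composition_features("GGGUAC")
print(sp["% of A"], sp["% of G"])  # 17 50
```

The same function can describe stem strands: for the 3’ side of S1 (CCCCAUAGUCG) it gives length 11 and % of C = 45, the kind of attribute (S1.3' length, %C in S1.3') that appears in the rules. Attributes such as the "bottom half" of a stem are not defined on the slides and are not sketched.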

Annotating data: stem 1

[Figure: stem 1 (S1) highlighted in the IBV pseudoknot; its two strands are drawn as UGACGAUGGGG and GCUG AUACCCC.]

Example of data: stem 1

• S1:

– 5’ side: GGGGUAGCAGU

– 3’ side: CCCCAUAGUCG

– stability: -20.7 kcal/mol

Annotating data: full sequence

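[Figure: the full IBV frameshift site annotated, 5’ to 3’: slippery sequence U UUA AAC, spacer GGGUAC, and the pseudoknot (fragments drawn: UGACGAUGGGG, GCUG AUACCCC, AGGCUCG / UCCGAGC, G, UUGC, GAAA).]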

Example of data: FS rate

FS rate = 22%

GloBo

Disjunctive learning algorithm

Suited to small amounts of data

Won the PTE challenge on analogous data

Example of rules

If

SP length 5 and number of G in S1.5’ bottom half 3 and

number of G in S1.5’ 4 and %T in S2.5’ 30 and %G in S2.5’ 70

then FS rate 5%

If %G in S1.5' bottom half 80 and %C in L1 45

then FS rate 5%

If

SP length 5 and S1.3' length 6 and %C in S1.3' 45

then FS rate 5%

...

Covering and prediction

If

SP length 5 and number of G in S1.5’ bottom half 3 and

number of G in S1.5’ 4 and %T in S2.5’ 30 and %G in S2.5’ 70

then FS rate 5%

Covering of examples: 70%

Examples predicted in test set: 80%
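The learned model is a disjunction of rules, each a conjunction of feature conditions. The sketch below shows how such a rule set could be applied and how a covering percentage could be computed, under stated assumptions: "covering" is taken to mean the share of examples that satisfy every condition of a rule, the comparison operators (lost from these slides) are placeholders, and the two examples are toy values, not data from the study. It does not reproduce GloBo itself.

```python
import operator
from typing import Callable, Dict, List, Tuple

# A condition is (feature name, comparison, threshold); a rule is a conjunction of conditions.
Condition = Tuple[str, Callable[[float, float], bool], float]
Rule = List[Condition]


def fires(example: Dict[str, float], rule: Rule) -> bool:
    """True when the example satisfies every condition of the rule."""
    return all(compare(example[feature], threshold) for feature, compare, threshold in rule)


def covering(examples: List[Dict[str, float]], rule: Rule) -> float:
    """Percentage of the given examples covered by the rule."""
    if not examples:
        return 0.0
    return 100.0 * sum(fires(ex, rule) for ex in examples) / len(examples)


def predict(example: Dict[str, float], rules: List[Rule]) -> bool:
    """Disjunctive rule set: predict the class as soon as one rule fires."""
    return any(fires(example, rule) for rule in rules)


# Hypothetical rule: operator.ge is a placeholder for the lost comparison operators.
rule = [
    ("SP length", operator.ge, 5),
    ("S1.3' length", operator.ge, 6),
    ("%C in S1.3'", operator.ge, 45),
]
# Toy examples for illustration only (not data from the study).
toy = [
    {"SP length": 6, "S1.3' length": 11, "%C in S1.3'": 45},
    {"SP length": 3, "S1.3' length": 11, "%C in S1.3'": 45},
]
print(covering(toy, rule))  # 50.0
```

Computing the same percentage on a held-out set would be one way to obtain a figure like "examples predicted in test set", again under the assumed reading of these terms.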

Is R1 relevant for frameshifting?

Construct        Stem 1 5’ side   Relative FS rate   R1
wild-type IBV    GGGGU AUCAGU     100                yes
mutant 1         GGUCG AUCAGU     41                 yes
mutant 2         GGGGU UCUACA     55                 yes
mutant 3         GCUCG AUCAGU     36                 no
mutant 4         GCCCU AUCAGU     73                 no

Covering and prediction

If

SP length 5 and S1.3' length 6 and %C in S1.3' 45

then FS rate 5%

Covering of examples: 45%

Examples predicted in test set: 40%

Conclusion

• Spacer:

– a correlation between primary sequence and FS rate has been established

– systematic experimentation is ongoing

Conclusion

Biological sequences

Formal models

Prediction tools

In silico and in vivo validation

Applications to other genomes
