The coding potential of Pseudomonas aeruginosa: Comparative genomics and ribosome profiling

Luciano Brocchieri

Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville, FL 32610



Gene finding and GC content

Coding regions are characterized by a typical 3-base periodicity in GC content, which depends on the overall GC content of the sequence (Bibb et al. 1984; Borodovsky and McIninch 1993; Besemer et al. 2001).

[Figure: GC in codon positions 1, 2, and 3; marked genome: Pseudomonas aeruginosa, 67.7% GC.]

Frame analysis (Bibb et al 1984)

GC content is measured every third nucleotide in three phases. Compositional contrasts among S-profiles of GC content indicate the presence and frame of coding regions.

         ggtgtccgcgtcccagacgtaggcctcgagcgtcgcgccgtagagcagggccgccgggtg...
Phase 1  g..g..c..g..c..g..g..g..c..g..c..c..g..g..g..c..g..c..c..g.....
Phase 2  .g..t..g..t..c..a..t..g..t..a..g..g..c..t..a..a..g..g..g..t....
Phase 3  ..t..c..c..c..a..c..a..c..c..g..t..c..c..a..g..g..c..c..g..g...
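The phase decomposition above can be sketched in a few lines of Python. This is only an illustration of the idea, not the NPACT implementation: the function names, the window size, and the step are assumptions chosen for the example.

```python
def phase_gc(seq):
    """Percent G+C among every third nucleotide, for phases 1, 2 and 3."""
    seq = seq.upper()
    result = []
    for phase in range(3):
        bases = seq[phase::3]  # every third nucleotide starting at this phase
        gc = sum(base in "GC" for base in bases)
        result.append(100.0 * gc / len(bases) if bases else 0.0)
    return tuple(result)


def phase_gc_profile(seq, window=201, step=51):
    """Sliding-window version: list of (window_start, gc1, gc2, gc3).

    Phases are kept relative to the start of the whole sequence, so the
    three curves stay comparable from one window to the next.
    Window and step values are illustrative assumptions.
    """
    profile = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        shift = start % 3
        gc = phase_gc(chunk)
        # Phase p inside the window is absolute phase (p + shift) % 3.
        absolute = [0.0, 0.0, 0.0]
        for p in range(3):
            absolute[(p + shift) % 3] = gc[p]
        profile.append((start, *absolute))
    return profile
```

Applied to the 60-nt example sequence, `phase_gc` separates the three phases cleanly, which is the kind of compositional contrast the S-profiles display along a genome.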

[Figure: S-profiles of % GC in the three phases (panels A, B, C) for sequence positions 0 to 10,000. Annotation: dnaA, dnaN, recF, gyrB, lptA, 0006, 0007.]

[Figure: % GC S-profiles with gene annotation for three genomes of different GC content; x-axis: genome position / Kbp.
NC_002516 Pseudomonas aeruginosa PAO1 (%GC = 68.2), 30 to 40 Kbp: PA0028, PA0029, PA0030, betC, PA0032, PA0033, PA0034, trpA, trpB, trpI.
NC_000964 Bacillus subtilis subtilis 168 (%GC = 43.1), 1,760 to 1,770 Kbp: fabG, ymfJ, ymfM, pgsA, cinA, recA, pbpX, ymdA, ymdB, spoVS.
NC_000963 Rickettsia prowazekii Madrid E (%GC = 32.2), 214 to 224 Kbp: sucA, RP182, RP183, RP184, dnaK.]

Frame analysis and sequence GC content


Relative representation of nucleotides at each codon position can be used as scores for nucleotide usage

Scores that depend on global nucleotide composition are calculated for each trinucleotide.

[Figure: scores as a function of GC content. One panel shows the codon classes GWY, TGR, GCT, TGS, RWS, YSW; three panels show nucleotide scores at codon base 1 (A1, C1, G1, T1), codon base 2 (A2, C2, G2, T2), and codon base 3 (A3, C3, G3, T3).]

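$$\mathrm{Score}(XYZ) = \ln \frac{p_{X_1}\, p_{Y_2}\, p_{Z_3}\,\bigl(1 - p_{\mathrm{stop}}^{(0)}\bigr)}{p_X\, p_Y\, p_Z\,\bigl(1 - p_{\mathrm{stop}}^{(1)}\bigr)}$$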
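As a rough illustration of the trinucleotide score, the sketch below computes a position-specific log-odds for a codon from a set of example codons. It is a simplification under stated assumptions: the stop-codon terms of the slide's formula are left out, frequencies are estimated with a pseudocount of 1, and the global frequency is taken as the mean of the three positional frequencies. The function names are hypothetical.

```python
import math
from collections import Counter


def position_frequencies(codons):
    """Estimate nucleotide frequencies per codon position and overall.

    A pseudocount of 1 per nucleotide (an assumption of this sketch)
    keeps every frequency strictly positive.
    """
    counts = [Counter(), Counter(), Counter()]
    for codon in codons:
        for i, base in enumerate(codon.upper()):
            counts[i][base] += 1
    n = len(codons)
    p_pos = [{b: (c[b] + 1) / (n + 4) for b in "ACGT"} for c in counts]
    # Global frequency: mean of the three positional frequencies.
    p_global = {b: sum(p[b] for p in p_pos) / 3 for b in "ACGT"}
    return p_pos, p_global


def codon_score(codon, p_pos, p_global):
    """Log-odds of position-specific versus global nucleotide usage.

    The (1 - p_stop) factors of the slide's formula are omitted here.
    """
    x, y, z = codon.upper()
    numerator = p_pos[0][x] * p_pos[1][y] * p_pos[2][z]
    denominator = p_global[x] * p_global[y] * p_global[z]
    return math.log(numerator / denominator)
```

A codon whose nucleotides are enriched at their respective codon positions scores above zero; when positional and global usage coincide, the score is zero.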

Cumulative scores and H-type hits

Cumulative scores can be used to translate qualitative visual information on compositional contrasts (left) into statistically characterized (Karlin and Altschul 1990; Karlin 1994) measures of coding potential for precisely defined sequence segments (right).

[Figure: cumulative score versus position for 0742 and modified 0742; significance threshold p = 10^-3.]
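One way to turn per-codon scores into a precisely delimited segment is to look for the contiguous run with the highest cumulative score. The sketch below shows that step only; it is not the deck's method in full, since the Karlin-Altschul significance calculation is not reproduced, and the function name is an assumption.

```python
def best_segment(scores):
    """Return (best_sum, start, end) for the maximal-sum contiguous run.

    `start` is inclusive and `end` exclusive. If no run has a positive
    sum, (0.0, 0, 0) is returned.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, score in enumerate(scores):
        if run_sum <= 0.0:
            # A non-positive prefix can never help: restart the run here.
            run_sum, run_start = 0.0, i
        run_sum += score
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_sum, best_start, best_end
```

The segment score returned here is the quantity whose significance would then be assessed against the chosen threshold.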

Significant contrasts in characterized genes

[Figure: fraction of characterized genes with any type of hits (0.0 to 1.0) as a function of % GC (25 to 75) and length in codons (50 to 1000), for p ≤ 0.01 and p ≤ 0.001.]

The power of the tests is high for all long sequences and decreases more markedly for shorter sequences of low or intermediate GC content.


ORF structure and local composition: The reading-frame prediction-unit

Reading frame: The sequence is read frame by frame, independently considering the sequence segments included between consecutive stop codons (i.e., analyzed segments do not contain stop codons).

Local composition: The local composition of each sequence segment is calculated independently for each selected reading frame, and segment-specific scores are built from that composition.

ORF structure: Only hits included within ORFs are considered (i.e., a potential start codon must be identified).
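The prediction unit described above can be sketched as follows: split one forward reading frame into stop-free segments, then look for a potential start codon inside a segment. This is a minimal illustration, not NPACT's code. The function names are hypothetical, only the forward strand is handled, and the set of start codons (ATG, GTG, TTG) is an assumption based on common bacterial usage.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons


def stop_free_segments(seq, frame):
    """Yield (start, end) of maximal stop-free codon runs in one frame.

    Coordinates are 0-based nucleotide positions, end exclusive;
    `frame` is 0, 1 or 2 on the forward strand.
    """
    seq = seq.upper()
    seg_start = frame
    last = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if i > seg_start:
                yield seg_start, i
            seg_start = i + 3  # next segment begins after the stop codon
        last = i + 3
    if last > seg_start:
        yield seg_start, last


def first_start(seq, segment):
    """Return (orf_start, orf_end) from the first start codon in the
    segment, or None if the segment holds no potential start codon."""
    start, end = segment
    for i in range(start, end, 3):
        if seq[i:i + 3].upper() in STARTS:
            return i, end
    return None
```

Local composition and segment-specific scores would then be computed on each `(start, end)` slice, and hits lying outside any `first_start` result would be discarded.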

NPACT home web page (http://genome.ufl.edu/npact/)

NPACT graphical output: GC-profiles of Pseudomonas aeruginosa PAO1, complete genome.

[Figure: NPACT S-profiles of % GC for sequence positions 0 to 50,000 nt, in five panels of 10,000 nt each, with tracks for input file CDS, hits, and newly identified ORFs.
0 to 10,000: dnaA, dnaN, recF, gyrB, lptA, PA0006, PA0007; hit H-2*G.
10,000 to 20,000: PA0007, glyS, glyQ, tag, PA0011, PA0012, PA0013, PA0014, PA0015, trkA, PA0017; hits H-17*G, H-15-A.
20,000 to 30,000: PA0017, fmt, def, PA0020, PA0021, PA0022, qor, hemF, aroE, plcB, PA0027, PA0028; hit H-25*a.
30,000 to 40,000: PA0028, PA0029, PA0030, betC, PA0032, PA0033, PA0034, trpA, trpB, trpI; hits H-32*G, H-35*G, H-39*A.
40,000 to 50,000: trpI, PA0038, PA0039, PA0040, PA0041.]

Newly-identified ORFs in P. aeruginosa strains

Strain       ID          Genome length  GC%    Annotated CDS  Newly-identified ORFs
PAO1         NC_002516   6264404        66.56  5572           179
UCBPP-PA14   NC_008463   6537648        66.29  5892           173
PA7          NC_009656   6588339        66.45  6286           189
LESB58       NC_011770   6601757        66.3   5925           258
M18          NC_017548   6327754        66.5   5684           161
NCGM2_S1     NC_017549   6764661        66.14  6268           250
DK2          NC_018080   6402658        66.27  5883           157
B136_33      NC_020912   6421010        66.42  5828           160
RP73         NC_021577   6342034        66.46  5762           187

Between 157 and 258 ORFs with significant (p < 0.001) compositional periodicity that do not correspond to annotated genes are identified in different strains of P. aeruginosa.

Conservation of newly-identified ORFs between Pseudomonas aeruginosa strains

Strain     Cat  B136_33  DK2  LESB58  M18  NCGM2_S1  PA7  PAO1  RP73  PA14  Cons  Non Cons.
B136_33    ORF  -        66   47      49   78        22   48    59    66    130   30
           CDS  27       43   47      44   45        46   44    43    48
DK2        ORF  66       -    60      63   57        28   61    69    55    135   22
           CDS  48       30   54      51   49        56   52    49    59
LESB58     ORF  47       60   -       66   65        22   108   62    59    212   46
           CDS  128      130  51      131  121       130  79    127   120
M18        ORF  49       63   66      -    56        27   62    58    62    134   27
           CDS  53       49   56      26   48        49   44    53    54
NCGM2_S1   ORF  78       57   65      56   -         30   58    55    66    175   75
           CDS  79       72   57      64   31        79   51    71    65
PA7        ORF  22       28   22      27   30        -    27    28    32    98    91
           CDS  47       49   39      45   45        34   45    51    49
PAO1       ORF  48       61   108     62   58        27   -     58    68    187   11
           CDS  120      121  70      115  107       113  32    125   104
RP73       ORF  59       69   62      58   55        28   58    -     53    171   17
           CDS  91       98   96      99   93        93   101   49    104
PA14       ORF  66       55   59      40   66        32   68    53    -     146   27
           CDS  66       64   44      55   54        60   38    65    32

Homologs for the majority of newly identified ORFs are found among annotated genes or newly identified ORFs of other P. aeruginosa strains.

Long-range conservation of ORFs newly identified in Pseudomonas aeruginosa strains

Name         Phyla  Genera  Species  Strains  Tot_con  Not_con  Tot   Tot Strain
PAO1         59     26      21       13       119      60       179   109
UCBPP-PA14   59     10      17       11       97       76       173   89
PA7          67     11      15       17       110      79       189   95
LESB58       128    19      25       10       182      76       258   174
M18          41     17      23       18       99       62       161   88
NCGM2_S1     105    15      27       12       159      91       250   146
DK2          42     16      18       11       87       70       157   75
B136_33      34     11      13       13       71       89       160   62
RP73         84     17      11       7        119      68       187   110
Total        619    142     170      112      1043     671      1714  948

Most newly identified ORFs conserved among P. aeruginosa strains are also conserved across different phyla.

Verification of computational gene prediction by transcriptome analysis in Pseudomonas aeruginosa PAO1: RNA-seq and Ribosome Footprinting


The P. aeruginosa transcriptome: RNA-seq

Expression of predicted genes by length and conservation classes: ORFs with RNA-seq reads

[Figure: bar charts of total number and fraction expressed, by class.
Published annotation (classes of annotated genes): Long Characterized, Long Hypothetical, Short Characterized, Short Hypothetical.
Newly identified ORFs (classes of newly identified ORFs): Long Characterized, Long Hypothetical, Long Non-conserved, Short Characterized, Short Hypothetical, Short Non-conserved.]

What do we learn about gene predictions from transcription in bacteria?

Unexpected patterns

[Figure: RNA-seq log-count coverage and % C+G S-profiles for genome positions 33,000 to 38,000, with tracks for annotation, hits, and new ORFs. Annotated genes: betC, 0032, 0033, 0034, trpA, trpB; hit: H-51*A.]

Contradictory patterns of expression of well-defined protein-coding genes.

What do we learn about gene predictions from transcription in bacteria?

The problem of antisense transcription

In the case of the prediction of H-443*A, sequence features appear to be more convincing than RNA expression data.

[Figure: RNA-seq log-count coverage and % C+G S-profiles for genome positions 347,000 to 349,000, with tracks for annotation, hits, and new ORFs. Annotated genes: 0306, 0307; hit: H-443*A.]

Ribosome footprinting (Ingolia et al., Science 2009)

Ribosome stalling with translation-elongation inhibitor tetracycline

Cell lysis and digestion of unprotected RNA

Purification of ribosome footprints

cDNA library preparation for deep-sequencing and genome mapping

Schematic representation of the ribosome footprinting procedure applied to P. aeruginosa.

Ribosome footprint coverage in P. aeruginosa

[Figure: ribosome-footprint log-count coverage (# of reads) and % C+G S-profiles for genome positions 3,110 to 3,120 Kbp. Published: 2748, endA, 2750, 2751, 2752, 2753, 2754, eco, 2756, 2757, 2758, 2759. New: H-3432-G, H-3436*A, H-3435*A, H-3440*g, H-3444*A.]

Example of ribosome footprint coverage in P. aeruginosa PAO1 showing the relation with S-profiles, annotated genes, and newly identified ORFs.

Ribosome footprints of initiation sites

The antibiotic tetracycline inhibits translation elongation, stalling actively translating ribosomes. However, tetracycline does not prevent additional ribosomes from being recruited at the initiation site. The accumulation of ribosomes results in an increased number of profile reads corresponding to the initiation site.

Ribosome footprint coverage by codon position

Metagene analysis of ribosome-footprint coverage. Coverage is averaged over all genes, relative to the start of translation.

[Figure: relative coverage (0 to 0.018) versus position from start of translation / nt (-30 to 300), for RNA-seq and ribosome footprints.]
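A metagene profile of this kind can be sketched as an average of per-gene coverage windows aligned on the start codon. The example below is an assumption-laden illustration rather than the analysis actually used: it handles forward-strand genes only, normalizes each gene's window to sum to 1 before averaging (one plausible reading of "relative coverage"), and uses a hypothetical function name and input layout.

```python
def metagene(coverage, starts, upstream=30, downstream=300):
    """Average relative coverage at each offset from the start codon.

    coverage: per-nucleotide read counts along the genome (forward strand).
    starts:   0-based positions of the first nucleotide of each start codon.
    Returns a list of (offset, mean_relative_coverage) pairs.
    """
    width = upstream + downstream
    totals = [0.0] * width
    used = 0
    for s in starts:
        lo, hi = s - upstream, s + downstream
        if lo < 0 or hi > len(coverage):
            continue  # window falls off the sequence
        window = coverage[lo:hi]
        gene_total = sum(window)
        if gene_total == 0:
            continue  # no reads: gene carries no information
        for k, count in enumerate(window):
            totals[k] += count / gene_total
        used += 1
    if used == 0:
        return []
    return [(k - upstream, total / used) for k, total in enumerate(totals)]
```

Normalizing per gene keeps a few highly expressed genes from dominating the average; averaging raw counts instead would be an equally defensible choice with a different emphasis.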

Ribosome footprint coverage by codon position: center of reads

Metagene analysis of coverage by read center + 2 nt. Coverage is averaged over all genes, relative to the start of translation.

[Figure: ribosome-footprint coverage (0 to 80) versus position relative to start of translation / nt (-50 to 100).]
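Assigning each read to a single nucleotide, here the read center shifted by 2 nt, could look like the sketch below. The slide does not say how the center of an even-length read is defined or how strand is handled, so the rounding rule, the forward-strand restriction, and the function name are all assumptions of this example.

```python
from collections import Counter


def center_positions(reads, offset=2):
    """Count reads per assigned position (read center + offset).

    reads: iterable of (start, end) pairs, 0-based, end exclusive,
    forward strand only. For even-length reads the center is taken as
    the left of the two middle nucleotides (an assumption).
    """
    counts = Counter()
    for start, end in reads:
        center = start + (end - start - 1) // 2
        counts[center + offset] += 1
    return counts
```

The resulting counts can be fed to a metagene average in place of full-length read coverage, which sharpens the signal to single positions.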

Translational evidence by ribosome footprinting in P. aeruginosa

[Figure: ribosome-footprint coverage versus position from start of translation / nt (0 to 1,600) for groEL (547 aa), c4917124..4915481, characterized.]

Ribosome-footprint read-count patterns identify mRNA translation, the translation initiation site, and translation pausing.

Translational evidence by ribosome footprinting in P. aeruginosa

[Figure: coverage in replicates RFP1 and RFP2 versus position from start of translation / nt (0 to 1,600) for groEL (547 codons).]

Similar coverage patterns are observed in different biological replicates.

Scoring RFP expression

“Strength” of evidence decreases for poorly translated mRNA.

[Figure: coverage (RFP and control) versus position relative to predicted start of translation (0 to 800) for 2758 (295 aa), 3118296..3119183, characterized.]

Scoring RFP expression

“Strength” of the evidence of expression is measured by an “Expression Index”:

$$\text{Expression Index} = C_0 \ln \frac{C_0}{C_1}$$

$C_0$: count of RFP reads in codon positions [-2, +3] / 5;

$C_1$: count of RFP reads in codon positions [+9, len/2] / (len/2 - 9).
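The Expression Index can be computed directly from per-codon read counts. The sketch below is one possible reading of the definition, not a confirmed implementation: the two windows are treated as half-open intervals so that each divisor equals the number of codons counted, codon 0 is taken to be the predicted start codon, and ORFs with empty windows return `None` instead of a value. The function name and input layout are hypothetical.

```python
import math


def expression_index(codon_counts, start_index, length_codons):
    """Expression Index C0 * ln(C0 / C1) from per-codon RFP read counts.

    codon_counts:  read counts per codon along the transcript.
    start_index:   index in codon_counts of the predicted start codon.
    length_codons: ORF length in codons.
    Windows are half-open (assumption): [-2, +3) and [+9, len/2).
    """
    half = length_codons // 2
    if start_index < 2 or half <= 9:
        return None  # windows cannot be formed
    c0 = sum(codon_counts[start_index - 2:start_index + 3]) / 5
    c1 = sum(codon_counts[start_index + 9:start_index + half]) / (half - 9)
    if c0 == 0 or c1 == 0:
        return None  # logarithm undefined without reads in both windows
    return c0 * math.log(c0 / c1)
```

With an average of 10 reads per codon around the start and 1 read per codon in the body of the ORF, the index is 10 ln 10, about 23.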

Expression of published and newly-predicted genes in Pseudomonas aeruginosa

[Figure: four log-log scatter plots of the coverage ratio start / coding region (0.1 to 1000) versus coverage of the coding region (0.001 to 10000): long published genes, short published genes, long newly identified ORFs, short newly identified ORFs.]

* Coverage normalized by the number of positions.

Expression of predicted genes by length and conservation classes: ORFs with Expression Index ≥ 12.0

[Figure: bar charts of total number and fraction expressed, by class.
Published annotation (classes of annotated genes): Long Characterized, Long Hypothetical, Short Characterized, Short Hypothetical.
Newly identified ORFs (classes of newly identified ORFs): Long Characterized, Long Hypothetical, Long Non-conserved, Short Characterized, Short Hypothetical, Short Non-conserved.]

Identification of translation-initiation sites by ribosome footprinting

[Figure: coverage (RFP and control) versus position relative to predicted start of translation for fliA (247 aa), 1584795..1585538, characterized, and cheY (124 aa), 1585640..1586014, characterized.]

FliA is the sigma factor of RNA polymerase for flagellin gene transcription. CheY is involved in transmission of the sensory signal to the flagellar motor.

Start of translation identification by RFP read accumulation

                 Annotated   Newly identified
Same start       0.850       0.778
Different start  0.150       0.222

Among those with evidence of translation, ribosome footprints confirm the predicted start of translation for 85% of the annotated genes and 78% of the newly identified ORFs.

Identification of new genes by ribosome-footprint evidence

[Figure: coverage (RFP and control) versus position relative to predicted start of translation (-100 to 400) for eco (156 aa), 3116654..3117124, characterized.]

A new gene is found to be expressed 5' of the gene eco for Ecotin, a protease inhibitor localized to the periplasmic space.

Identification of new genes by ribosome-footprint evidence

[Figure: NPACT % GC S-profiles for sequence positions 3,114,000 to 3,118,000 nt, with tracks for input file CDS, hits, and newly identified ORF. CDS: PA2752, PA2753, PA2754, eco, PA2756, PA2757; hit: H-3445*A.]

The newly-identified translated ORF (red circle) corresponds to a region of weak compositional 3-base periodicity.

Ribosome footprint coverage in P. aeruginosa

Examples of RFP-based gene discovery in P. aeruginosa PAO1 showing the relation with S-profiles and annotated genes.

[Figure: log-count coverage and % GC S-profiles for sequence positions 2,058 to 2,061 Kbp. Annotated: PAO1_1888, PAO1_1889, PAO1_18890; identified by RFP: MHGP10. Inset: coverage (RFP1 and control) versus position relative to predicted start of translation (-150 to 250) for MHGP10.]

Lab members

•  Steve Oden – Postdoctoral associate. Development of gene finding methods and software, gene content analysis in human and prokaryotes.

•  Eric Hernandez – Programmer.

•  Dr. Anna Picca – Postdoctoral associate. RNA-seq and ribosome profiling.

•  Dr. Ying Zhang – Postdoctoral associate. RNA-seq.

•  Dr. Shouguang Jin (Molecular Genetics and Microbiology). P. aeruginosa.

•  Dr. Silvia Tornaletti (Medicine). Transcription and RNA.

Collaborators

•  Dr. Rolf Renne and MGM.

•  Dr. Jianhong Hu (Research Scientist)

Sequencing facility and support

Thanks to

•  NIH R01 GM087485-01A2

•  MGM, Genetics Institute, College of Medicine.

Funding