
The coding potential of Pseudomonas aeruginosa: Comparative genomics and ribosome profiling

Luciano Brocchieri

Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, Gainesville, FL 32610

Gene finding and GC content

Coding regions are characterized by a typical 3-base periodicity in GC content, which depends on the overall GC content of the sequence.

(Bibb et al. 1984; Borodovsky and McIninch 1993; Besemer et al. 2001)

[Figure: GC content at codon positions 1, 2, and 3. Pseudomonas aeruginosa: 67.7% GC.]

Frame analysis (Bibb et al. 1984)

GC content is measured every third nucleotide in three phases. Compositional contrasts among S-profiles of GC content indicate the presence and frame of coding regions.
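The phase-specific measurement described above can be sketched in Python. This is a minimal illustration, not the NPACT implementation: the function names `phase_gc` and `s_profiles`, and the window and step sizes, are assumptions chosen for the example.

```python
def phase_gc(seq):
    """Return %GC measured every third nucleotide in phases 1, 2, and 3."""
    seq = seq.upper()
    result = []
    for phase in range(3):
        bases = seq[phase::3]  # every third nucleotide, starting at this phase
        gc = sum(base in "GC" for base in bases)
        result.append(100.0 * gc / len(bases) if bases else 0.0)
    return tuple(result)


def s_profiles(seq, window=201, step=51):
    """Sliding-window phase GC profiles as (window center, (gc1, gc2, gc3)).

    Window and step are hypothetical defaults. Both must be multiples of 3
    so that every window starts in the same phase.
    """
    if window % 3 or step % 3:
        raise ValueError("window and step must be multiples of 3")
    profiles = []
    for start in range(0, len(seq) - window + 1, step):
        profiles.append((start + window // 2, phase_gc(seq[start:start + window])))
    return profiles
```

For the 60-nt example sequence shown on this slide, `phase_gc` gives 100%, 50%, and 75% GC in phases 1, 2, and 3, which is the kind of contrast between phases that signals a coding region and its frame.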

ggtgtccgcgtcccagacgtaggcctcgagcgtcgcgccgtagagcagggccgccgggtg...

Phase 1 g..g..c..g..c..g..g..g..c..g..c..c..g..g..g..c..g..c..c..g.....

Phase 2 .g..t..g..t..c..a..t..g..t..a..g..g..c..t..a..a..g..g..g..t....

Phase 3 ..t..c..c..c..a..c..a..c..c..g..t..c..c..a..g..g..c..c..g..g...

[Figure: % GC (0 to 100) in each of the three phases along sequence positions 0 to 10,000, with the combined S-profiles and the annotation track: dnaA, dnaN, recF, gyrB, lptA, 0006, 0007.]

[Figure, panels A, B, C: S-profiles of % GC (0 to 100) versus genome position (Kbp), with annotation, for three genomes:
NC_002516 Pseudomonas aeruginosa PAO1 (%GC = 68.2), positions 30 to 40 Kbp; annotation: PA0028, PA0029, PA0030, betC, PA0032, PA0033, PA0034, trpA, trpB, trpI.
NC_000964 Bacillus subtilis subtilis 168 (%GC = 43.1), positions 1,760 to 1,770 Kbp; annotation: fabG, ymfJ, ymfM, pgsA, cinA, recA, pbpX, ymdA, ymdB, spoVS.
NC_000963 Rickettsia prowazekii Madrid E (%GC = 32.2), positions 214 to 224 Kbp; annotation: sucA, RP182, RP183, RP184, dnaK.]

Frame analysis and sequence GC content

Relative representation of nucleotides at each codon position can be used as scores for nucleotide usage

Scores that depend on global nucleotide composition are calculated for each trinucleotide.

[Figure: scores versus sequence GC content for nucleotides at codon base 1 (A1, C1, G1, T1), codon base 2 (A2, C2, G2, T2), and codon base 3 (A3, C3, G3, T3), and for codons (GWY, TGR, GCT, TGS, RWS, YSW).]

$$\mathrm{Score}(XYZ) = \ln \frac{p_{X1}\, p_{Y2}\, p_{Z3}\, \left(1 - p_{\mathrm{stop}}^{(0)}\right)}{p_X\, p_Y\, p_Z\, \left(1 - p_{\mathrm{stop}}^{(1)}\right)}$$
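The trinucleotide score can be illustrated with a short Python sketch. It rests on an interpretation that the slide does not spell out: p_X1, p_Y2, p_Z3 are taken to be nucleotide frequencies at codon positions 1, 2, and 3; p_X, p_Y, p_Z are overall nucleotide frequencies; and p_stop(0) and p_stop(1) are taken to be the probabilities of drawing a stop codon under the overall and the position-specific compositions, respectively. The function name `codon_scores` is hypothetical.

```python
import math

STOPS = ("TAA", "TAG", "TGA")


def codon_scores(pos_freqs, bg_freqs):
    """Log-ratio score for each of the 61 sense trinucleotides.

    pos_freqs: three dicts (codon positions 1-3) mapping base -> frequency.
    bg_freqs:  one dict mapping base -> overall frequency.
    All frequencies are assumed to be strictly positive.
    """
    # Probability of a stop codon under each model (assumed reading of p_stop).
    p_stop_bg = sum(bg_freqs[a] * bg_freqs[b] * bg_freqs[c] for a, b, c in STOPS)
    p_stop_pos = sum(
        pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c] for a, b, c in STOPS
    )
    scores = {}
    for x in "ACGT":
        for y in "ACGT":
            for z in "ACGT":
                codon = x + y + z
                if codon in STOPS:
                    continue  # analyzed segments contain no stop codons
                num = pos_freqs[0][x] * pos_freqs[1][y] * pos_freqs[2][z] * (1 - p_stop_bg)
                den = bg_freqs[x] * bg_freqs[y] * bg_freqs[z] * (1 - p_stop_pos)
                scores[codon] = math.log(num / den)
    return scores
```

When the position-specific frequencies equal the overall frequencies, every score is zero, so positive scores arise only from a compositional contrast between codon positions.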

Cumulative scores and H-type hits

Cumulative scores can be used to translate qualitative visual information on compositional contrasts (left) into statistically characterized (Karlin and Altschul 1990, Karlin 1994) measures of coding potential for precisely defined sequence segments (right).

[Figure: cumulative score versus position for 0742 and modified 0742, with significance level p = 10^-3.]
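As a rough illustration of how cumulative scores delimit a precisely defined segment, the sketch below finds the maximal-scoring contiguous segment in a list of per-codon scores. It uses the standard maximum-subarray scan, which is a stand-in chosen for the example and not necessarily the procedure used in the talk; the statistical evaluation of the segment score (Karlin and Altschul 1990) is not included. The function name `max_scoring_segment` is hypothetical.

```python
def max_scoring_segment(scores):
    """Return (best_score, start, end) of the highest-scoring contiguous segment.

    start is inclusive and end is exclusive; (0.0, 0, 0) is returned when no
    segment has a positive score.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, score in enumerate(scores):
        if running <= 0.0:
            # A non-positive cumulative score cannot help: restart here.
            running, start = 0.0, i
        running += score
        if running > best_score:
            best_score, best_start, best_end = running, start, i + 1
    return best_score, best_start, best_end
```

The returned segment score is the quantity whose significance would then be assessed against a threshold such as p = 0.001.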

Significant contrasts in characterized genes

[Figure: fraction of characterized genes with any type of hits, as a function of % GC (25 to 75) and of length in codons, for p ≤ 0.01 and p ≤ 0.001.]

The power of the tests is high for all long sequences and decreases for shorter sequences, most markedly at low or intermediate GC content.

ORF structure and local composition: The reading-frame prediction-unit

Reading frame: The sequence is read frame by frame, independently considering sequence segments included between consecutive stop codons (i.e., analyzed segments do not contain stop codons).

Local composition: The local composition of each sequence segment is calculated independently for each selected reading frame, and segment-specific scores are built based on its composition.

ORF structure: Only hits included within ORFs are considered (i.e., a potential start codon must be identified).
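The reading-frame prediction unit can be sketched as follows: split one frame into stop-free segments, then check whether a segment contains a potential start codon. This is an illustrative sketch for the forward strand only. The function names are hypothetical, and the set of start codons (ATG, GTG, TTG) is an assumption, since the slide does not list them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of potential start codons


def stop_free_segments(seq, frame):
    """Return (start, end) pairs of stop-free segments in one forward frame.

    frame is 0, 1, or 2; start is inclusive, end is exclusive, and the stop
    codons themselves are excluded from the segments.
    """
    seq = seq.upper()
    segments = []
    seg_start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            if pos > seg_start:
                segments.append((seg_start, pos))
            seg_start = pos + 3
        pos += 3
    if pos > seg_start:
        segments.append((seg_start, pos))
    return segments


def first_start_codon(seq, segment):
    """Return the position of the first in-frame start codon, or None."""
    start, end = segment
    for pos in range(start, end, 3):
        if seq[pos:pos + 3].upper() in START_CODONS:
            return pos
    return None
```

Each segment returned by `stop_free_segments` would be the unit whose local composition is computed, and a hit would be retained only if `first_start_codon` finds a start codon upstream of it in the same segment.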

NPACT home web page (http://genome.ufl.edu/npact/)

NPACT graphical output: GC-profiles of Pseudomonas aeruginosa PAO1, complete genome.

[Figure: NPACT output for sequence positions 0 to 50,000 nt, in five panels of 10,000 nt each. Each panel shows % GC profiles (0 to 100) with tracks for Input file CDS, Hits, and Newly identified ORFs. Labels by panel:
0 to 10,000: dnaA, dnaN, recF, gyrB, lptA, PA0006, PA0007; H-2*G.
10,000 to 20,000: PA0007, glyS, glyQ, tag, PA0011, PA0012, PA0013, PA0014, PA0015, trkA, PA0017; H-17*G, H-15-A.
20,000 to 30,000: PA0017, fmt, def, PA0020, PA0021, PA0022, qor, hemF, aroE, plcB, PA0027, PA0028; H-25*a.
30,000 to 40,000: PA0028, PA0029, PA0030, betC, PA0032, PA0033, PA0034, trpA, trpB, trpI; H-32*G, H-35*G, H-39*A.
40,000 to 50,000: trpI, PA0038, PA0039, PA0040, PA0041.]

Newly-identified ORFs in P. aeruginosa strains

| Strain | ID | Genome length | GC% | Annotated CDS | Newly-identified ORFs |
|---|---|---|---|---|---|
| PAO1 | NC_002516 | 6264404 | 66.56 | 5572 | 179 |
| UCBPP-PA14 | NC_008463 | 6537648 | 66.29 | 5892 | 173 |
| PA7 | NC_009656 | 6588339 | 66.45 | 6286 | 189 |
| LESB58 | NC_011770 | 6601757 | 66.3 | 5925 | 258 |
| M18 | NC_017548 | 6327754 | 66.5 | 5684 | 161 |
| NCGM2_S1 | NC_017549 | 6764661 | 66.14 | 6268 | 250 |
| DK2 | NC_018080 | 6402658 | 66.27 | 5883 | 157 |
| B136_33 | NC_020912 | 6421010 | 66.42 | 5828 | 160 |
| RP73 | NC_021577 | 6342034 | 66.46 | 5762 | 187 |

From 157 to 258 ORFs with significant (p < 0.001) compositional periodicity that do not correspond to annotated genes are identified in different strains of P. aeruginosa.

Conservation of newly-identified ORFs between Pseudomonas aeruginosa strains

Each cell gives the counts for the two categories (Cat) as ORF / CDS; the ORF entry is blank where a strain is compared with itself.

| Strain | B136_33 | DK2 | LESB58 | M18 | NCGM2_S1 | PA7 | PAO1 | RP73 | PA14 | Cons. | Non cons. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B136_33 | – / 27 | 66 / 43 | 47 / 47 | 49 / 44 | 78 / 45 | 22 / 46 | 48 / 44 | 59 / 43 | 66 / 48 | 130 | 30 |
| DK2 | 66 / 48 | – / 30 | 60 / 54 | 63 / 51 | 57 / 49 | 28 / 56 | 61 / 52 | 69 / 49 | 55 / 59 | 135 | 22 |
| LESB58 | 47 / 128 | 60 / 130 | – / 51 | 66 / 131 | 65 / 121 | 22 / 130 | 108 / 79 | 62 / 127 | 59 / 120 | 212 | 46 |
| M18 | 49 / 53 | 63 / 49 | 66 / 56 | – / 26 | 56 / 48 | 27 / 49 | 62 / 44 | 58 / 53 | 62 / 54 | 134 | 27 |
| NCGM2_S1 | 78 / 79 | 57 / 72 | 65 / 57 | 56 / 64 | – / 31 | 30 / 79 | 58 / 51 | 55 / 71 | 66 / 65 | 175 | 75 |
| PA7 | 22 / 47 | 28 / 49 | 22 / 39 | 27 / 45 | 30 / 45 | – / 34 | 27 / 45 | 28 / 51 | 32 / 49 | 98 | 91 |
| PAO1 | 48 / 120 | 61 / 121 | 108 / 70 | 62 / 115 | 58 / 107 | 27 / 113 | – / 32 | 58 / 125 | 68 / 104 | 187 | 11 |
| RP73 | 59 / 91 | 69 / 98 | 62 / 96 | 58 / 99 | 55 / 93 | 28 / 93 | 58 / 101 | – / 49 | 53 / 104 | 171 | 17 |
| PA14 | 66 / 66 | 55 / 64 | 59 / 44 | 40 / 55 | 66 / 54 | 32 / 60 | 68 / 38 | 53 / 65 | – / 32 | 146 | 27 |

Homologs for the majority of newly identified ORFs are identified among annotated genes or newly-identified ORFs of other P. aeruginosa strains.

Long-range conservation of ORFs newly identified in Pseudomonas aeruginosa strains

| Name | Phyla | Genera | Species | Strains | Tot_con | Not_con | Tot | Tot Strain |
|---|---|---|---|---|---|---|---|---|
| PAO1 | 59 | 26 | 21 | 13 | 119 | 60 | 179 | 109 |
| UCBPP-PA14 | 59 | 10 | 17 | 11 | 97 | 76 | 173 | 89 |
| PA7 | 67 | 11 | 15 | 17 | 110 | 79 | 189 | 95 |
| LESB58 | 128 | 19 | 25 | 10 | 182 | 76 | 258 | 174 |
| M18 | 41 | 17 | 23 | 18 | 99 | 62 | 161 | 88 |
| NCGM2_S1 | 105 | 15 | 27 | 12 | 159 | 91 | 250 | 146 |
| DK2 | 42 | 16 | 18 | 11 | 87 | 70 | 157 | 75 |
| B136_33 | 34 | 11 | 13 | 13 | 71 | 89 | 160 | 62 |
| RP73 | 84 | 17 | 11 | 7 | 119 | 68 | 187 | 110 |
| Total | 619 | 142 | 170 | 112 | 1043 | 671 | 1714 | 948 |

Most newly identified ORFs conserved among P. aeruginosa strains are conserved over different phyla.

Verification of computational gene prediction by transcriptome analysis in Pseudomonas aeruginosa PAO1: RNA-seq and Ribosome Footprinting

The P. aeruginosa transcriptome: RNA-seq

Expression of predicted genes by length and conservation classes

[Figure: ORFs with RNA-seq reads. Bar charts of total number and fraction expressed by class, for the published annotation (classes of annotated genes: Long Characterized, Long Hypothetical, Short Characterized, Short Hypothetical) and for newly identified ORFs (classes of newly identified ORFs: Long Characterized, Long Hypothetical, Long Non-conserved, Short Characterized, Short Hypothetical, Short Non-conserved).]

What do we learn about gene predictions from transcription in bacteria?

Unexpected patterns

[Figure: log-count and % C+G profiles for sequence positions 33,000 to 38,000. Tracks: Annotation (betC, 0032, 0033, 0034, trpA, trpB), Hits, New; labeled ORF: H-51*A.]

Contradictory patterns of expression of well defined protein coding genes

In the case of the prediction of H-443*A, sequence features appear to be more convincing than RNA expression data.

What do we learn about gene predictions from transcription in bacteria?

The problem of antisense transcription

[Figure: log-count and % C+G profiles for sequence positions 347,000 to 349,000. Tracks: Annotation (0306, 0307), Hits, New; labeled ORF: H-443*A.]

Ribosome footprinting (Ingolia et al., Science 2009)

Ribosome stalling with translation-elongation inhibitor tetracycline

Cell lysis and digestion of unprotected RNA

Purification of ribosome footprints

cDNA library preparation for deep-sequencing and genome mapping

Schematic representation of the ribosome footprinting procedure applied to P. aeruginosa

Ribosome footprint coverage in P. aeruginosa

[Figure: ribosome footprint coverage (log-count of reads) and % C+G profiles for genome positions 3,110 to 3,120 Kbp. Published: 2748, endA, 2750, 2751, 2752, 2753, 2754, eco, 2756, 2757, 2758, 2759. New: H-3432-G, H-3436*A, H-3435*A, H-3440*g, H-3444*A.]

Example of ribosome footprint coverage in P. aeruginosa PAO1 showing the relation with S-profiles, annotated genes, and newly identified ORFs.

Ribosome footprints of initiation sites

The antibiotic tetracycline inhibits translation elongation, stalling actively translating ribosomes.

However, tetracycline does not prevent additional ribosomes from being recruited at the initiation site.

The accumulation of ribosomes results in increased numbers of profile reads corresponding to the initiation site.

[Figure: relative coverage versus position from start of translation (nt, -30 to 300) for RNA-seq and ribosome footprints.]

Ribosome footprint coverage by codon position

Metagene analysis of ribosome-footprint coverage. Coverage is averaged over all genes, relative to the start of translation.

Ribosome footprint coverage by codon position: center of reads

Metagene analysis of coverage by read center + 2 nt. Coverage is averaged over all genes, relative to the start of translation.

[Figure: ribosome-footprint coverage versus position relative to start of translation (nt, -50 to 100).]
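The metagene analysis can be sketched in a few lines of Python. This is a simplified illustration for forward-strand genes only, under stated assumptions: each read is assigned to a single position (its center plus a 2-nt shift), and each gene's window is normalized by its own total before averaging, so that highly expressed genes do not dominate. Neither detail is specified on the slides, and the function names are hypothetical.

```python
def center_coverage(reads, genome_length, shift=2):
    """Count reads by a single position: the read center plus `shift` nt.

    reads: iterable of (start, end) with 0-based start and exclusive end.
    """
    coverage = [0] * genome_length
    for start, end in reads:
        pos = (start + end - 1) // 2 + shift
        if 0 <= pos < genome_length:
            coverage[pos] += 1
    return coverage


def metagene_coverage(coverage, starts, upstream=50, downstream=100):
    """Average relative coverage at each offset from the start of translation.

    Returns a dict mapping offset (nt) -> mean relative coverage over genes.
    Genes whose window leaves the sequence or has no reads are skipped.
    """
    offsets = list(range(-upstream, downstream + 1))
    totals = {offset: 0.0 for offset in offsets}
    used = 0
    for start in starts:
        lo, hi = start - upstream, start + downstream
        if lo < 0 or hi >= len(coverage):
            continue
        window = coverage[lo:hi + 1]
        window_total = sum(window)
        if window_total == 0:
            continue
        for offset, count in zip(offsets, window):
            totals[offset] += count / window_total
        used += 1
    if used == 0:
        return totals
    return {offset: total / used for offset, total in totals.items()}
```

With tetracycline-stalled ribosomes, a profile computed this way would be expected to peak near offset 0, reflecting the accumulation of reads at initiation sites described earlier.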

Translational evidence by ribosome footprinting in P. aeruginosa

[Figure: ribosome-footprint coverage versus position for groEL (547 aa), c4917124..4915481, Characterized.]

Ribosome-footprint read-count patterns identify mRNA translation, the translation initiation site, and translation pausing.

Translational evidence by ribosome footprinting in P. aeruginosa

[Figure: coverage versus position for groEL (547 codons) in replicates RFP1 and RFP2.]

Similar coverage patterns are observed in different biological replicates.

Scoring RFP expression

“Strength” of evidence decreases for poorly translated mRNA.

[Figure: coverage (RFP and control) versus position relative to the predicted start of translation for 2758 (295 aa), 3118296..3119183, Characterized.]

Scoring RFP expression

“Strength” of the evidence of expression is measured by an “Expression Index”.

$$\text{Expression Index} = C_0 \ln \frac{C_0}{C_1}$$

$C_0$: count of RFP reads in codon positions [-2, +3] / 5;

$C_1$: count of RFP reads in codon positions [+9, len/2] / (len/2 - 9).
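The Expression Index can be illustrated with the sketch below. It assumes the formula reads C0 · ln(C0 / C1), which is a reconstruction from the slide layout, and that codon positions are numbered without a zero (-2, -1, +1, +2, +3), which would explain the division by 5; both are assumptions. The function name `expression_index` and the input layout are hypothetical.

```python
import math


def expression_index(reads_at, length_codons):
    """Expression Index, read here as C0 * ln(C0 / C1) (assumed form).

    reads_at: dict mapping signed codon position (start codon = +1,
              no position 0) -> count of RFP reads.
    Returns None when the index is undefined (gene too short, or no reads
    at the start or in the coding region).
    """
    half = length_codons // 2
    if half <= 9:
        return None
    # C0: average read count around the start codon.
    c0 = sum(reads_at.get(p, 0) for p in (-2, -1, 1, 2, 3)) / 5
    # C1: average read count in the coding region, divisor as on the slide.
    c1 = sum(reads_at.get(p, 0) for p in range(9, half + 1)) / (half - 9)
    if c0 == 0 or c1 == 0:
        return None
    return c0 * math.log(c0 / c1)
```

Under this reading, the index grows both with the number of reads at the start site and with their excess over the coding-region average, so poorly translated mRNAs receive low values.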

Expression of published and newly-predicted genes in Pseudomonas aeruginosa

[Figure: four log-log scatter plots of the ratio of start coverage to coding-region coverage (0.1 to 1000) versus coding-region coverage (0.001 to 10000), for long published genes, short published genes, long newly identified ORFs, and short newly identified ORFs.]

* Coverage normalized by the number of positions

Expression of predicted genes by length and conservation classes

[Figure: ORFs with Expression Index ≥ 12.0. Bar charts of total number and fraction expressed by class, for the published annotation (Long Characterized, Long Hypothetical, Short Characterized, Short Hypothetical) and for newly identified ORFs (Long Characterized, Long Hypothetical, Long Non-conserved, Short Characterized, Short Hypothetical, Short Non-conserved).]

Identification of translation-initiation sites by ribosome footprinting

[Figure: coverage (RFP and control) versus position for fliA (247 aa), 1584795..1585538, Characterized, and for cheY (124 aa), 1585640..1586014, Characterized.]

FliA is the sigma factor of RNA polymerase for flagellin gene transcription. CheY is involved in transmission of the sensory signal to the flagellar motor.

Start of translation identification by RFP read accumulation

| | Annotated | Newly identified |
|---|---|---|
| Same start | 0.850 | 0.778 |
| Different start | 0.150 | 0.222 |

Ribosome footprints confirm the predicted start of translation for 85% of the annotated genes and for 78% of the newly identified ORFs, among those with evidence of translation.

Identification of new genes by ribosome-footprint evidence

A new gene is found to be expressed 5' of eco, the gene for Ecotin, a protease inhibitor localized to the periplasmic space.

[Figure: coverage (RFP and control) versus position for eco (156 aa), 3116654..3117124, Characterized.]

Identification of new genes by ribosome-footprint evidence

[Figure: NPACT % GC profiles for sequence positions 3,114,000 to 3,118,000 nt. Input file CDS: PA2752, PA2753, PA2754, eco, PA2756, PA2757. Newly identified ORF: H-3445*A.]

The newly identified translated ORF (red circle) corresponds to a region of weak compositional 3-base periodicity.

Ribosome footprint coverage in P. aeruginosa

Examples of RFP-based gene discovery in P. aeruginosa PAO1 showing the relation with S-profiles and annotated genes.

[Figure: log-count of reads and % GC profiles for sequence positions 2,058 to 2,061 Kbp. Annotated: PAO1_1888, PAO1_1889, PAO1_18890. Identified by RFP: MHGP10. Inset: coverage (RFP1 and control) versus position for MHGP10.]

Lab members

•  Steve Oden – Postdoctoral associate. Development of gene finding methods and software, gene content analysis in human and prokaryotes.

•  Eric Hernandez – Programmer.

•  Dr. Anna Picca – Postdoctoral associate. RNA-seq and ribosome profiling.

•  Dr. Ying Zhang – Postdoctoral associate. RNA-seq.

•  Dr. Shouguang Jin (Molecular Genetics and Microbiology). P. aeruginosa.

•  Dr. Silvia Tornaletti (Medicine). Transcription and RNA.

Collaborators

•  Dr. Rolf Renne and MGM.

•  Dr. Jianhong Hu (Research Scientist)

Sequencing facility and support

Thanks to

•  NIH R01 GM087485-01A2

•  MGM, Genetics Institute, College of Medicine.

Funding
