Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
The coding poten-al of Pseudomonas aeruginosa: Compara-ve genomics and ribosome profiling
Luciano Brocchieri
Department of Molecular Gene-cs & Microbiology and Gene-cs Ins-tute University of Florida, Gainesville, FL 32610
Gene finding and GC content
Coding regions are characterized by typical 3-‐base periodicity in GC content, depending on the overall GC content of the sequence.
(Bibb et al. 1984; Borodovsky and McIninch 1993; Besemer et al. 2001)
GC in cod
on posi-on
3
2
1 Pseudomonas aeruginosa:
67.7% GC
Frame analysis (Bibb et al 1984)
GC content is measured every third nucleotide in three phases. Compositional contrasts among S-profiles of GC content indicate presence
and frame of coding regions.
ggtgtccgcgtcccagacgtaggcctcgagcgtcgcgccgtagagcagggccgccgggtg...
!"#$%&' g..g..c..g..c..g..g..g..c..g..c..c..g..g..g..c..g..c..c..g.....
!"#$%&( .g..t..g..t..c..a..t..g..t..a..g..g..c..t..a..a..g..g..g..t....
!"#$%&) ..t..c..c..c..a..c..a..c..c..g..t..c..c..a..g..g..c..c..g..g...
!"#&$*%&'(*%)'(**(*%+'+
"&"""""""""""""""""""""&,""""""""""""""""""""""),"""""""""""""""""""""""-,""""""""""""""""""""""".,""""""""""""""""""""""/,"""""""""""""""""""""""0,
0.0
50.0
100.0%
GC
0.0
50.0
100.0
% G
C
0.0
50.0
100.0
% G
C
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Annotation: dnaA dnaN recF gyrB lptA 0006 0007
0.0
50.0
100.0
% G
C
A
B
C
32 33 34 35 36 37 38 39 40
0
20
40
60
80
100
Annotation:
% G
C
PA0028
PA0029 PA0030 betC
PA0032
PA0033PA0034 trpA trpB
trpI
30 31
Genome position / Kbp
1,760
fabG ymfJ ymfM pgsA cinA recA pbpX ymdA ymdB spoVS
1,761 1,762 1,763 1,764 1,765 1,766 1,767 1,768 1,769 1,770
0
20
40
60
80
100
Annotation:
% G
CNC_002516 Pseudomonas aeruginosa PAO1 (%GC = 68.2)
NC_000964 Bacillus subtilis subtilis 168 (%GC = 43.1)
NC_000963 Rickettsia prowazekii Madrid E (%GC = 32.2)
214
sucA RP182 RP183 RP184 dnaK
215 216 217 218 219 220 221 222 223 224
0
20
40
60
80
100
Annotation:
% G
C
Frame analysis and sequence GC content
Relative representation of nucleotides at each codon position can be used as scores for nucleotide usage
Scores that depend on global nucleo-de composi-on are calculated for each trinucleo-de
-3
-2
-1
0
1
2
0.25 0.35 0.45 0.55 0.65GC
GWY
TGR
GCT
TGS
RWS
YSW
-1.0
-0.5
0
0.5
1.0
0.20 0.35 0.50 0.65 0.80
Codon base 1A1 C1 G1 T1
-1.0
-0.5
0
0.5
1.0
0.20 0.35 0.50 0.65 0.80
A2 C2 G2 T2
-1.0
-0.5
0
0.5
1.0
0.20 0.35 0.50 0.65 0.80
A3 C3 G3 T3
Codon base 2
Codon base 3Codons
GCGC
GC
Score
Score
Score(XYZ) = lnpX1
pY2pZ3
1− pstop(0)( )
pXpYpZ 1− pstop(1)( )
Cumulative scores and H-type hits
Cumula-ve scores can be used to translate qualita-ve visual informa-on on composi-onal contrasts (leX) into sta-s-cally characterized (Karlin and
Altschul 1990, Karlin 1994) measures of coding poten-al for precisely defined sequence segments (right).
Posi-on
Score
0742
Modified 0742
p = 10-‐3
Significant contrasts in characterized genes
!"#
!"$
!"%
!"&
!"'
!"(
!")
*"!
%!*!!*%!
(!!(%!)!!)%!
*!!!
'%!
'!!
&%!
+!!
#%!
#!!
&!!
%%!
%!!
$%!
+%!$!!
!"+
!"#"$%
!"+
!"+% !"$
!"$% !"%
!"%% !"&
!"&% !"'
!"'%
!"#% &!
!"*
!"! '()*+,"$-".-/0'%-1,+2-3-"(-&4+567-2,+%
25
30 35 40 45 50 55 60 65 70 75
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
5010015
020025
030035
040045
050055
060065
070075
080085
0900950
1000
% GC
Frac
tion
of c
hara
cter
ized
gen
es w
ith a
ny ty
pe h
its
p�����-3
Length/codons
p ≤ 0.01 p ≤ 0.001
Power of the tests is high for all long sequences and decreases more with shorter sequences of low or intermediate GC content
ORF structure and local composition: The reading-frame prediction-unit
The sequence is read frame-‐by-‐frame independently considering sequence segments included between consecu-ve stop codons (i.e., analyzed segments do not contain stop codons).
The local composi-on of the each sequence segments is independently calculated from each selected reading frame and segment-‐specific scores are built based on its composi-on.
Reading frame
Local composition
Only hits included within ORFs are considered (i.e., a poten-al start codon must be iden-fied).
ORF structure
NPACT home web page (hbp://genome.ufl.edu/npact/)
!"#$%&'()*+%&+,*)-.%/%01*+1)$-2'0%*1+,3456+7%/#()8)+2)0%/)9
! "!!! #!!! $!!! %!!! &!!! '!!! (!!! )!!! *!!! "!!!!!+!#!+!%!+!'!+!)!+!"!!+!
,--./0/1.-
2334
56789:0--./0/6;
:+;<=
;-0, ;-05 <6=> ?9<@
8A/, B,!!!'
B,!!!(
"!!!! ""!!! "#!!! "$!!! "%!!! "&!!! "'!!! "(!!! ")!!! "*!!! #!!!!!+!#!+!%!+!'!+!)!+!"!!+!
,--./0/1.-
2334
56789:0--./0/6;:+;<=
B,!!!(
?893 ?89C
/0? B,!!"" B,!!"# B,!!"$
B,!!"% B,!!"& /<D, B,!!"(
#!!!! #"!!! ##!!! #$!!! #%!!! #&!!! #'!!! #(!!! #)!!! #*!!! $!!!!!+!#!+!%!+!'!+!)!+!"!!+!
,--./0/1.-
2334
56789:0--./0/6;
:+;<=
B,!!"( EF/ ;6E
B,!!#! B,!!#" B,!!##
G.<
H6F> 0<.I A8=@ B,!!#( B,!!#)
$!!!! $"!!! $#!!! $$!!! $%!!! $&!!! $'!!! $(!!! $)!!! $*!!! %!!!!!+!#!+!%!+!'!+!)!+!"!!+!
,--./0/1.-
2334
56789:0--./0/6;
:+;<=
B,!!#)
B,!!#* B,!!$! J6/K
B,!!$#
B,!!$$B,!!$% /<A, /<A@
/<AL
>"?5@3
%!!!! %"!!! %#!!! %$!!! %%!!! %&!!! %'!!! %(!!! %)!!! %*!!! &!!!!!+!#!+!%!+!'!+!)!+!"!!+!
,--./0/1.-
2334
56789:0--./0/6;
:+;<=
/<ALB,!!$)B,!!$* B,!!%! B,!!%"
NPACT graphical output GC-profiles of Pseudomonas aeruginosa PAO1, complete genome.
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,0000.0
20.040.060.080.0
100.0
Input file CDS
Hits
Newly identified ORFs
% G
C
dnaA dnaN recF gyrBlptA PA0006
PA0007
H-2*G
10,000 11,000 12,000 13,000 14,000 15,000 16,000 17,000 18,000 19,000 20,0000.0
20.040.060.080.0
100.0
Input file CDS
Hits
Newly identified ORFs
% G
C
PA0007glyS glyQ
tag PA0011 PA0013trkA PA0017
H-17*G
PA0012
PA0014 PA0015
H-15-A
20,000 21,000 22,000 23,000 24,000 25,000 26,000 27,000 28,000 29,000 30,0000.0
20.040.060.080.0
100.0
Input file CDS
Hits
Newly identified ORFs
% G
C
fmt defPA0020 PA0021 PA0022
qorhemF aroE plcB PA0027
H-25*a
PA0017
PA0028
30,000 31,000 32,000 33,000 34,000 35,000 36,000 37,000 38,000 39,000 40,0000.0
20.040.060.080.0
100.0
Input file CDS
Hits
Newly identified ORFs
% G
C
PA0028PA0029 PA0030 betC
PA0032PA0034 trpA trpB
trpI
H-32*GH-35*G
H-39*A
PA0033
40,000 41,000 42,000 43,000 44,000 45,000 46,000 47,000 48,000 49,000 50,0000.0
20.040.060.080.0
100.0
Input file CDS
Hits
Newly identified ORFs
% G
C
Sequence position / nt
trpI PA0040 PA0041PA0038
PA0039
Newly-‐iden-fied ORFs in P. aeruginosa strains
Strain! ID! Genome lengh! GC%! Annotated
CDS!Newly-
identified ORFs!
PAO1! NC_002516! 6264404! 66.56! 5572! 179!UCBPP-PA14! NC_008463! 6537648! 66.29! 5892! 173!PA7! NC_009656! 6588339! 66.45! 6286! 189!LESB58! NC_011770! 6601757! 66.3! 5925! 258!M18! NC_017548! 6327754! 66.5! 5684! 161!NCGM2_S1! NC_017549! 6764661! 66.14! 6268! 250!DK2! NC_018080! 6402658! 66.27! 5883! 157!B136_33! NC_020912! 6421010! 66.42! 5828! 160!RP73! NC_021577! 6342034! 66.46! 5762! 187!
From 157 to 258 ORFs with significant (p < 0.001) composi-onal periodicity and not corresponding to annotated genes are iden-fied in different strains of
P. aeruginosa.
Conserva-on of newly-‐iden-fied ORFs between Pseudomonas aeruginosa strains
Strain! Cat! B136_33! DK2! LESB58! M18! NCGM2_S1! PA7! PAO1! RP73! PA14! Cons! Non Cons.!
B136_33! ORF!CDS!
!27!
66!43!
47!47!
49!44!
78!45!
22!46!
48!44!
59!43!
66!48! 130! 30!
DK2! ORF!CDS!
66!48!
!30!
60!54!
63!51!
57!49!
28!56!
61!52!
69!49!
55!59! 135! 22!
LESB58! ORF!CDS!
47!128!
60!130!
!51!
66!131!
65!121!
22!130!
108!79!
62!127!
59!120! 212! 46!
M18! ORF!CDS!
49!53!
63!49!
66!56!
!26!
56!48!
27!49!
62!44!
58!53!
62!54! 134! 27!
NCGM2_S1!
ORF!CDS!
78!79!
57!72!
65!57!
56!64!
!31!
30!79!
58!51!
55!71!
66!65! 175! 75!
PA7! ORF!CDS!
22!47!
28!49!
22!39!
27!45!
30!45!
!34!
27!45!
28!51!
32!49! 98! 91!
PAO1! ORF!CDS!
48!120!
61!121!
108!70!
62!115!
58!107!
27!113!
!32!
58!125!
68!104! 187! 11!
RP73! ORF!CDS!
59!91!
69!98!
62!96!
58!99!
55!93!
28!93!
58!101!
!49!
53!104! 171! 17!
PA14! ORF!CDS!
66!66!
55!64!
59!44!
40!55!
66!54!
32!60!
68!38!
53!65!
!32! 146! 27!
Homologs for the majority of newly iden-fied ORF are iden-fied among annotated genes or newly-‐iden-fied ORFs of other P. aeruginosa
strains.
Name! Phyla! Genera! Species! Strains! Tot_con! Not_con! Tot! Tot Strain!
PAO1! 59! 26! 21! 13! 119! 60! 179! 109!UCBPP-PA14! 59! 10! 17! 11! 97! 76! 173! 89!PA7! 67! 11! 15! 17! 110! 79! 189! 95!LESB58! 128! 19! 25! 10! 182! 76! 258! 174!M18! 41! 17! 23! 18! 99! 62! 161! 88!NCGM2_S1! 105! 15! 27! 12! 159! 91! 250! 146!DK2! 42! 16! 18! 11! 87! 70! 157! 75!B136_33! 34! 11! 13! 13! 71! 89! 160! 62!RP73! 84! 17! 11! 7! 119! 68! 187! 110!Total! 619! 142! 170! 112! 1043! 671! 1714! 948!
Long-‐range onserva-on of ORFs newly-‐iden-fied in Pseudomonas aeruginosa strains
Most newly iden-fied ORF conserved among P. aeruginosa strains are conserved over different phyla
Verifica-on of computa-onal gene predic-on by transcriptome analysis in Pseudomonas
aeruginosa PAO1: RNA-‐seq and Ribosome Footprin-ng
!"#$%## !"#%%##
#
"
$
"
$
&''()*)+,-
#.#
%#.#
!##.#
/0123
4(567(8')
9:);
<+=-!!!" !!!>
!"#$%&'(
!"#$%&& !"#%%&&
&
'
$
'
$
())*+,+-./
&0&
%&0&
!&&0&
12345
6*789*:)+
;<+=
>-?/@,=(
,AB(
!"#$%&'(
!"!#$"" !"!%$""
"
!
&
!
&
'(()*+*,-.
"/"
$"/"
0""/"
12345
6)789):(*
;<*=
>,?.0%@@ ABC' ADC'
!"#$%%"&
!""#### !""$###
#
!
%
!
%
&''()*)+,-
#.#
/#.#
$##.#
01234
5(678(9')
:;)<
=+>-
*,*!$$?
!"#$%&'(
!"!#$$$ !"!%$$$
$
!
&
!
&
'(()*+*,-.
$/$
#$/$
0$$/$
12345
6)789):(*
;<*=
>,?.
!&@$ !&@0
!"#$%$&'
!""#$$$ !""%$$$
$
&
'
&
'
())*+,+-./
$0$
1$0$
2$$0$
34567
8*9:;*<)+
=>+?
@-A/
!1#' !1#1
!"#$%%"&
!""#$"" !""%$""
"
&
!
&
!
'(()*+*,-.
"/"
$"/"
0""/"
12345
6)789):(*
;<*=
>,?.
(+@A B$%$
!"#$%$"&
!"!#""" !"!$"""
"
$
%
$
%
&''()*)+,-
"."
/"."
#""."
01234
5(678(9')
:;)<
=+>-
$?@! $?@%
!"#$%&'(
!"#$### !"#%###
#
&
'
&
'
())*+,+-./
#0#
$#0#
1##0#
23456
7*89:*;)+
<=+>
?-@/
!!#$ !!#%
!"#$$%"& !"#$$'(&
!"!"### !"!$###
#
%
!
%
!
&''()*)+,-
#.#
"#.#
/##.#
01234
5(678(9')
:;)<
=+>-(?@4
!#$A
!"#$%$&'
!"#"### !"#$### !"%####
#
&
'
&
'
())*+,+-./
#0#
1#0#
%##0#
23456
7*89:*;)+
<=+>
?-@/
!'#& !'#! !'#'
!"#$%%&'
!!"#$%% !!"!$%%
%
!
&
!
&
'(()*+*,-.
%/%
$%/%
#%%/%
01234
5)678)9(*
:;*<
=,>.
!%?$
!%?@!"#$#%&'
The P. aeruginosa transcriptome: RNA-‐seq
0
500
1000
1500
2000
2500
3000
3500
Long
Characterized
Long
Hypothetical
Short
Characterized
Short
Hypothetical
Tota
l nu
mb
er
Classes of annotated genes
0
10
20
30
40
50
60
Long
Characterized
Long
Hypothetical
Long Non-
conserved
Short
Characterized
Short
Hypothetical
Short Non-
conserved
TRWDO�QXP
EHU�RI�QHZ
O\�LGHQWLÀHG�25)V
&ODVVHV�RI�QHZO\�LGHQWLÀHG�25)V
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Long
Characterized
Long
Hypothetical
Short
Characterized
Short
Hypothetical
Fra
ction e
xpre
ssed
Classes of annotated genes
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Long
Characterized
Long
Hypothetical
Long Non-
conserved
Short
Characterized
Short
Hypothetical
Short Non-
conserved
Fra
ctio
n e
xp
resse
d
&ODVVHV�RI�QHZO\�LGHQWLÀHG�25)V
Expression of predicted genes by length and conserva-on classes Published annota-on Newly iden-fied ORFs
ORFs with RNA-‐seq reads
What do we learn about gene predic-ons from transcrip-on in bacteria?
Unexpected paberns
33000 34000 35000 36000 37000 38000
0
2
4
2
4
Annotation
0.0
50.0
100.0
% C
+G
Log-count
Hits
New
betC
0032
0033 0034 trpA trpB
H-51*A
Contradictory paberns of expression of well defined protein coding genes
In the case of predic-on of H-‐443*A , sequence features appear to be more convincing than RNA expression data
What do we learn about gene predic-ons from transcrip-on in bacteria?
The problem of an-sense transcrip-on
347000 348000 349000
0
2
4
2
4
Annotation
0.0
50.0
100.0
% C
+G
Lo
g-co
un
tHits
New
0306
0307
H-443*A
Ribosome footprin-ng (Ingolia et al, Science 2009)
Ribosome stalling with translation-elongation inhibitor tetracycline
Cell lysis and digestion of unprotected RNA
3XULÀFDWLRQ�RI�ULERVRPH�footprints
cDNA library preparation for deep-sequencing and genome mapping
Schema-c representa-on of the ribosome footprin-ng procedure applied to P. aeruginosa
Ribosome footprint coverage in P. aeruginosa
3,110 3,111 3,112 3,113 3,114 3,115 3,116 3,117 3,118 3,119 3,120
0
2
4
2
4
Published
0.0
50.0
100.0
% C
+G
Log-
coun
t
New
2748 endA 2750 2751 2752 2753 2754 eco 2756 2757 2758 2759
H-3432-G H-3436*A H-3435*A H-3440*g H-3444*A
Genome position / Kbp
# of re
ads
Example of ribosome footprint coverage in P. aeruginosa PAO1 showing rela-on with S-‐profiles, annotated genes and
newly iden-fied ORFs.
Ribosome footprints of ini-a-on sites
The an-bio-c tetracycline inhibits transla-on-‐elonga-on stalling ac-vely-‐transla-ng ribosomes
Ribosome footprints of ini-a-on sites
However, tetracycline does not prevent more ribosomes to be recruited at the ini-a-on site.
Ribosome footprints of ini-a-on sites
The accumula-on of ribosomes will result in increased numbers of profile-‐reads corresponding to the ini-a-on site.
0
0.002
0.004
0.006
0.008
0.01
0.012
0.014
0.016
0.018
-30
-15 0 15 30 45 60 75 90 105
120
135
150
165
180
195
210
225
240
255
270
285
300
Rel
ativ
e co
vera
ge
Position from start of translation / nt
RNA-seq
Ribosome footprints
Ribosome footprint coverage by codon posi-on
Metagene analysis of ribosome-‐footprint coverage Coverage is averaged over all genes, rela-ve to the start of transla-on
Ribosome footprint coverage by codon posi-on: center of reads
Metagene analysis of coverage by read center + 2 nt Coverage is averaged over all genes, rela-ve to the start of transla-on
0
10
20
30
40
50
60
70
80
-50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100
Rib
osom
e-fo
otpr
int c
over
age
Position relative to start of translation / nt
Transla-onal evidence by ribosome footprin-ng in P. aeruginosa
0
50
100
150
200
250
300
350
400
0 200 400 600 800 1000 1200 1400 1600
Cov
erag
e
Position from start of translation / nt
groEL (547 aa) c4917124..4915481 Characterized
Ribosome-‐footprint read-‐count paberns iden-fy mRNA transla-on, transla-on ini-a-on site, and transla-on pausing.
Transla-onal evidence by ribosome footprin-ng in P. aeruginosa
0
50
100
150
200
250
0 200 400 600 800 1000 1200 1400 1600
Cov
erag
e R
FP2
Position from start of translation / nt
RFP1
0
50
100
150
200
250
Cov
erag
e R
FP1
groEL (547 codons)
RFP2
Similar coverage paberns are observed in different biological replicates
Scoring RFP expression
“Strength” of evidence decreases for poorly translated mRNA.
0
1
2
3
4
5
6
0 100 200 300 400 500 600 700 800
Cov
erag
e
Position relative to predicted start of translation
2758 (295 aa) 3118296..3119183 Characterized
RFP control
Scoring RFP expression
“Strength” of the evidence of expression is measured by an “Expression Index”.
=C0 lnC0C1
C0
C1
: Count of RFP reads in codon posi-ons [-‐2,+3] / 5;
: Count of RFP reads in codon posi-ons [+9, len/2] / (len/2 -‐9);
Expression Index
0.1
1
10
100
1000
0.001 0.01 0.1 1 10 100 1000 10000
Cov
erag
e St
art /
Cod
ing
regi
on
Coverage coding region
0.1
1
10
100
1000
0.001 0.01 0.1 1 10 100 1000 10000
Cov
erge
Sta
rt / C
odin
g re
gion
Coverage coding region
Long published genes Short published genes
Long newly iden-fied ORFs Short newly iden-fied ORFs
Expression of published and newly-‐predicted genes in Pseudomonas aeruginosa
* Coverage normalized by the number of posi-ons
0.1
1
10
100
1000
0.001 0.01 0.1 1 10 100 1000 10000
Cov
erag
e St
art /
Cod
ing
regi
on
Coverage coding region
0.1
1
10
100
1000
0.001 0.01 0.1 1 10 100 1000 10000
Cov
erag
e St
art /
Cod
ing
regi
on
Coverage coding region
0
500
1000
1500
2000
2500
3000
3500
Long
Characterized
Long
Hypothetical
Short
Characterized
Short
Hypothetical
Tota
l num
be
r
Classes of annotated genes
0
10
20
30
40
50
60
Long
Characterized
Long
Hypothetical
Long Non-
conserved
Short
Characterized
Short
Hypothetical
Short Non-
conserved
TRWDO�QXP
EHU�RI�QHZ
O\�LGHQWLÀHG�25)V
&ODVVHV�RI�QHZO\�LGHQWLÀHG�25)V
Expression of predicted genes by length and conserva-on classes Published annota-on Newly iden-fied ORFs
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Long
Characterized
Long
Hypothetical
Short
Characterized
Short
Hypothetical
Fra
ction e
xpre
ssed
Classes of annotated genes
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Long
Characterized
Long
Hypothetical
Long Non-
conserved
Short
Characterized
Short
Hypothetical
Short Non-
conserved
Fra
ctio
n e
xp
re
sse
d
&ODVVHV�RI�QHZO\�LGHQWLÀHG�25)V
ORFs with Expression Index ≥ 12.0
Iden-fica-on of transla-on-‐ini-a-on sites by ribosome-‐footprin-ng
0
20
40 60
80
100
120
140 160
180
0 100 200 300 400 500 600 700
Cov
erag
e
Position relative to predicted start of translation
fliA (247 aa) 1584795..1585538 Characterized
RFP control
0 50
100 150 200
250 300
350 400
450
-300 -200 -100 0 100 200 300
Cov
erag
e
Position relative to predicted start of translation
cheY (124 aa) 1585640..1586014 Characterized
RFP control
FliA, sigma factor of RNA polymerase for flagellin gene transcrip-on. CheY is involved in transmission of sensory signal to the flagellar motor.
Start of transla-on iden-fica-on by RFP read accumula-on
Annotated Newly iden.fied
Same start 0.850 0.778
Different start 0.150 0.222
Ribosome footprints confirm the predicted start of transla-on for 85% of the genes, and of 78% of the newly-‐iden-fied ORFS, among those with evidence of transla-on.
0
50
100
150
200
250
300
350
400
-100 0 100 200 300 400
Cov
erag
e
Position relative to predicted start of translation
eco (156 aa) 3116654..3117124 Characterized
RFP control
Iden-fica-on of new genes by ribosome-‐footprint evidence
A new gene is found to be expressed 5’ of the gene eco for Eco-n, a protease inhibitor localized to the periplasmic space.
3,114,000 3,115,000 3,116,000 3,117,000 3,118,0000.0
20.040.060.080.0
100.0
Input file CDS:
Hits
Newly identified ORF:
% G
CPA2752
PA2753 ecoPA2756 PA2757
PA2754
H-3445*A
Sequence position / nt
Iden-fica-on of new genes by ribosome-‐footprint evidence
The newly-‐iden-fied translated ORF (red circle) corresponds to a region of weak composi-onal 3-‐base periodicity.
Ribosome footprint coverage in P. aeruginosa
Examples of RFP-‐based gene discovery in P. aeruginosa PAO1 showing rela-on with S-‐profiles and annotated genes.
2,058 2,059 2,060 2,061
0
2
4
2
4
Annotated:
0.0
50.0
100.0
% G
C
Log-
coun
t
PAO1_1888
Sequence position / Kbp
PAO1_1889 PAO1_18890
0
100 200 300 400 500 600
-150 -100 -50 0 50 100 150 200 250
Cove
rage
Position relative to predicted start of translation
MHGP10 RFP1 control
MHGP10Identified by RFP:
Lab members
• Steve Oden – Postdoctoral associate. Development of gene finding methods and soXware, gene content analysis in human and prokaryotes.
• Eric Hernandez – Programmer.
• Dr. Anna Picca– Postdoctoral associate. RNA-‐seq and ribosome profiling
• Dr. Ying Zhang – Postdoctoral associate. RNA-‐seq
• Dr. Shouguang Jin (Molecular Gene-cs and Microbiology). P. aeruginosa
• Dr. Silvia Tornale| (Medicine). Transcrip-on and RNA.
Collaborators
• Dr. Rolf Renne and MGM.
• Dr. Jianhong Hu (Research Scien-st)
Sequencing facility and support
Thanks to
• NIH R01 GM087485-‐01A2
• MGM, Gene-cs Ins-tute, College of Medicine.
Funding