

Are essential genes conserved?

Fatemeh Ashari Ghomi1, Paul Gardner1, Lars Barquist2

1 School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
2 Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany

Introduction

Transposon-directed insertion-site sequencing is an approach for studying the essentiality of genes in prokaryotes. In this method, pools of single-insertion mutants are constructed using transposon mutagenesis, and the effect of each mutation on the mutants’ survival is evaluated by sequencing the survivors. This can lead to the identification of essential genes.

We have used transposon-directed insertion-site sequencing to study the essentiality of genes in 12 strains from Enterobacteriaceae, which are depicted in the figure. To do this, we first examined different biases that can affect our transposon insertion experiments. After correcting for these biases, we studied the relation between the essentiality of genes and their conservation.

Figure: the 12 Enterobacteriaceae strains studied.

• Klebsiella pneumoniae Ecl8
• Salmonella Typhimurium SL1344
• Salmonella Enteritidis P125109
• Escherichia coli UPEC ST131
• Salmonella Typhimurium D23580
• Escherichia coli ETEC CS17
• Salmonella Typhimurium A130
• Enterobacter cloacae NCTC 9394
• Citrobacter rodentium ICC168
• Escherichia coli ETEC H10407
• Klebsiella pneumoniae RH201207
• Salmonella Typhi Ty2

Questions

1. Are there any biases that affect the results of transposon insertion experiments?

2. Is the conservation of essentiality consistent with the species tree?

3. Are the essentiality of genes and their conservation related?

Transposon-directed insertion-site sequencing

Transposon insertion is the process of inserting a nucleotide sequence into a gene so that it disrupts the gene and causes it to lose its function.

• If the gene is essential, the organism will not be able to survive.

• If it is non-essential, the organism will be able to survive.

• If it is a beneficial loss, the organism will benefit from losing it.

After genome sequencing:

• No or few transposon insertions are found in essential genes.

• An intermediate number of transposon insertions is detected in non-essential genes.

• Many transposon insertions are observed in beneficial losses.
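The mapping from insertion counts to the three gene classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the insertion index is taken to be the number of insertion sites divided by gene length, it is normalised by the mean over all genes, and the cut-offs `low` and `high` are hypothetical values chosen for the example, not the thresholds used in this study.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    length_bp: int     # gene length in base pairs
    n_insertions: int  # insertion sites observed inside the gene


def insertion_index(gene: Gene) -> float:
    """Insertions per base pair (assumed definition of the insertion index)."""
    if gene.length_bp <= 0:
        raise ValueError("gene length must be positive")
    return gene.n_insertions / gene.length_bp


def classify(index: float, genome_mean: float,
             low: float = 0.2, high: float = 2.0) -> str:
    """Label a gene from its insertion index relative to the genome mean.

    `low` and `high` are hypothetical cut-offs, for illustration only.
    """
    if genome_mean <= 0:
        raise ValueError("genome mean must be positive")
    relative = index / genome_mean
    if relative < low:
        return "essential"        # no or few insertions
    if relative > high:
        return "beneficial loss"  # many insertions
    return "non-essential"        # intermediate number of insertions


# Toy data: three genes of equal length with very different insertion counts.
genes = [Gene("geneA", 1000, 1), Gene("geneB", 1000, 50), Gene("geneC", 1000, 200)]
indices = {g.name: insertion_index(g) for g in genes}
genome_mean = sum(indices.values()) / len(indices)
labels = {name: classify(ii, genome_mean) for name, ii in indices.items()}
print(labels)
```

In practice the cut-offs would be derived from the observed distribution of insertion indices rather than fixed in advance.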

Are transposon insertions evenly distributed within genes?

We divided each gene into three segments: the first 5% of the gene at the 5’ end, the last 20% at the 3’ end, and the rest in the middle. The figure shows that in essential genes the 5’ and 3’ ends carry more insertions than the internal region, whereas in beneficial losses they carry fewer.

Figure: mean ii by position within the gene, from 5’ to 3’ (first 5%, internal, last 20%). Panels: Essential; Beneficial loss.
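The three-segment split can be illustrated with a short Python sketch. It assumes 1-based inclusive gene coordinates and a strand label of "+" or "-", and it reports insertions per base pair in each segment so that segments of different width are comparable. The function name and the coordinate convention are assumptions made for this example.

```python
def segment_densities(gene_start, gene_end, strand, insertion_sites,
                      first_frac=0.05, last_frac=0.20):
    """Return insertions per bp in the 5' end, internal region and 3' end.

    Coordinates are 1-based and inclusive; `strand` is "+" or "-".
    """
    length = gene_end - gene_start + 1
    counts = {"5_prime": 0, "internal": 0, "3_prime": 0}
    for site in insertion_sites:
        if not gene_start <= site <= gene_end:
            continue  # insertion lies outside this gene
        # Offset from the 5' end of the gene, which depends on the strand.
        offset = site - gene_start if strand == "+" else gene_end - site
        rel = offset / length
        if rel < first_frac:
            counts["5_prime"] += 1
        elif rel >= 1.0 - last_frac:
            counts["3_prime"] += 1
        else:
            counts["internal"] += 1
    widths = {
        "5_prime": first_frac * length,
        "internal": (1.0 - first_frac - last_frac) * length,
        "3_prime": last_frac * length,
    }
    return {seg: counts[seg] / widths[seg] for seg in counts}


# Toy example: a 1000 bp gene on the forward strand with six insertions.
densities = segment_densities(1, 1000, "+", [10, 30, 500, 900, 950, 990])
print(densities)
```

For a gene on the reverse strand the same insertion coordinates fall in the opposite segments, which is why the offset is measured from the 5’ end rather than from the lower genome coordinate.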

Are transposons biased towards certain positions in the genome?

We have also investigated whether the number of insertions in a gene is related to the position of the gene within the genome or to its G-C content. The results suggest that the further a gene is from the origin of replication, the fewer insertions it receives (left figure). Moreover, the right figure shows that when the G-C content is greater than 0.5, there is no bias.

Figure (left): Distance bias. Insertion index against distance from the origin.

Figure (right): GC bias. Insertion index against GC content.

To test whether insertions are biased towards certain motifs, we generated sequence logos from the 10 nucleotides flanking the 100 most frequent insertion sites. The analysis shows no significant bias.

Figure: sequence logos (probability and bits) over positions 1 to 20 for the top 100 most frequent insertion sites.
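The three quantities examined in this section can be computed as in the sketch below. It rests on assumptions not stated on the poster: the chromosome is treated as circular, the distance from the origin is expressed as a fraction of genome length (so it ranges from 0 to 0.5), and the logo height at a position is taken as 2 bits minus the Shannon entropy, without any small-sample correction. All function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_origin_distance(position: int, origin: int, genome_length: int) -> float:
    """Shortest distance to the origin around a circular chromosome.

    Returned as a fraction of genome length: 0 at the origin, 0.5 opposite it.
    """
    d = abs(position - origin) % genome_length
    return min(d, genome_length - d) / genome_length


def column_information(sequences, position: int) -> float:
    """Information content in bits at one column of aligned DNA sequences."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy  # 2 bits is the maximum for a four-letter alphabet


# Toy examples.
print(gc_content("ATGCGC"))
print(relative_origin_distance(4_500_000, 100_000, 5_000_000))
flanks = ["ACGT", "AAGT", "ATGT", "AGGT"]
print([column_information(flanks, i) for i in range(4)])
```

A column in which all four bases are equally frequent contributes 0 bits, which is what a logo with no motif preference looks like.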

Is the conservation of essentiality consistent with the species tree?

We compared the number of genes that are conserved across the strains in our study with the number of genes that are essential in these strains. The results suggest that although the conservation of genes follows a tree-like trend, essentiality does not show a tree-like signal.

Figure: intersection sizes across strains, in two panels (Conservation; Essentiality). Strains shown in both panels: Escherichia coli K-12 MG1655, Salmonella Typhi Ty2, Escherichia coli ETEC H10407, Salmonella Typhimurium A130, Escherichia coli ETEC CS17, Salmonella Typhimurium D23580, Escherichia coli UPEC ST131, Klebsiella pneumoniae RH201207, Klebsiella pneumoniae Ecl8, Salmonella Enteritidis P125109, Salmonella Typhimurium SL1344, Enterobacter cloacae NCTC 9394, Citrobacter rodentium ICC168.

Conservation, intersection sizes (largest first): 1909, 779, 471, 261, 208, 165, 121, 97, 93, 93, 84, 82, 78, 77, 74, 64, 61, 61, 58, 56.

Essentiality, intersection sizes (largest first): 180, 124, 56, 48, 43, 36, 32, 31, 25, 24, 22, 21, 19, 18, 9, 7, 7, 6, 5, 4.
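Intersection sizes of this kind can be obtained by grouping genes according to the exact set of strains in which they occur. The sketch below assumes that each strain is represented by a set of gene-cluster identifiers (either all clusters present, or only those called essential) and that intersections are exclusive, meaning each gene is counted in exactly one group. The strain and gene names are toy data, not results from this study.

```python
from collections import Counter


def exclusive_intersections(gene_sets):
    """Count genes by the exact set of strains in which they occur.

    `gene_sets` maps a strain name to the set of gene-cluster identifiers
    found (or found essential) in that strain.
    """
    membership = {}
    for strain, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(strain)
    return Counter(frozenset(strains) for strains in membership.values())


# Toy data with hypothetical strain and gene names.
toy_sets = {
    "strainA": {"g1", "g2", "g3", "g5"},
    "strainB": {"g1", "g2", "g5"},
    "strainC": {"g1", "g4"},
}
sizes = exclusive_intersections(toy_sets)
for strains, size in sizes.most_common():
    print(sorted(strains), size)
```

If the strain groups with the largest counts match clades of the species tree, the signal is tree-like; running the same function on presence sets and on essentiality sets allows the two patterns to be compared directly.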

Are essential genes more likely to be conserved?

We divided the genes in our 12 strains into three groups: genus-specific genes, genes with one copy per genome, and genes with multiple copies per genome. Examining essentiality in these groups shows that most of the essential genes are present in a single copy per genome and most of the beneficial losses are genus specific.

We performed a pathway enrichment analysis on the different groups of genes in Salmonella Typhi using KOBAS 2.0. The results indicate that essential genes are mostly involved in essential pathways such as replication and translation; that the enrichment of pathways related to non-essential genes is not statistically significant; and that beneficial losses are mostly involved in pathways that are not needed in nutrient-rich broth.

Figure: frequency of insertion index values for all clusters (n = 6550), genus-specific clusters (n = 2884), single-copy clusters (n = 2742) and multiple-copy clusters (n = 924). Legend: Essential; Non-essential; Beneficial loss.

Figure: pathway enrichment, shown as −log10(P-value) per pathway.

• Essential: Protein export; DNA replication; Homologous recombination; Terpenoid backbone biosynthesis; Ribosome.

• Non-essential: Flagellar assembly; Microbial metabolism in diverse environments; Phosphotransferase system (PTS); Sulfur metabolism; Two-component system.

• Beneficial losses: Phosphotransferase system (PTS); Lipopolysaccharide biosynthesis; Bacterial invasion of epithelial cells; Salmonella infection; Bacterial secretion system.
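The grouping of gene clusters and the idea behind a pathway enrichment test can be sketched as follows. Both parts are illustrative assumptions: a cluster is called genus specific when all of its members come from a single genus, and enrichment is computed with a plain one-sided hypergeometric test without multiple-testing correction. This is a stand-in for explanation only and does not reproduce the KOBAS 2.0 analysis.

```python
from math import comb


def classify_cluster(copies_per_strain, genus_of):
    """Assign a gene cluster to one of three groups.

    `copies_per_strain` maps strain -> number of cluster members in that genome;
    `genus_of` maps strain -> genus name. Both structures are hypothetical.
    """
    present = [s for s, n in copies_per_strain.items() if n > 0]
    if not present:
        raise ValueError("cluster has no members")
    if len({genus_of[s] for s in present}) == 1:
        return "genus specific"
    if any(copies_per_strain[s] > 1 for s in present):
        return "multiple copy"
    return "single copy"


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N genes in the genome, K of them in the pathway, n genes in the group of
    interest, k of those in the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy data with hypothetical strains, genera and counts.
genus_of = {"s1": "Salmonella", "s2": "Salmonella", "s3": "Escherichia"}
print(classify_cluster({"s1": 1, "s2": 1, "s3": 0}, genus_of))
print(classify_cluster({"s1": 1, "s2": 1, "s3": 1}, genus_of))
print(classify_cluster({"s1": 2, "s2": 1, "s3": 1}, genus_of))
print(enrichment_p_value(k=2, n=2, K=2, N=4))
```

A small P-value means the pathway contains more genes from the group than expected by chance, which corresponds to a long bar on a −log10(P-value) axis.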

Conclusions

• In transposon-directed insertion-site sequencing, the 5’ and 3’ ends of genes have a different tolerance for insertions from the internal region.

• The number of transposons inserted into a gene is related to the distance of the gene from the origin of replication.

• The transposons are not biased towards certain motifs or towards the G-C content of the gene.

• The conservation of essentiality is not consistent with the species tree.

• Essential genes are more likely to be conserved.

Contact

Fatemeh Ashari Ghomi [email protected]