Mahidol University: Wisdom of the Land (มหาวิทยาลัยมหิดล ปัญญาของแผ่นดิน)

Genotyping of Burkholderia pseudomallei

Pravech Ajawatanawong, Kamolchanok Rukseree, Phajongjit Karraphan

Lab Training and Risk Management Workshop, 9 July 2015


Genotyping

differentiates microbes into types

the trend is moving from taxonomy to systematics (more concerned with evolution)

from morphology to molecular data

from electrophoresis-based to sequence-based methods

the ideal typing method is based on whole-genome sequencing

Traditional Sequence Typing

appears frequently in microbiology articles

uses a single gene or a partial DNA sequence

phylogeny-based method

a much better approach than the purely taxonomic one

reveals evolutionary relationships

An rpoB gene tree of Burkholderia spp.

Kumar S et al. (2013) PLOS ONE 10.1371/journal.pone.0070624.g001.

16S rDNA Tree

Sawana et al. Molecular signatures distinguishing Burkholderia species

FIGURE 2 | A maximum likelihood tree based on the 16S rRNA gene sequences of 97 members of the genus Burkholderia. Accession numbers for the 16S rRNA sequences used for each organism are provided in the brackets following the name of the organism. The tree was rooted using four species from the genera Cupriavidus and Ralstonia. Bootstrap analysis scores are indicated for each node. The major Burkholderia clades (Clades I and II) and the subclades within Clade I are indicated by brackets.

Frontiers in Genetics | Evolutionary and Genomic Microbiology December 2014 | Volume 5 | Article 429 | 6

Sawana A et al. (2014) Frontiers in Genet 5:1-22.

The Ideal of Genotyping: Utopia

uses whole-genome sequence data

limitations of this approach are cost, the calculation method (algorithm), computational power, and sample preparation


FIGURE 1 | A maximum likelihood phylogenetic tree of the genome sequenced members of the genus Burkholderia based upon concatenated sequences of 21 conserved proteins. The tree was rooted using Cupriavidus necator N-1, Bordetella pertussis Tohama I, and Neisseria meningitidis MC58. Bootstrap analysis scores are indicated for each node. The major Burkholderia clades (Clades I and II) and their main sub-clades are indicated by brackets.

[...] relevant group within the Burkholderia. All species within this clade are potentially pathogenic to human, animals, or plants and most have been isolated from clinical human samples (Simpson et al., 1994; Mahenthiralingam et al., 2002, 2005; Biddick et al., 2003; O’Carroll et al., 2003). One example of a CSI [conserved signature indel] that is specific to the Clade I Burkholderia is shown in Figure 3A. In this case, a one amino acid deletion is present in a highly conserved region of a periplasmic amino acid-binding protein. The indel is flanked on both sides by highly conserved regions indicating that it is not the result of alignment artifacts and that it is a reliable genetic characteristic. This CSI is present in all of the sequenced members of the Clade I Burkholderia, but absent in all other bacterial homologs of this protein. Our work has identified 5 additional CSIs in other widely distributed proteins that are [...]


Phylogenomics of Burkholderia spp.

Sawana A et al. (2014) Frontiers in Genet 5:1-22.

Multi Locus Sequence Typing (MLST)

widely used for bacteria and other organisms

targets multiple loci

DNA amplification

DNA sequencing

comparison with the database

software-dependent analysis
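
The workflow on this slide (sequence several loci, compare each sequence with the database, then derive a type) can be sketched in Python. This is only an illustration: the three loci, the allele sequences, the allele numbers, and the ST numbers are all made up, and a real analysis would take them from a curated MLST database.

```python
# Toy MLST lookup. All sequences, allele numbers, and ST numbers are hypothetical.

# Order in which loci appear in an allelic profile.
LOCI = ("ace", "gltB", "gmhD")

# Known alleles per locus: sequence -> allele number.
ALLELE_DB = {
    "ace":  {"ATCGTAGC": 1, "ATCGTTGC": 2},
    "gltB": {"GGCATTCA": 1, "GGCATTGA": 2},
    "gmhD": {"TTGACCGT": 1, "TTGACGGT": 2},
}

# Known allelic profiles (in LOCI order) -> sequence type (ST).
PROFILE_DB = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (2, 2, 2): 3,
}


def allelic_profile(isolate):
    """Map each locus sequence to its allele number (None if the allele is unknown)."""
    return tuple(ALLELE_DB[locus].get(isolate[locus]) for locus in LOCI)


def sequence_type(isolate):
    """Return the ST of an isolate, or None if its allele or profile is not in the database."""
    return PROFILE_DB.get(allelic_profile(isolate))


isolate_a = {"ace": "ATCGTAGC", "gltB": "GGCATTGA", "gmhD": "TTGACCGT"}
isolate_b = {"ace": "ATCGTTGC", "gltB": "GGCATTCA", "gmhD": "TTGACCGT"}

print(allelic_profile(isolate_a), sequence_type(isolate_a))
print(allelic_profile(isolate_b), sequence_type(isolate_b))
```

An isolate whose profile is not yet in the database comes back as None here; in practice such a profile would be submitted to the database curators to receive a new ST.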

Multi Locus Sequence Typing (MLST)

greatly reduces ambiguity

a portable technique for characterizing bacterial isolates

targets multiple housekeeping genes (for certainty)

the principle is simple (requires only basic molecular techniques)

the methodology is easy (routine in a general molecular lab)

MLST Database

PubMLST

Multi Locus Sequence Typing (MLST)

ATCGTAGCTGATCGATCGACTAGCTGTACGTGACTGAC
ATCGTAGCTGATCGATCGACTACCTGTACGTGACAGAC
ATCGTTGCTGATCGTTCGACTAGCTGTACGTGACAGAC
ATCGTTGCTGATCGATCGACTAGCTGTAGGTGACAGAC
ATCGTAGCTGATCGATCGACTAGCTGTAGGTGACAGAC

allele 1  A A G C T
allele 2  A A C C A
allele 3  T T G C A
allele 4  T A G G A
allele 5  A A G G A
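
The allele table on this slide can be reproduced from the five example sequences by keeping only the columns where they differ. A minimal Python sketch, assuming the run of nucleotides on the slide is five aligned sequences of 38 bases each (the function name is illustrative):

```python
# The five example sequences from the slide, one per allele.
SEQS = [
    "ATCGTAGCTGATCGATCGACTAGCTGTACGTGACTGAC",
    "ATCGTAGCTGATCGATCGACTACCTGTACGTGACAGAC",
    "ATCGTTGCTGATCGTTCGACTAGCTGTACGTGACAGAC",
    "ATCGTTGCTGATCGATCGACTAGCTGTAGGTGACAGAC",
    "ATCGTAGCTGATCGATCGACTAGCTGTAGGTGACAGAC",
]


def variable_sites(seqs):
    """Return the 0-based column indices at which the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


sites = variable_sites(SEQS)
# Nucleotides of each allele at the variable sites only.
alleles = ["".join(seq[i] for i in sites) for seq in SEQS]

for number, allele in enumerate(alleles, start=1):
    print(f"allele {number}", " ".join(allele))
```

Because every sequence differs from every other at one or more of these sites, each one is counted as a distinct allele, regardless of how many nucleotides changed.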

The five isolates of B. mallei had identical allelic profiles (ST40) and clustered with the B. pseudomallei isolates; for six of the seven MLST loci, the alleles in B. mallei were also found within B. pseudomallei isolates. The allele at the other locus (narK-18) was not found in any of the B. pseudomallei isolates, but it differed at only a single nucleotide site from one of the most common of the alleles in the latter species (narK-1). Inspection of the incomplete genome of B. mallei ATCC 23344, which was recovered in 1942 from a horse in China and which is being sequenced by The Institute for Genome Research (http://www.tigr.org/), showed that this strain also had an allelic profile identical to those of the other five B. mallei strains.

A total of 128 isolates were assigned to the species B. pseudomallei by MLST, and these were resolved into 71 STs. Among the B. pseudomallei isolates, there were 37 isolates [...]

FIG. 1. Variable sites within the alleles at the seven MLST loci. The sequences of all of the alleles at each locus that are represented among the 147 Burkholderia isolates are shown. Only the variable sites are shown, and these are numbered in vertical format. For allele 1 at each locus, the nucleotide present at each variable site is shown. For other alleles, only those sites where the nucleotides differ from those in allele 1 are shown; sites that have the same nucleotide as that in allele 1 are shown by a dot. The alleles in normal font are from B. pseudomallei, those in boldface are from B. thailandensis, and the final allele at each locus (in italics) is from the Oklahoma isolates.

VOL. 41, 2003 MLST SCHEME FOR B. PSEUDOMALLEI 2071


Variation of Allelic Number in Each Gene


Godoy D et al. (2003) J Clin Microbiol 41:2068-2079.

[...] at the time that this work was initiated were analyzed by using the BLASTX program, and housekeeping genes (those involved in essential metabolic processes) that were flanked by other housekeeping genes and that appeared to be devoid of nearby genes that might be under diversifying selection from the host immune system or that might have been subject to high rates of horizontal gene transfer were selected.

Primers that allowed the amplification by PCR of approximately 550-bp fragments from the candidate MLST loci were designed, and the same primers were used to sequence the fragments on each strand. For initial selection of MLST loci and primers, a set of 24 isolates that included 19 B. pseudomallei isolates and 5 B. thailandensis isolates was used, since primers that amplified both of these species were required. Several of the candidate housekeeping gene fragments gave good-quality sequences, but examination of the sequence traces showed that, for unknown reasons, there were two overlapping peaks, suggesting two different nucleotides at a few sites. Seven gene fragments that did not show this phenomenon were selected for use in the final MLST scheme (Table 1). One possible reason for the phenomenon described above would be the presence of two extremely similar copies of some genes on the B. pseudomallei genome. The sequences of the seven MLST loci selected were therefore compared to the recently completed genome sequence of B. pseudomallei. Six of the sequences gave the expected single perfect match, but the ace gene fragment detected a gene on chromosome II that had 69% nucleotide similarity (52% amino acid similarity), in addition to the perfect match on chromosome I. The primers used for PCR amplification and sequencing of the ace fragment did not amplify this divergent homolog of the ace gene.

After completion of the genome sequence, the ndh fragment was found to include parts of two overlapping genes that encode 282 bp from the end of the E subunit and 162 bp from the start of the F subunit of NADH dehydrogenase I. The junction sequence AAATGA includes the final codon of the upstream gene (AAA; lysine) and its TGA termination codon, and the last nucleotide of the lysine codon is the first nucleotide of the ATG initiation codon of the downstream gene. The entire ndh fragment used in the MLST scheme therefore corresponds to a protein-coding region.
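
The overlap at the ndh junction is easier to see when the six nucleotides are sliced in both reading frames. A small Python sketch of the description above; only the junction string comes from the text, and the variable names are illustrative:

```python
junction = "AAATGA"  # junction sequence quoted in the excerpt

# Upstream gene (E subunit), read in frame from the first nucleotide.
last_codon = junction[0:3]   # final sense codon (lysine)
stop_codon = junction[3:6]   # termination codon

# Downstream gene (F subunit): its start codon begins one position
# before the stop codon, on the last nucleotide of the lysine codon.
start_codon = junction[2:5]

print(last_codon, stop_codon, start_codon)

# The nucleotide at index 2 belongs to both genes.
shared = junction[2]
print("shared nucleotide:", shared)
```

The two genes are read in different frames, so every position in the junction is still part of some codon, which is why the whole fragment can be described as protein-coding.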

Following the completion of determination of the genome sequence of strain K96243, the locations of the seven housekeeping genes on the two circular chromosomes could be examined. All seven genes were located on chromosome I and were separated by at least 80 kb (Table 1).

Diversity and relatedness of alleles from the Burkholderia isolates. The seven gene fragments were sequenced from the 147 Burkholderia isolates. Figure 1 shows the polymorphic sites within the different alleles at the seven loci. At each locus there were three distinct groups of alleles. One group of very similar alleles was found within those isolates assigned to the species B. pseudomallei, and a second group of similar alleles was found among isolates assigned to the species B. thailandensis. At each locus there was one allele that was divergent from both of these groups of alleles; these divergent alleles were present in only three identical isolates from Oklahoma, which tentatively (and it appears wrongly) had been assigned to the species B. pseudomallei (17, 26). The average divergence between the alleles of the B. pseudomallei and B. thailandensis isolates was 3.2%, and the average divergences between the alleles in the Oklahoma isolates and those of the B. pseudomallei and B. thailandensis isolates were 5.2 and 4.7%, respectively. Two additional isolates that had previously been assigned to the species B. pseudomallei (isolates 82172 and 1992/2572) possessed alleles that were very similar to, although distinct from, the alleles found in B. thailandensis; these isolates were also not considered to be members of the species B. pseudomallei.

Among all Burkholderia isolates, the number of alleles per locus varied from 7 to 19. Among the 128 isolates assigned to the species B. pseudomallei by MLST, there were between 4 and 15 alleles per locus (average, 8.6), allowing about 3.4 million different allelic profiles to be distinguished within this species (Table 1). However, the level of sequence diversity within B. pseudomallei was low (average, 0.2%), and an average of 18% of all alleles at a locus differed at only a single nucleotide site.

Genetic diversity and relationships between Burkholderia isolates. There were 81 different allelic profiles among the 147 Burkholderia isolates (Table 2). Figure 2 shows a UPGMA tree obtained by using the matrix of pairwise differences in the allelic profiles of the isolates. All B. pseudomallei isolates were grouped together and were resolved from isolates assigned to the species B. thailandensis and from a few isolates that were assigned as possibly belonging to the species B. pseudomallei. The alleles at all seven loci in the B. thailandensis isolates were different from those at the seven loci in all B. pseudomallei isolates. The three isolates from Oklahoma that had tentatively been assigned to the species B. pseudomallei were identical by MLST (ST81) but differed from all other isolates at all seven loci. Similarly, the allelic profiles of the two isolates of ST73 (isolates 82172 and 1992/2572; see above) differed from those of all other isolates at all loci.
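
The "matrix of pairwise differences in the allelic profiles" that feeds the UPGMA tree can be sketched as a count of mismatching loci between every pair of isolates. The profiles below are hypothetical and are not taken from the paper; only the idea of counting differing loci comes from the text.

```python
from itertools import combinations

# Hypothetical allelic profiles over seven loci (values are allele numbers).
PROFILES = {
    "isolate_1": (1, 1, 2, 1, 1, 3, 1),
    "isolate_2": (1, 1, 2, 1, 1, 4, 1),
    "isolate_3": (1, 2, 5, 1, 3, 3, 1),
    "isolate_4": (9, 8, 7, 6, 9, 8, 7),
}


def allelic_distance(p, q):
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(p, q))


def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every unordered pair of isolates."""
    return {
        (a, b): allelic_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


distances = pairwise_distances(PROFILES)
for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d} of 7 loci differ")
```

A distance of 7, as for the hypothetical isolate_4, corresponds to the situation described for the Oklahoma isolates, which differed from all others at all seven loci. A tree could then be built from these distances with average-linkage clustering (UPGMA), for example with SciPy's `scipy.cluster.hierarchy.linkage` and `method="average"`.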

TABLE 1. Properties of the loci used in the B. pseudomallei MLST scheme

Locus  Gene function                        No. of alleles(a)  No. of variable sites(a)  Genome location (kb)
ace    Acetyl coenzyme A reductase          4                  7                         1,780
gltB   Glutamate synthase                   7                  12                        3,761
gmhD   ADP glycerol-mannoheptose epimerase  15                 19                        3,023
lepA   GTP-binding elongation factor        6                  10                        2,938
lipA   Lipoic acid synthetase               7                  12                        448
narK   Nitrite extrusion protein            14                 18                        2,784
ndh    NADH dehydrogenase                   7                  12                        1,400

a Alleles and variable sites are those in the 128 isolates assigned to the species B. pseudomallei.
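
The excerpt does not say how the figure of about 3.4 million allelic profiles was obtained. One reading, which is an assumption here, is the average number of alleles per locus raised to the power of seven; the exact product of the per-locus counts in Table 1 is smaller. Both are computed below from the table values.

```python
from math import prod

# Number of alleles per locus among the 128 B. pseudomallei isolates (Table 1).
ALLELES = {
    "ace": 4, "gltB": 7, "gmhD": 15, "lepA": 6,
    "lipA": 7, "narK": 14, "ndh": 7,
}

n_loci = len(ALLELES)
mean_alleles = sum(ALLELES.values()) / n_loci

# Assumed reading of the text: (average alleles per locus) ** (number of loci).
profiles_from_mean = mean_alleles ** n_loci

# Exact number of distinct combinations of the observed alleles.
profiles_from_product = prod(ALLELES.values())

print(f"average alleles per locus: {mean_alleles:.1f}")
print(f"average ** {n_loci}: {profiles_from_mean / 1e6:.1f} million")
print(f"product of per-locus counts: {profiles_from_product:,}")
```

Either way, the number of possible profiles is far larger than the 71 STs actually observed among the B. pseudomallei isolates.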

2070 GODOY ET AL. J. CLIN. MICROBIOL.


Godoy D et al. (2003) J Clin Microbiol 41:2068-2079.

Indel = Insertion / Deletion

Outgroup  A A G T C A - - - T T A G A C
Seq 1     A A G T C A G C C T T A G A C
Seq 2     A A G T C A - - - T T A G A C
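
An indel like the one on this slide can be located and scored as a presence/absence character in a few lines of Python. This is a minimal sketch, not the method of any particular tool. It assumes the three sequences are labelled Outgroup, Seq 1, and Seq 2 in the order they appear on the slide, and the function names are illustrative.

```python
# Toy alignment from the slide ("-" marks a gap).
ALIGNMENT = {
    "Outgroup": "AAGTCA---TTAGAC",
    "Seq 1":    "AAGTCAGCCTTAGAC",
    "Seq 2":    "AAGTCA---TTAGAC",
}


def gap_regions(alignment):
    """Return half-open (start, end) column ranges where at least one sequence has a gap."""
    has_gap = ["-" in column for column in zip(*alignment.values())]
    regions, start = [], None
    for i, flag in enumerate(has_gap):
        if flag and start is None:
            start = i                      # a gapped run begins
        elif not flag and start is not None:
            regions.append((start, i))     # the run ended at the previous column
            start = None
    if start is not None:                  # run reaches the end of the alignment
        regions.append((start, len(has_gap)))
    return regions


def indel_characters(alignment, region):
    """Score each sequence 1 if it has residues in the region, 0 if it is all gaps."""
    start, end = region
    return {name: int(set(seq[start:end]) != {"-"}) for name, seq in alignment.items()}


regions = gap_regions(ALIGNMENT)
print(regions)
for region in regions:
    print(region, indel_characters(ALIGNMENT, region))
```

With the outgroup scored 0, the 1 in Seq 1 reads as an insertion in that lineage rather than a deletion in the others.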

Multiple Sequence Alignment (MSA) method for comparison of DNA and protein sequences

GITQRIAATTVDINKILKATEKLNNK-----GMKIPGLLFIDTPGHVAFSNMRARGGALADIAVLVIDIN------QTVESIDILKKFKTPFIIAANKIDLIPFF
GITQRIAATTVDISRILKETEKLNTK-----GLKIPGLLFIDTPGHVAFSNMRARGGALADLAILVIDIN------QTVESIDILKKFKTPFIIAANKIDLIPFF
RITQKIGATEISYNILEREIKTAFKNIPI----KIPGLLFIDTPGHVAFSNMRSLGGALADIAILVIDVN------QTIESIDILKKFKTPFIIAANKIDAVPYF
GITQKIGATEIDKNTLETNIKNYFKNIQV----TIPGLLFIDTPGHVAFANMRSMGGALADIAILVIDVN------QTIESIDVLKKYKTPFIIAANKIDMIPYF
GITQHIGASFLPREIIKKRCGPLYGKISGS-DVQVPGVLVIDTPGHEVFTNLRARGGSAADIAILVVDVN------QTSESLRVLQARKVPFVVALNKVDQIPGW
MITQHIGMSFVPWQAVEKYAGPLVDRLKLRGKIWIPGFLFIDTPGHAAFSNLRKRGGSVADLAILVVDIT------QGVESLKLIQSRGVPFVIAANKLDRVYGW
AITQHIGATAVPLDVISEIAGDLVD--PT--DFDLPGLLFIDTPGHHSFSTLRSRGGALADIAILVVDVN------QTLEAIDILKRTQTPFIVAANKIDTVPGW
AITQHIGATAVPLDTISELAGQLVS--PE--DFDLPGLLFIDTPGHHSFSTLRSRGGALADIAILVVDVNDGFQPVQSYEALDILKRTQTPFIVAANKIDTVPGW
GITQHIGASEIPINTIKKVSKDLLGLFKA--DLSIPGILVIDTPGHEAFTSLRKRGGALADIAILVVDINEGFKPVQTIEAINILKQCKTPFVVAANKVDRIPGW
GITQHIGATEVPLDVIKQICKDIWKV-----EVKIPGLLFIDTPGHKAFTNLRRRGGALADLAILIVDINEGFKPVQTEEALSILRTFKTPFVVAANKIDRIPGW
GITQHIGASIVPADVIEKIAEPLK--IPV--KLVIPGLLFIDTPGHELFSNLRRRGGSVADFAILVVDIMEGFKPVQTYEALELLKERRVPFLIAANKIDRIPGW
EITQHVGASVVPASVLNKITEPLK--FPKL-VIEIPGLLFIDTPGHELFSNLRRRGGSVADMAILVVDVVEGFQPVQTIEALNILKEKRVPFIVAANKIDRLEGW
GITQHIGASEIPLEVVKEICGPLLEQLDV--EITIPGLLFIDTPGHEAFTNLRRRGGALADIAILVIDIM------QTEEALRILRRYRTPFVVAANKVDRVPGW
AITQHIGATEVPIDVIINKLGD--PRLRD--RFIVPGLLFIDTPGHHAFTTLRSRGGALADLAIVVVDIN------QTYESLQILKRFKTPFVVVANKIDRIGGW
AITQHIGATEVPIDVIIDKLGD--PRLRD--RFMVPGLLFIDTPGHHAFTTLRSRGGALADLAIVVVDIN------QTYESLQILKRFKTPFVVVANKIDRIGGW

Annotated regions, left to right: conserved block, indel region, conserved block, indel region, conserved block
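
The split into three conserved blocks and two indel regions can be approximated by separating gap-free columns from columns that contain a gap. The Python sketch below assumes the block of residues on this slide divides into 15 aligned rows of 105 columns each. Treating "gap-free" as "conserved" is a simplification made for illustration and is not SeqFIRE's own criterion.

```python
from itertools import groupby

# The 15 aligned rows from the slide ("-" marks a gap).
ROWS = [
    "GITQRIAATTVDINKILKATEKLNNK-----GMKIPGLLFIDTPGHVAFSNMRARGGALADIAVLVIDIN------QTVESIDILKKFKTPFIIAANKIDLIPFF",
    "GITQRIAATTVDISRILKETEKLNTK-----GLKIPGLLFIDTPGHVAFSNMRARGGALADLAILVIDIN------QTVESIDILKKFKTPFIIAANKIDLIPFF",
    "RITQKIGATEISYNILEREIKTAFKNIPI----KIPGLLFIDTPGHVAFSNMRSLGGALADIAILVIDVN------QTIESIDILKKFKTPFIIAANKIDAVPYF",
    "GITQKIGATEIDKNTLETNIKNYFKNIQV----TIPGLLFIDTPGHVAFANMRSMGGALADIAILVIDVN------QTIESIDVLKKYKTPFIIAANKIDMIPYF",
    "GITQHIGASFLPREIIKKRCGPLYGKISGS-DVQVPGVLVIDTPGHEVFTNLRARGGSAADIAILVVDVN------QTSESLRVLQARKVPFVVALNKVDQIPGW",
    "MITQHIGMSFVPWQAVEKYAGPLVDRLKLRGKIWIPGFLFIDTPGHAAFSNLRKRGGSVADLAILVVDIT------QGVESLKLIQSRGVPFVIAANKLDRVYGW",
    "AITQHIGATAVPLDVISEIAGDLVD--PT--DFDLPGLLFIDTPGHHSFSTLRSRGGALADIAILVVDVN------QTLEAIDILKRTQTPFIVAANKIDTVPGW",
    "AITQHIGATAVPLDTISELAGQLVS--PE--DFDLPGLLFIDTPGHHSFSTLRSRGGALADIAILVVDVNDGFQPVQSYEALDILKRTQTPFIVAANKIDTVPGW",
    "GITQHIGASEIPINTIKKVSKDLLGLFKA--DLSIPGILVIDTPGHEAFTSLRKRGGALADIAILVVDINEGFKPVQTIEAINILKQCKTPFVVAANKVDRIPGW",
    "GITQHIGATEVPLDVIKQICKDIWKV-----EVKIPGLLFIDTPGHKAFTNLRRRGGALADLAILIVDINEGFKPVQTEEALSILRTFKTPFVVAANKIDRIPGW",
    "GITQHIGASIVPADVIEKIAEPLK--IPV--KLVIPGLLFIDTPGHELFSNLRRRGGSVADFAILVVDIMEGFKPVQTYEALELLKERRVPFLIAANKIDRIPGW",
    "EITQHVGASVVPASVLNKITEPLK--FPKL-VIEIPGLLFIDTPGHELFSNLRRRGGSVADMAILVVDVVEGFQPVQTIEALNILKEKRVPFIVAANKIDRLEGW",
    "GITQHIGASEIPLEVVKEICGPLLEQLDV--EITIPGLLFIDTPGHEAFTNLRRRGGALADIAILVIDIM------QTEEALRILRRYRTPFVVAANKVDRVPGW",
    "AITQHIGATEVPIDVIINKLGD--PRLRD--RFIVPGLLFIDTPGHHAFTTLRSRGGALADLAIVVVDIN------QTYESLQILKRFKTPFVVVANKIDRIGGW",
    "AITQHIGATEVPIDVIIDKLGD--PRLRD--RFMVPGLLFIDTPGHHAFTTLRSRGGALADLAIVVVDIN------QTYESLQILKRFKTPFVVVANKIDRIGGW",
]


def partition_columns(rows):
    """Split alignment columns into consecutive runs that are gap-free or gapped.

    Returns (label, start, end) tuples with 0-based, half-open column ranges.
    """
    flags = ["-" in column for column in zip(*rows)]
    runs, position = [], 0
    for has_gap, group in groupby(flags):
        length = len(list(group))
        label = "gapped" if has_gap else "gap-free"
        runs.append((label, position, position + length))
        position += length
    return runs


for label, start, end in partition_columns(ROWS):
    print(f"{label:8s} columns {start + 1}-{end}")
```

A column counts as gapped as soon as a single row has a gap there, so the first gapped run is wider than the gap in any one sequence.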


Sequence Feature and Indel Region Extractor (version 1.0.1)

Citation: If you use this server or standalone SeqFIRE, cite the following:

Ajawatanawong P., Atkinson G.C., Watson-Haigh N.S., MacKenzie B. and Baldauf S.L. (2012) SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. Nucleic Acids Res., 40, W340-W347.


About SeqFIRE: SeqFIRE is a program for extracting regions of interest from a multiple sequence alignment. The program can search for and extract regions that contain insertions and deletions (indels), and output details of the indel, as well as a binary character matrix of conserved simple indels for use in phylogenetic analysis. SeqFIRE can also extract blocks of conserved columns from a sequence alignment, and output these alignments in proper format for phylogenetic analysis.

identification and extraction of protein indel and conserved block regions

user-friendly web application

powerful standalone version

core program written in standard Python

seqFIREprep for high-throughput analysis

easy user manual

www.seqfire.org

SeqFIRE

SeqFIRE Statistics: number of visitors in the last two years

[Figure: monthly visits to SeqFIRE, Feb 2013 to Jan 2015. x-axis: Months; y-axis: Number of Visits (0 to 160); series: all hits, unique visitors.]

SeqFIRE Statistics accessed from…


Indel Specific for Burkholderia Clade I

FIGURE 3 | Partial sequence alignments of (A) a periplasmic amino acid-binding protein showing a 1 amino acid deletion identified in all members of Clade I of the genus Burkholderia (B) a dehydrogenase showing a 1 amino acid insertion (boxed) identified only in members of Clade II of the genus Burkholderia. These CSIs were not found in the sequence homologs of these proteins from any other sequenced bacteria. In each case, sequence information for a Burkholderia species and a limited number of other bacteria are shown, but unless otherwise indicated, similar CSIs were detected in all members of the indicated group and not detected in any other bacterial species in the top 250 BLAST hits. The dashes (–) in the alignments indicate identity with the residue in the top sequence. GenBank identification (GI) numbers for each sequence are indicated in the second column. Sequence information for other CSIs specific to the members of Clade I and Clade II of the genus Burkholderia are presented in Supplemental Figures 1–5 and Supplemental Figure 6, respectively, and their characteristics are summarized in Table 2.


Sawana A et al. (2014) Frontiers in Genet 5:1-22.

Indel Specific for Burkholderia Clade II


Sawana A et al. (2014) Frontiers in Genet 5:1-22.

Indel Specific for Burkholderia Clade Ia

FIGURE 4 | Continued


Sawana A et al. (2014) Frontiers in Genet 5:1-22.