5/22/2017
Phylogenetic Marker COGs
Universal markers such as the 16S rRNA genes of bacteria and archaea are often used to characterize the taxonomic composition and phylogenetic diversity of metagenome samples. However, 16S rRNA marker genes have known problems: biases introduced by copy-number variation and amplification efficiency, inconsistencies when different regions of the gene are targeted, and difficulty in accurately and consistently delineating prokaryotic species.
As an alternative, single-copy phylogenetic marker genes are well suited to taxonomic profiling of environmental samples. These markers can be either universal or clade-specific; in both cases they are present as single copies in most genomes, and they are rarely subject to horizontal gene transfer. They can therefore delineate prokaryotic species at higher resolution than 16S rDNA (1,2) and be used to estimate the relative abundances of both known and currently unknown microbial community members.
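Because each marker is expected to occur once per genome, the number of copies of a marker assigned to a taxon is roughly proportional to how many genomes of that taxon are in the sample. The sketch below illustrates this idea under stated assumptions: the input is a hypothetical table of marker-gene counts per taxon and per marker COG (not an IMG export format), and the median across markers is used to damp markers that were missed or duplicated. It is an illustration of the general principle, not the procedure used by IMG or by references (1,2).

```python
from statistics import median


def relative_abundance(counts):
    """Estimate relative taxon abundances from single-copy marker counts.

    counts: hypothetical mapping {taxon: {marker_cog: gene_count}}.
    A marker missing for a taxon is treated as a count of 0.
    """
    # Union of all marker COGs seen for any taxon
    markers = sorted({m for per_taxon in counts.values() for m in per_taxon})
    if not markers:
        return {taxon: 0.0 for taxon in counts}
    # Median over markers: robust to a single missed or duplicated marker
    per_taxon = {
        taxon: median(cog_counts.get(m, 0) for m in markers)
        for taxon, cog_counts in counts.items()
    }
    total = sum(per_taxon.values())
    if total == 0:
        return {taxon: 0.0 for taxon in per_taxon}
    # Normalize so the abundances sum to 1
    return {taxon: value / total for taxon, value in per_taxon.items()}


# Toy counts for illustration only (not real data)
toy = {
    "taxonA": {"COG0013": 3, "COG0016": 3, "COG0018": 4},
    "taxonB": {"COG0013": 1, "COG0016": 1},
}
print(relative_abundance(toy))  # {'taxonA': 0.75, 'taxonB': 0.25}
```

Using the median rather than the mean means that one marker with an unusual count (for example, a gene that failed to assemble) shifts the estimate little.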
In IMG, candidate single-copy marker genes are identified as a subset of COGs that can be used for phylogenetic profiling of metagenomes. JGI experts curated this subset from COG gene families known to occur as single copies (or almost always as single copies) across the reference genomes.
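The curation in IMG was done by JGI experts; the sketch below only illustrates the kind of criterion such a selection rests on. It assumes a hypothetical table of COG copy numbers per reference genome and keeps COGs that are present in at least a fraction `min_presence` of genomes and, where present, occur as exactly one copy in at least a fraction `min_single` of them. Both thresholds and their default values are illustrative assumptions, not IMG parameters.

```python
def single_copy_cogs(genome_counts, min_presence=0.9, min_single=0.95):
    """Return COG IDs that look (almost) single-copy across reference genomes.

    genome_counts: hypothetical mapping {genome_id: {cog_id: copy_number}}.
    min_presence: minimum fraction of genomes that contain the COG at all.
    min_single: minimum fraction of containing genomes with exactly one copy.
    """
    n_genomes = len(genome_counts)
    if n_genomes == 0:
        return []
    cogs = {cog for per_genome in genome_counts.values() for cog in per_genome}
    selected = []
    for cog in sorted(cogs):
        # Copy numbers in genomes where the COG is actually present
        copies = [g[cog] for g in genome_counts.values() if g.get(cog, 0) > 0]
        if not copies or len(copies) / n_genomes < min_presence:
            continue  # too often absent to serve as a universal marker
        single = sum(1 for c in copies if c == 1)
        if single / len(copies) >= min_single:
            selected.append(cog)
    return selected


# Toy reference genomes for illustration only (not real data)
genomes = {
    "g1": {"COG0013": 1, "COG0016": 1, "COG0001": 2},
    "g2": {"COG0013": 1, "COG0016": 1, "COG0001": 1},
    "g3": {"COG0013": 1, "COG0001": 3},
}
print(single_copy_cogs(genomes))  # ['COG0013']
```

Here COG0001 is rejected because it is often multi-copy, and COG0016 is rejected because it is absent from one of the three genomes; relaxing `min_presence` to 0.6 would admit COG0016 as a near-universal marker.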
To perform a phylogenetic marker analysis, go to the Compare Genomes menu and select the Phylo Marker COGs option. Note that the tool applies only to metagenomes with assembled genes in IMG. If the user has selected genomes in the Genome Cart, only the genomes in the cart are used; otherwise, all metagenomes with assembled genes are available for selection.
For example, suppose a user selects six acid mine metagenome datasets for analysis (Figure 1(i)). The result in Figure 1(ii) lists the single-copy or almost single-copy COGs together with estimated upper bounds of their gene counts. The user can click an estimated count to view the actual gene list (Figure 1(iii)).
Figure 1. Phylogenetic Marker COG Analysis.
Users can also select a particular COG ID in Figure 1(ii) and click the Go button to view the complete gene list (limited to the maximum gene count set in the MyIMG preferences). For example, Figure 2(i) shows the gene list for COG0013. Users can select a subset of these genes to view them in Jalview or in a multiple alignment display (Figure 2(ii) and (iii), respectively).
Figure 2. Jalview and Multiple Alignment for COG0013.
Genes in Figure 2(i) can also be selected and added to the Gene Cart for further analysis.
References
1. Mende, D.R., Sunagawa, S., Zeller, G. & Bork, P. Nat. Methods 10, 881–884 (2013).
2. Sunagawa, S. et al. Nat. Methods 10, 1196–1199 (2013). doi:10.1038/nmeth.2693