Sequencing the World of Possibilities for Energy & Environment
IMG clusters – the hidden features
Sean Hooper
Genome Biology Program
JGI
Background
• Clusters work behind the scenes in IMG
• Used for
  – Data compression
  – Annotation assistance
  – Grouping of similar functions
  – Necessary for large datasets, e.g. metagenomics
Example
• Search for a gene annotated as putative or hypothetical
• Study the often overlooked clusters of genes in IMG
Putative ribulose carboxylase
COG
Pfam
IMG
Tatusov et al. 1997
1997: 720 COGs
2003: 4873 COGs
COG
Pfam
IMG
COG
Pfam
IMG
MCL clustering on sequence
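As a rough illustration of what MCL (Markov clustering) does with sequence similarities, the sketch below alternates expansion and inflation on a small similarity matrix. The toy matrix, the inflation value of 2.0, and all function names are assumptions made for illustration; the slides do not state how IMG runs MCL or which parameters it uses.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, iterations=100, tol=1e-9):
    """Cluster the nodes of a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Add self-loops, as is customary before running MCL.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarities: genes 0-2 and genes 3-4 are strongly similar
# within each group, with one weak link between gene 2 and gene 3.
similarity = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
clusters = mcl(similarity)
print(clusters)
```

A higher inflation value generally gives finer clusters; in practice the similarity scores would come from pairwise sequence comparison rather than a hand-written matrix.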
Nodes = IMG genes
Edges = in same cluster
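The graph definition above (nodes are genes, edges join genes that share a cluster) can be sketched as follows. The gene and cluster identifiers are hypothetical placeholders, and the function name is an assumption; this is not how IMG stores or draws the graph.

```python
from collections import defaultdict
from itertools import combinations


def cluster_graph(memberships):
    """Build edges between genes that share at least one cluster.

    memberships maps a gene id to the set of cluster ids it belongs to.
    Returns a dict mapping (gene_a, gene_b) to the set of shared clusters.
    """
    genes_by_cluster = defaultdict(set)
    for gene, cluster_ids in memberships.items():
        for cluster_id in cluster_ids:
            genes_by_cluster[cluster_id].add(gene)

    edges = defaultdict(set)
    for cluster_id, genes in genes_by_cluster.items():
        # Every pair of genes in the same cluster gets an edge.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)].add(cluster_id)
    return dict(edges)


# Hypothetical memberships across COG, Pfam and IMG clusters.
memberships = {
    "geneA": {"cog_1", "pfam_1"},
    "geneB": {"cog_1"},
    "geneC": {"pfam_1", "img_1"},
    "geneD": {"img_1"},
}
edges = cluster_graph(memberships)
for pair, shared in sorted(edges.items()):
    print(pair, sorted(shared))
```

Keeping the shared cluster ids on each edge makes it possible to see which cluster type (COG, Pfam or IMG) links two genes.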
Alignment detail
Phylogeny
• How do these clusters relate to phylogeny?
Conclusions
• Provide fast access to related proteins
• Ease analysis and annotation (but cannot replace experimental work)
• Reveal substructures in function and phylogeny
Acknowledgements
Genome Biology
K Mavrommatis
IJ Anderson
NC Kyrpides
A Pati
IMG crew
K Palaniappan
E Szeto
VM Markowitz
Chalmers, Sweden
D Dalevi
COAL demo
• Cluster overview of Archaea
• Spectral bipartitioning
• Integrate metadata (phenotype, phylogeny)
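For readers unfamiliar with spectral bipartitioning, here is a minimal sketch of the general technique: split a graph in two by the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the graph Laplacian. It assumes a connected, undirected graph given as an adjacency matrix and uses plain power iteration; the function name and the toy graph are hypothetical, and nothing here is taken from the COAL tool itself.

```python
def spectral_bipartition(adj, iterations=500):
    """Split a connected undirected graph by the sign of its Fiedler vector.

    adj is a symmetric adjacency matrix (list of lists) with at least 2 nodes.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Laplacian eigenvalues are at most 2 * max degree, so every eigenvalue
    # of (shift * I - L) is positive and the largest non-trivial one
    # corresponds to the Fiedler vector.
    shift = 2 * max(deg) + 1
    # Deterministic start vector that sums to zero (orthogonal to all-ones).
    v = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iterations):
        # w = (shift * I - L) v, where L = D - A.
        w = [(shift - deg[i]) * v[i]
             + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        # Remove the constant component, which belongs to eigenvalue 0 of L.
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    left = [i for i in range(n) if v[i] < 0]
    right = [i for i in range(n) if v[i] >= 0]
    return left, right


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
left, right = spectral_bipartition(adj)
print(left, right)
```

Applying the split recursively to each half yields a hierarchy of clusters, which is one common way to get an overview of a large set of genomes or genes.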