
Sequencing the World of Possibilities for Energy & Environment

IMG clusters – the hidden features

Sean Hooper

Genome Biology Program

JGI

Background

• Clusters work behind the scenes in IMG (Integrated Microbial Genomes)

• Used for
  – Data compression
  – Annotation assistance
  – Grouping of similar functions

• Necessary for large datasets, e.g. metagenomics
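The data-compression point can be illustrated with a minimal sketch. Once genes are grouped, per-cluster work runs once per representative instead of once per gene. The gene and cluster IDs are made up, and the rule that the longest member represents its cluster is a hypothetical choice, not a documented IMG rule.

```python
from collections import defaultdict


def compress_by_cluster(gene_cluster, gene_length):
    """Collapse genes onto one representative gene per cluster.

    gene_cluster: gene ID -> cluster ID (hypothetical toy data).
    gene_length:  gene ID -> protein length in residues.
    Assumption (hypothetical rule, not IMG's): the longest member represents
    its cluster; ties go to the lexicographically largest gene ID.
    Returns (cluster ID -> representative gene, genes per representative).
    """
    if not gene_cluster:
        return {}, 1.0
    members = defaultdict(list)
    for gene, cluster in gene_cluster.items():
        members[cluster].append(gene)
    representatives = {
        cluster: max(genes, key=lambda g: (gene_length[g], g))
        for cluster, genes in members.items()
    }
    ratio = len(gene_cluster) / len(representatives)
    return representatives, ratio


# Four made-up genes collapse onto two representatives.
reps, ratio = compress_by_cluster(
    {"g1": "c1", "g2": "c1", "g3": "c1", "g4": "c2"},
    {"g1": 300, "g2": 450, "g3": 420, "g4": 200},
)
print(reps, ratio)  # {'c1': 'g2', 'c2': 'g4'} 2.0
```

On a metagenome with many near-identical genes, the same ratio grows with redundancy, which is why clustering matters most for large datasets.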

Example

• Search for a gene annotated as putative or hypothetical

• Study the often overlooked clusters of genes in IMG
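A sketch of this workflow, under stated assumptions: take a gene whose product name contains "hypothetical" or "putative" and rank the confident product names carried by its cluster mates. The inputs, the keyword list and `suggest_from_cluster` are hypothetical; the ranking is a hint for a curator, not an annotation.

```python
from collections import Counter

# Hypothetical keyword list marking "uncertain" product names.
UNCERTAIN_TERMS = ("hypothetical", "putative")


def is_uncertain(product):
    """True if a product name contains one of the uncertain keywords."""
    name = product.lower()
    return any(term in name for term in UNCERTAIN_TERMS)


def suggest_from_cluster(gene, gene_cluster, products):
    """Rank confident product names found among a gene's cluster mates.

    gene_cluster: gene ID -> cluster ID; products: gene ID -> product name.
    Both are hypothetical inputs standing in for an IMG export.
    """
    cluster = gene_cluster[gene]
    mates = [g for g, c in gene_cluster.items() if c == cluster and g != gene]
    counts = Counter(products[g] for g in mates if not is_uncertain(products[g]))
    return counts.most_common()


gene_cluster = {"g1": "c1", "g2": "c1", "g3": "c1", "g4": "c1", "g5": "c2"}
products = {
    "g1": "hypothetical protein",
    "g2": "enzyme X",
    "g3": "enzyme X",
    "g4": "enzyme Y",
    "g5": "enzyme Z",  # different cluster, so ignored
}
print(suggest_from_cluster("g1", gene_cluster, products))
# [('enzyme X', 2), ('enzyme Y', 1)]
```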

Putative ribulose carboxylase

[Figure: panels labelled COG, Pfam and IMG]

Tatusov et al. 1997 (Clusters of Orthologous Groups, COGs)

1997: 720 COGs

2003: 4,873 COGs

MCL (Markov Cluster algorithm) clustering on sequence similarity
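The slides name MCL as the clustering method but give no parameters. The sketch below is the textbook MCL loop (add self-loops, normalize columns, then alternate expansion, inflation and pruning until the matrix stops changing) on a toy graph. The inflation value, thresholds and edges are illustrative assumptions; in practice edge weights would come from sequence similarity scores, which this sketch does not compute.

```python
def normalize_columns(m):
    """Scale every column of square matrix m to sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Dense product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges, n, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Markov Cluster sketch for nodes 0..n-1 and weighted edges (u, v, w)."""
    # Similarity matrix with self-loops, a common MCL preprocessing step.
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
    for u, v, w in edges:
        m[u][v] = m[v][u] = float(w)
    normalize_columns(m)

    for _ in range(max_iter):
        # Expansion: two-step random-walk probabilities.
        expanded = matmul(m, m)
        # Inflation: sharpen strong flows, weaken weak ones, re-normalize.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        # Pruning: drop negligible flows, then re-normalize.
        pruned = normalize_columns(
            [[x if x >= prune else 0.0 for x in row] for row in inflated])
        delta = max(abs(pruned[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = pruned
        if delta < tol:
            break

    # Rows that keep non-zero entries are attractors; their columns form a cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 0.0)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy graph: two triangles (genes 0-2 and 3-5) joined by a single edge 2-3.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1),
         (3, 4, 1), (3, 5, 1), (4, 5, 1)]
print(mcl(edges, 6))
```

Raising `inflation` generally yields smaller, tighter clusters; lowering it merges them.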

Nodes = IMG genes

Edges = in same cluster
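A network like this can be rebuilt from cluster membership alone: each gene is a node, and any two genes placed in the same cluster are joined by an edge. This sketch also records which scheme (COG, Pfam or IMG) links each pair, which is an assumption about how such a view might be layered; all IDs are hypothetical.

```python
from itertools import combinations


def co_membership_edges(clusterings):
    """Edges between genes that share a cluster, labelled by clustering scheme.

    clusterings: scheme name -> {cluster ID -> list of gene IDs}. The scheme
    names and IDs used below are hypothetical examples.
    Returns {(gene_a, gene_b): set of schemes linking them}.
    """
    edges = {}
    for scheme, clusters in clusterings.items():
        for genes in clusters.values():
            # Every unordered pair inside one cluster becomes an edge.
            for a, b in combinations(sorted(set(genes)), 2):
                edges.setdefault((a, b), set()).add(scheme)
    return edges


clusterings = {
    "COG": {"cog_1": ["g1", "g2", "g3"]},
    "Pfam": {"pf_1": ["g1", "g2"], "pf_2": ["g3", "g4"]},
    "IMG": {"img_1": ["g1", "g2", "g3", "g4"]},
}
for pair, schemes in sorted(co_membership_edges(clusterings).items()):
    print(pair, sorted(schemes))
```

Pairs linked by every scheme point to robust groupings; pairs linked by only one scheme show where the clusterings disagree.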

Alignment detail

Phylogeny

• How do these clusters relate to phylogeny?

Conclusions

• Provide fast access to related proteins

• Ease analysis and annotation (but cannot replace experimental work)

• Reveal substructures in function and phylogeny

Acknowledgements

Genome Biology

K Mavrommatis

IJ Anderson

NC Kyrpides

A Pati

IMG crew

K Palaniappan

E Szeto

VM Markowitz

Chalmers, Sweden

D Dalevi

COAL demo

• Cluster overview of Archaea

• Spectral bipartitioning

• Integrate metadata (phenotype, phylogeny)
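Spectral bipartitioning usually means splitting a graph by the signs of its Fiedler vector, the Laplacian eigenvector for the second-smallest eigenvalue. The slides do not say how COAL builds its graph or computes eigenvectors, so this stdlib-only sketch approximates the Fiedler vector by shifted power iteration on a toy graph; a production tool would more likely call a sparse eigensolver.

```python
import math
import random


def spectral_bipartition(edges, n, iterations=500, seed=0):
    """Split nodes 0..n-1 in two by the sign of the Fiedler vector.

    edges: (u, v, w) with non-negative weights. The Fiedler vector is
    approximated by power iteration on c*I - L, with the constant
    (eigenvalue-0) direction projected out at every step.
    """
    adj = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        adj[u][v] = adj[v][u] = float(w)
    degree = [sum(row) for row in adj]
    # Gershgorin: eigenvalues of L = D - A lie in [0, 2 * max degree],
    # so c*I - L is positive definite and its top remaining eigenvalue
    # (after removing the constant direction) is c - lambda_2.
    c = 2.0 * max(degree) + 1.0
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the all-ones component
        # y = (c*I - L) x = (c - degree_i) * x_i + sum_j A_ij * x_j
        y = [(c - degree[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    side_a = frozenset(i for i in range(n) if x[i] >= 0.0)
    side_b = frozenset(range(n)) - side_a
    return side_a, side_b


# Toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1),
         (3, 4, 1), (3, 5, 1), (4, 5, 1)]
print(spectral_bipartition(edges, 6))
```

Applied recursively to each half, the same split yields a hierarchy of clusters, which is one way an overview such as the Archaea example could be organized.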