Sequencing the World of Possibilities for Energy & Environment Annotation: function prediction...

Sequencing the World of Possibilities for Energy & Environment

Annotation: function prediction and metabolic reconstruction

Thanos Lykidis

Genome Biology Program

DOE-Joint Genome Institute

alykidis@lbl.gov

Two main goals of genome analysis:

• Evolutionary analysis
  – How does an organism compare to the rest?

• Metabolic reconstruction
  – What can an organism do and how?

Metabolic reconstruction

• Predict the biochemistry and physiology of an organism based on its genome sequence

• Explain known biochemical and physiological properties

To do metabolic reconstruction we need to “annotate” the genome:

• Find the genes

• Understand (predict) what these genes are doing

The “same-gene” problem

Metabolic reconstruction: Gene function

• Experiment
  – enzyme assays
  – mutants

• Computation
  – sequence comparison
    • BLAST, phylogenomics
    • protein family (Pfam, COG, InterPro)
  – chromosomal context
  – fusion

Similarity-based annotation
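
One way to picture similarity-based annotation is as transferring the function of the best-scoring characterized hit to the query gene. The sketch below is only an illustration under stated assumptions: it assumes BLAST tabular output with the default 12 columns (`-outfmt 6`), and the thresholds, the names `Hit`, `parse_tabular`, and `transfer_annotation`, and every identifier in the example data are hypothetical rather than taken from the slides.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3 of tabular output)
    evalue: float    # column 11
    bitscore: float  # column 12


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def transfer_annotation(
    hits: Iterable[Hit],
    known_functions: Dict[str, str],
    max_evalue: float = 1e-10,   # arbitrary illustrative cutoff
    min_identity: float = 30.0,  # arbitrary illustrative cutoff
) -> Dict[str, str]:
    """Assign each query the function of its best-scoring characterized hit."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue or h.identity < min_identity:
            continue  # too weak to support a transfer
        if h.subject not in known_functions:
            continue  # subject has no characterized function
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: known_functions[h.subject] for q, h in best.items()}


# Invented example rows; none of these identifiers refer to real sequences.
EXAMPLE = [
    "query_gene_1\tref_prot_A\t45.2\t300\t150\t4\t1\t300\t5\t310\t1e-60\t210.0",
    "query_gene_1\tref_prot_B\t62.0\t280\t100\t2\t1\t280\t1\t280\t1e-90\t320.0",
    "query_gene_2\tref_prot_A\t25.0\t90\t60\t3\t1\t90\t1\t90\t0.5\t30.0",
]
KNOWN = {
    "ref_prot_A": "hypothetical function A",
    "ref_prot_B": "hypothetical function B",
}
PREDICTED = transfer_annotation(parse_tabular(EXAMPLE), KNOWN)
```

A transfer of this kind yields a hint rather than a confirmed function; queries with no hit above the cutoffs simply stay unannotated.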

The dgk example

Extensive distribution of the dgk protein family

Flow chart of the reconstruction process

Gene → Annotation → Reaction → Pathway

Treponema pallidum is an uncultivated pathogenic bacterium.

Fitzgerald TJ et al., J. Bacteriol. 130:1333, 1977.

TP0671

A, PIS bacterial

B, PIS eukaryotic

C, CLS eukaryotic

D, PGS

E, PSS

F, PCS

G, unknown

H, CPT/EPT

I, unknown

Group H contains eukaryotic CPT/EPT

[Figure: phylogenetic tree of group H. Taxa shown: A. aeolicus, C. reinhardtii, C. intestinalis, D. melanogaster, H. sapiens, C. elegans, A. thaliana, S. cerevisiae, T. denticola, T. pallidum, N. aromaticivorans, S. coelicolor, S. avermitilis]

Based on the BLAST hits we get a hint that TP0671 is a CEPT

CDP-Cho + DAG → PtdCho (CPT, Eukarya)

CDP-Etn + DAG → PtdEtn (EPT, Eukarya)

A functional prediction has to make sense in the context of metabolism

Pathway

What is a pathway?

A sequence of reactions transforming one metabolite to another

Choline → Phosphocholine → CDP-choline → Phosphatidylcholine

Enzymes, in order: choline kinase, phosphocholine cytidylyltransferase, phosphatidyltransferase

Everything should come together

All genes of the pathway are present

Choline → Phosphocholine → CDP-choline → Phosphatidylcholine

Enzymes, in order: choline kinase, phosphocholine cytidylyltransferase, phosphatidyltransferase
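
The check that every step of a pathway has a gene behind it can be sketched as a small lookup from gene annotations to reactions. This is a minimal illustration, not the method used in the talk: the three steps are the ones named on the slide, while the gene identifiers (`gene_0001` and so on) and the function names are hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple, Tuple


class Step(NamedTuple):
    substrate: str
    product: str
    enzyme: str


# The three steps named on the slide, in order.
CDP_CHOLINE_PATHWAY = [
    Step("choline", "phosphocholine", "choline kinase"),
    Step("phosphocholine", "CDP-choline", "phosphocholine cytidylyltransferase"),
    Step("CDP-choline", "phosphatidylcholine", "phosphatidyltransferase"),
]


def is_connected(pathway: List[Step]) -> bool:
    """True if each step's product is the next step's substrate."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))


def reconstruct(
    pathway: List[Step], annotations: Dict[str, str]
) -> List[Tuple[Step, List[str]]]:
    """Pair each pathway step with the genes annotated as its enzyme."""
    by_enzyme: Dict[str, List[str]] = {}
    for gene, enzyme in annotations.items():
        by_enzyme.setdefault(enzyme, []).append(gene)
    return [(step, sorted(by_enzyme.get(step.enzyme, []))) for step in pathway]


def missing_steps(pathway: List[Step], annotations: Dict[str, str]) -> List[str]:
    """Enzymes required by the pathway that no annotated gene provides."""
    return [s.enzyme for s, genes in reconstruct(pathway, annotations) if not genes]


# Hypothetical gene identifiers, used only to exercise the functions.
ANNOTATIONS = {
    "gene_0001": "choline kinase",
    "gene_0002": "phosphocholine cytidylyltransferase",
    "gene_0003": "phosphatidyltransferase",
}
```

With all three annotations present, `missing_steps` returns an empty list; dropping one annotation makes the corresponding enzyme show up as a gap to investigate.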

Reconstruction of phospholipid biosynthesis in Treponema pallidum

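[Figure: pathway diagram. Recoverable labels are listed below; the arrows of the original diagram are not preserved.]

Metabolites: PtdOH, CDP-DAG, PtdSer, PtdEtn, PtdGlc, CL, DAG, PtdCho, PtdEtn, Cho, Etn, P-Cho, P-Etn, CDP-Cho, CDP-Etn

Enzymes: PSS, PGS, CLS, PSD

Gene labels: TP0107 (shown twice)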

Working with no similarity

A → B → C

Enzymes: Enzyme 1, Enzyme x

The plsX-plsY pathway

Scoring phylogenetic profile similarity
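
The slide's own scoring scheme is not recoverable from the text, so the sketch below uses one common choice, the Jaccard coefficient, purely as an assumption. A phylogenetic profile is taken here to be a presence/absence vector of a gene across a fixed list of genomes; the gene names and profile values are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Genomes containing both genes divided by genomes containing either."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_partners(
    query: Sequence[int], profiles: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Rank candidate genes by profile similarity to the query, best first."""
    scored = [(name, jaccard(query, p)) for name, p in profiles.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Invented profiles over six hypothetical genomes (1 = present, 0 = absent).
QUERY = [1, 1, 0, 1, 0, 1]
CANDIDATES = {
    "gene_a": [1, 1, 0, 1, 0, 1],  # identical distribution
    "gene_b": [1, 0, 0, 1, 1, 1],  # partially overlapping
    "gene_c": [0, 0, 1, 0, 1, 0],  # complementary distribution
}
RANKED = rank_partners(QUERY, CANDIDATES)
```

Genes that rank near the top co-occur with the query across genomes, which makes them candidates for a shared pathway even when no sequence similarity to a known enzyme exists.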

Clustering of fatty acid biosynthesis genes

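[Figure: fatty acid biosynthesis pathway and the corresponding gene cluster in S. pneumoniae. Recoverable labels are listed below; their order in the original figure is not preserved.]

Pathway intermediates: Acetyl-CoA, Malonyl-CoA, Malonyl-ACP, β-ketoacyl-ACP, β-hydroxyacyl-ACP, trans-2-enoyl-ACP, (s)-acyl-ACP, cis-2-enoyl-ACP, (u)-acyl-ACP

Pathway enzymes: acc, fabD, fabH, fabF, fabG, fabZ, fabI, fabK, fabM, fabA

Gene cluster labels: fabZ, accA, accD, accC, accB, fabF, fabG, fabD, fabK, acpP, fabH, HTH, fabM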
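
Chromosomal clustering of this kind can be detected with a simple proximity rule. The sketch below is an assumption-laden illustration rather than the procedure behind the figure: it assumes all genes lie on one contig, groups neighbors on the same strand whose intergenic gap is at most `max_gap` base pairs (an arbitrary value), and uses invented gene names and coordinates.

```python
from typing import Iterable, List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate on the contig
    end: int     # end coordinate, end >= start
    strand: str  # "+" or "-"


def cluster_genes(genes: Iterable[Gene], max_gap: int = 300) -> List[List[Gene]]:
    """Group same-strand neighbors separated by at most max_gap base pairs."""
    clusters: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        previous = clusters[-1][-1] if clusters else None
        if (
            previous is not None
            and previous.strand == gene.strand
            and gene.start - previous.end <= max_gap
        ):
            clusters[-1].append(gene)  # extend the current cluster
        else:
            clusters.append([gene])    # start a new cluster
    return clusters


# Invented coordinates; these are not real S. pneumoniae positions.
GENES = [
    Gene("g4", 5000, 6000, "+"),
    Gene("g1", 100, 1000, "+"),
    Gene("g3", 2100, 2900, "+"),
    Gene("g2", 1050, 2000, "+"),
    Gene("g5", 6050, 7000, "-"),
]
CLUSTERS = [[g.name for g in c] for c in cluster_genes(GENES)]
```

A gene of unknown function that falls inside a cluster of known pathway genes becomes a candidate for a missing step in that pathway.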

Current Status of annotation

~ 50-80% precise, accurate predictions

~ 10-30% “twilight zone” predictions

~ 10-30% genome-specific genes

Metabolic reconstruction: Inferring physiology from sequence