USING GENOMIC-BASED INFORMATION FOR THE MODELLING OF BACTERIAL ENVIRONMENTS AND LIFESTYLE Shiri Freilich Eytan Ruppin, Roded Sharan School of Computer Sciences Tel Aviv University May 2009


USING GENOMIC-BASED INFORMATION FOR THE MODELLING OF BACTERIAL ENVIRONMENTS AND LIFESTYLE

Shiri Freilich

Eytan Ruppin, Roded Sharan, School of Computer Sciences, Tel Aviv University

May 2009


Species evolve to adapt to their environment.


Environment/lifestyle Phenotype Genotype

Can we use the genotype to predict the lifestyle of a species?


FROM GENOMIC INFORMATION TO PHENOTYPIC (METABOLIC) INFORMATION

Genomic information

Metabolic information

Gene

Enzyme


GENOMIC ERA: SYSTEMATIC CONSTRUCTION OF HUNDREDS OF METABOLIC NETWORKS

Genomic information

Metabolic information

Hundreds of fully sequenced bacterial species


FROM METABOLIC INFORMATION TO ENVIRONMENTAL INFORMATION

Metabolic information

Environmental information

Internal metabolite

External metabolite

Predicted natural metabolic environments


D-GLUCOSE IS AN EXAMPLE OF AN EXTERNAL METABOLITE


NEW APPROACHES ALLOW RECONSTRUCTION OF SPECIES’ METABOLIC-ENVIRONMENTS

From Borenstein et al., PNAS 2008

The network topology is used to identify the set of compounds that a species acquires exogenously.

Internal metabolite

External metabolite
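The topology-based idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the published implementation: it assumes the network is a directed metabolite graph (substrate to product) and treats a compound as external when it belongs to a strongly connected component that receives no edges from outside. The function name `find_seed_compounds` and the toy graph are hypothetical.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_seed_compounds(graph):
    """Compounds whose strongly connected component has no incoming edges.

    Assumption: such compounds cannot be produced from the rest of the
    network, so they are taken to be acquired from the environment.
    """
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    seeds = set()
    for n in nodes:
        # Every compound that can reach n must also be reachable from n;
        # otherwise something upstream of n's component produces it.
        upstream = [m for m in nodes if n in reach[m]]
        if all(m in reach[n] for m in upstream):
            seeds.add(n)
    return seeds


# Toy graph (hypothetical, heavily simplified): edges point from substrate to product.
TOY_NETWORK = {
    "D-glucose": ["G6P"],
    "G6P": ["F6P", "pyruvate"],
    "F6P": ["G6P"],
    "pyruvate": [],
}

seeds = find_seed_compounds(TOY_NETWORK)
```

In the toy graph only `D-glucose` has nothing upstream of it, so it is the only compound returned. The all-pairs reachability used here is quadratic and only suitable for small examples.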


CONSTRUCTING PREDICTED ENVIRONMENTS ACROSS HUNDREDS OF SPECIES

Metabolic information

Environmental information

Predicted metabolic environments


SO WHAT DO WE HAVE AND WHAT IS IT GOOD FOR?

Metabolic networks

Environments

Species

Genomes

?

• Can we characterize the lifestyle of a species based on genomic attributes?

• How does the structure of the metabolic network reflect adaptation to species’ lifestyle?

• Can we characterize ecological strategies based on genomic attributes?

• Can we characterize ecological communities based on genomic attributes?

• Why should we do it?


FIRST QUESTION

Metabolic networks

Environments

Species

Genomes

?

• Can we characterize the lifestyle of the species based on genomic attributes?

Can we predict, based on genomic knowledge, whether a species is a specialist or a generalist? Can we estimate the range of environments it can inhabit?


AND TO BE MORE SPECIFIC – WE SHOULD COUNT IN HOW MANY ENVIRONMENTS A SPECIES LIVES

Predicted metabolic environments

External/input metabolites

Internal/essential metabolites


GENOMIC-BASED PREDICTED DIVERSITY CORRESPONDS WITH ECOLOGICAL KNOWLEDGE

Specific examples:

Species                     Genomic-based predicted environments   NCBI annotations   Fraction of reg. genes
Pseudomonas aeruginosa      √ √ √                                  Multiple           High
Desulfotalea psychrophila   √ x x                                  Specialized        {Low}

NCBI annotations and the fraction of reg. genes are the available systematic estimates/information for environmental variability.

Beyond specific examples: strong correlation between metabolic-environment variability and established measures of environment variability


WE NOW HAVE AN ENVIRONMENTAL MODEL

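[Figure: environmental viability matrix. Rows are species (Spc 1, Spc 2, ..., Spc N) and columns are environments (Env 1, Env 2, ..., Env N); each cell is either viable or not viable. Rows carry information on species, columns carry information on environments.]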
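A minimal sketch of how such a viability matrix could be held in code and used to count the environments each species can live in. The matrix values and the name `environmental_diversity` are made up for illustration; they are not taken from the talk's data.

```python
# Hypothetical viability matrix: species -> environment -> viable?
VIABILITY = {
    "Spc 1": {"Env 1": True, "Env 2": True, "Env 3": True},
    "Spc 2": {"Env 1": True, "Env 2": False, "Env 3": False},
    "Spc 3": {"Env 1": False, "Env 2": True, "Env 3": True},
}


def environmental_diversity(viability):
    """Count, for each species, the environments in which it is viable."""
    return {species: sum(row.values()) for species, row in viability.items()}


diversity = environmental_diversity(VIABILITY)
```

Reading the matrix by rows gives a per-species measure (how many environments a species tolerates); reading it by columns would give a per-environment measure (how many species it can host).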


SECOND QUESTION

Metabolic networks

Environments

Species

Genomes

?

• How does the structure of the metabolic network reflect adaptation to species’ lifestyle?

Studying the essentiality of reactions across hundreds of bacterial species and many simulated growth environments.


ESSENTIALITY OF ENZYMES ACROSS SPECIES AND ENVIRONMENTS

[Figure: an enzymes × predicted environments table (cells marked √ or x) next to a toy network over the metabolites α to η, drawn in three environments.
Environment I: α β γ δ ε ζ η
Environment II: α γ δ ε ζ η
Environment III: β γ δ ε ζ η
Legend, metabolites: external metabolite, intermediate product, essential biomass product.
Legend, reactions: backed-up reaction, essential reaction, conditional-dependent reaction.]

Accuracy: 0.86 (E. coli ) and 0.85 (B. subtilis)
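The three reaction classes in the legend can be illustrated with a small sketch. The definitions used here are an assumption inferred from the labels: a reaction is "essential" if removing it abolishes growth in every environment tested, "backed-up" if removing it never does, and "conditional-dependent" otherwise. How lethality is simulated is outside the scope of the sketch; the knockout table below is invented.

```python
def classify_reaction(lethal_by_env):
    """Classify one reaction from its knockout outcome per environment.

    lethal_by_env maps environment name -> True if the knockout is lethal there.
    """
    outcomes = list(lethal_by_env.values())
    if not outcomes:
        raise ValueError("at least one environment is required")
    if all(outcomes):
        return "essential"
    if not any(outcomes):
        return "backed-up"
    return "conditional-dependent"


# Hypothetical knockout results for three reactions in three environments.
KNOCKOUT_LETHAL = {
    "R1": {"Env I": True, "Env II": True, "Env III": True},
    "R2": {"Env I": False, "Env II": False, "Env III": False},
    "R3": {"Env I": True, "Env II": False, "Env III": False},
}

classes = {rxn: classify_reaction(envs) for rxn, envs in KNOCKOUT_LETHAL.items()}
```

With this table, `R1` is classified as essential, `R2` as backed-up and `R3` as conditional-dependent.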


Taking an enzyme’s point of view

High-throughput identification of essential reactions across species:

• species-specific (or group-specific) essentiality

• looking for drug targets against reactions with a wide phylogenetic coverage

This approach can be applied to highlight essentiality in groups of medical, ecological or agricultural interest, e.g., human pathogens versus human commensals.

[Figure: enzymes × species table comparing pathogens with non-pathogenic bacteria; each reaction is marked as backed-up, essential or conditional-dependent.]


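One example:

[Figure: pathway around 10-Formyl-THF. Metabolites: THF, 5,10-Methenyl-THF, 10-Formyl-THF, fMet-tRNA. Enzymes: FTL, MCH. Downstream processes: initiation of protein synthesis, purine synthesis. Enzyme distribution labels: "Most commensals, few pathogens" and "Most commensals, most pathogens". Annotation: potential drug target.]

MCH: Methenyltetrahydrofolate cyclohydrolase

FTL: Formyltetrahydrofolate synthetase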


TAKING A SPECIES POINT OF VIEW: ESTIMATING ROBUSTNESS OF METABOLIC NETWORK

[Figure: toy network over the metabolites α to η, with reactions classified as backed-up (4/7), essential (1/7) and conditional-dependent (2/7).]

[Figure: the fraction of reactions across species; x-axis: environmental diversity, y-axis: fraction. Mean ~0.75.]

Description of network robustness:

E. coli: 0.78 (0.83 backed-up genes)

M. genitalium: 0.35 (0.2-0.45 backed-up genes)
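The 4/7, 1/7 and 2/7 figures from the toy network amount to simple class fractions. A short sketch, assuming that the network-robustness value reported per species is the backed-up fraction (the slide does not state this explicitly); the label list is a stand-in for the seven toy reactions.

```python
from collections import Counter
from fractions import Fraction


def class_fractions(labels):
    """Return the fraction of reactions that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: Fraction(count, total) for label, count in counts.items()}


# Seven reactions, labelled as in the toy network on the slide.
TOY_LABELS = ["backed-up"] * 4 + ["essential"] + ["conditional-dependent"] * 2

fractions = class_fractions(TOY_LABELS)
robustness = float(fractions["backed-up"])  # 4/7, roughly 0.57
```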


GENETIC ROBUSTNESS

What is it? The ability of a biological system to continue functioning following mutations.

How robust are biological systems? Under laboratory conditions most genes are dispensable; dispensability depends on the experimental setting.

How can we explain robustness in evolutionary terms? The origin of robustness is under debate:

• Direct selection in favor of resistance to mutations

• By-product of the selection for other traits (e.g., increasing steady-state metabolic fluxes)

• Genetic robustness reflects environmental robustness


NETWORK ROBUSTNESS: ENVIRONMENTAL-DEPENDENT AND INDEPENDENT COMPONENTS

[Figure: the fraction of reactions across species; x-axis: environmental diversity, y-axis: fraction. Correlations shown: 0.8 and 0.1.]

The environmentally-dependent component is strongly associated with environmental diversity (rho = 0.8) and is responsible for the robustness of no more than 20% of metabolic reactions over all species and environments modeled.
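For readers unfamiliar with the statistic, here is a minimal sketch of a rank correlation of the kind the slide reports. It assumes "rho" denotes Spearman's rank correlation, which the slide does not confirm, and it is implemented in plain Python with average ranks for ties. The input lists are placeholders, not the study's data.

```python
import math


def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder values: environmental diversity and a per-species fraction.
diversity = [1, 2, 3, 4]
fraction = [0.10, 0.20, 0.25, 0.40]
rho = spearman_rho(diversity, fraction)
```

Because the statistic depends only on ranks, any monotonically increasing relationship gives a value of 1, whatever its shape.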

How can we explain the environmentally-independent component?


ENVIRONMENTALLY-INDEPENDENT ROBUSTNESS IS ASSOCIATED WITH THE METABOLIC CAPACITIES

[Figure: observed growth rate (log) plotted against the prediction for growth rate (log), based on network robustness.]

How can we explain the environmentally-independent component? The environmentally-independent component is associated (correlation ≈ 0.6) with the metabolic capacities of a species: higher robustness is observed in fast growers and in organisms with extensive production of secondary metabolites.


SECOND QUESTION

Metabolic networks

Environments

Species

Genomes

?

• How does the structure of the metabolic network reflect adaptation to species’ lifestyle?

The design of metabolic networks represents a species-specific adaptation to both its needs and its environment.


THIRD QUESTION

Metabolic networks

Environments

Species

Genomes

?

• The structure of the metabolic network reflects adaptation to species’ lifestyle. Can we characterize complex ecological attributes based on genomic attributes?

Can we predict the level of competition a species encounters in its natural environments and its rate of growth?


ONCE AGAIN – OUR ENVIRONMENTAL MODEL

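[Figure: environmental viability matrix. Rows are species (Spc 1, Spc 2, ..., Spc N) and columns are environments (Env 1, Env 2, ..., Env N); each cell is either viable or not viable. Rows carry information on species, columns carry information on environments.]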


APPLYING THE ENVIRONMENTAL-MODEL FOR THE CHARACTERIZATION OF ECOLOGICAL ATTRIBUTES – COMPETITION

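[Figure: environmental viability matrix for four species (Spc 1 to Spc 4) and four environments (Env 1 to Env 4); each cell is either viable or not viable.]

Co-habitation vectors and maximal co-habitation score (Max-CHS), in the order shown on the slide (rows Spc 1 to Spc 3, then Spc 4):

Co-habitation vector   Max-CHS
{1,3,2}                3
{3}                    3
{3,2}                  3
{1} (Spc 4)            1

[Figure: mean level of population (y-axis) of environments populated by bacteria of an annotated lifestyle (x-axis).]

The population of environments is in agreement with ecological knowledge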
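The co-habitation vectors on this slide can be reproduced with a small sketch. It assumes that a species' co-habitation vector lists, for each environment in which it is viable, the number of species viable there, and that Max-CHS is the largest entry. The viability sets below are one reconstruction that yields the slide's numbers; which of Spc 1 to Spc 3 owns which row is a guess.

```python
# Hypothetical reconstruction: species -> set of environments where it is viable.
VIABILITY = {
    "Spc 1": {"Env 1", "Env 2", "Env 3"},
    "Spc 2": {"Env 2"},
    "Spc 3": {"Env 2", "Env 3"},
    "Spc 4": {"Env 4"},
}


def environment_population(viability):
    """Number of viable species in each environment."""
    population = {}
    for environments in viability.values():
        for env in environments:
            population[env] = population.get(env, 0) + 1
    return population


def cohabitation_vector(viability, species):
    """Population of every environment the species is viable in (sorted by name)."""
    population = environment_population(viability)
    return [population[env] for env in sorted(viability[species])]


def max_chs(viability, species):
    """Maximal co-habitation score: the most crowded environment the species occupies."""
    return max(cohabitation_vector(viability, species))


vectors = {s: cohabitation_vector(VIABILITY, s) for s in VIABILITY}
scores = {s: max_chs(VIABILITY, s) for s in VIABILITY}
```

Under these assumptions the vectors come out as [1, 3, 2], [3], [3, 2] and [1], with scores 3, 3, 3 and 1, matching the values on the slide.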


DELINEATING ECOLOGICAL STRATEGIES FOR RATE OF GROWTH:

[Figure: maximal co-habitation (y-axis) against environmental diversity (x-axis).]

Ecological diversity with intense co-inhabitation, associated with a typical fast rate of growth.

A specialized niche with little co-inhabitation, associated with a typical slow rate of growth.


THIRD QUESTION

Metabolic networks

Environments

Species

Genomes

?

• Can we characterize ecological attributes based on genomic attributes?

The patterns observed suggest a universal principle in which metabolic flexibility is associated with a need to grow fast, possibly in the face of competition. Beyond specific examples, the interplay between environmental diversity and maximal co-habitation allows training a predictor for growth rate (ROC score of 0.75).
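As a reminder of what such a score measures, here is a minimal sketch of the area under the ROC curve, assuming that is what "ROC score" refers to. It uses the pairwise definition: the probability that a randomly chosen positive example (say, a fast grower) receives a higher predictor score than a randomly chosen negative one, counting ties as half. The function name and the inputs are illustrative, and the predictor itself is not reproduced.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from predictor scores and binary labels."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Placeholder scores for four species; True marks a fast grower.
example_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False])
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.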


FOURTH QUESTION

Metabolic networks

Environments

Species

Genomes

?

• Can we characterize ecological communities based on genomic attributes?

Characterization of pair-wise relationships between bacterial species to identify competition and cooperation


SPECIES DO NOT LIVE IN A VACUUM


WHY SHOULD WE MODEL COMMUNITIES?

The composition of bacterial communities is a major factor in human health.

Variations in the identity and abundance of species affect a community's metabolic potential and hence have important medical implications.

Computational approaches can now be applied for the modeling of bacterial interactions.

The ultimate goal is to be able to manipulate bacterial communities to our advantage.


FIRST STEP: MODELING PAIR-WISE INTERACTIONS

[Figure: environmental viability matrix (species Spc 1 ... Spc N by environments Env 1 ... Env N; each cell is viable or not viable).]

New type of data:

[Figure: pairwise interactions data (species Spc 1 ... Spc N by species Spc 1 ... Spc N; each cell is interaction (competition/cooperation) or no interaction).]


LACK OF SYSTEMATIC KNOWLEDGE OF PAIR-WISE INTERACTIONS

Modelling lifestyle

[Figure: environmental viability matrix (Spc 1 ... Spc N by Env 1 ... Env N) with row annotations 3, 1, 2.]

Systematic knowledge for species-specific lifestyle

[Figure: mean level of population (y-axis) of environments populated by bacteria of an annotated lifestyle (x-axis).]

Modelling pairwise interactions

[Figure: pairwise interactions matrix (Spc 1 ... Spc N by Spc 1 ... Spc N).]

Systematic knowledge for pairwise interactions

Metagenomic data?


WHAT ARE METAGENOMIC DATA?


IDENTIFYING “OUR” SPECIES IN ENVIRONMENTAL SAMPLES

[Figure: species (Spc 1, Spc 2, Spc 3), each represented by its 16S rRNA, are matched with BLAST against environmental samples (gut, marine, soil).]


CONSTRUCTING EXPERIMENTALLY-DRIVEN PAIRWISE INTERACTIONS DATA

[Figure: a species × environmental samples table (Spc 1 to Spc 6; columns Gut, Marine, PM3) is converted into an environmentally-driven database of interactions, a species × species matrix (Spc 1 to Spc 6).]

- 134 species (including 47 species in the gut and 81 marine species)
- ~1200 interactions (limited to interactions within the gut and between marine species)
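The conversion from a species × samples table to pairwise interaction data can be sketched as below. This is an illustration under the assumption that two species are linked whenever they are detected in the same sample; the sample contents and the name `cooccurrence_pairs` are hypothetical.

```python
from itertools import combinations


def cooccurrence_pairs(samples):
    """Map each co-occurring species pair to the samples they share.

    samples: dict of sample name -> set of species detected in that sample.
    """
    pairs = {}
    for sample, species in samples.items():
        for a, b in combinations(sorted(species), 2):
            pairs.setdefault((a, b), set()).add(sample)
    return pairs


# Hypothetical detections in two environmental samples.
SAMPLES = {
    "Gut": {"Spc 1", "Spc 2", "Spc 3"},
    "Marine": {"Spc 4", "Spc 5"},
}

interactions = cooccurrence_pairs(SAMPLES)
```

Since pairs are only formed inside a sample, the result contains gut-gut and marine-marine pairs but no pairs across the two environments, in line with the restriction noted on the slide.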


PUBMED IS A LARGE AND COMPREHENSIVE DATA SOURCE

Müller & Mancuso, PLoS ONE, 2008

Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. Co-occurrence between genes and proteins was shown to reveal functional interactions.


CONSTRUCTING LITERATURE-DRIVEN PAIRWISE INTERACTIONS DATA

[Figure: a species × PubMed papers table (Spc 1 to Spc 6; columns PM1, PM2, PM3) is converted into a co-occurrence based database of interactions, a species × species matrix (Spc 1 to Spc 6).]

- All species are covered by the database (~400 species)
- ~6000 interactions (cut-offs vary depending on the statistical approach taken)
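The slide does not say which statistical approach sets the cut-off. One common choice for co-mention data, shown here purely as an illustration, is a hypergeometric test: given the total number of papers, the number mentioning each species, and the number mentioning both, it gives the probability of seeing at least that many joint mentions by chance. All names and numbers below are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_both):
    """P(at least n_both joint mentions) under a hypergeometric null.

    n_total: number of papers considered
    n_a, n_b: papers mentioning species A and species B, respectively
    n_both: papers mentioning both species
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical counts: 10 papers, 3 mention A, 3 mention B, all 3 overlap.
p_value = cooccurrence_pvalue(n_total=10, n_a=3, n_b=3, n_both=3)
```

A pair would be kept as an interaction when its p-value falls below a chosen threshold, ideally after correcting for the number of pairs tested.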


CO-OCCURRENCE BASED INTERACTIONS DATA ARE IN AGREEMENT WITH ECOLOGICAL PROPERTIES

[Figure: pairwise interaction data (Spc 1 to Spc 3 by Spc 1 to Spc 3) with the number of co-associated partners per species (2, 1, 0).]

Significant enrichment in experimentally-based interactions.

[Figure: number of co-associated partners (y-axis) by lifestyle category (x-axis: Spec., Obli., Aqua., Hoas., Mult., Soil).]

Strong correlation with systematic annotations of ecological diversity.


PAIRWISE INTERACTIONS OF GUT BACTERIA

Typical gut bacteria

Potential gut bacteria

Potential human-associated bacteria

Other


THANKS

Anat Kreimer, Uri Gophna, Roded Sharan, Eytan Ruppin

Nir Yosef, Isaac Meilijson, Elhanan Borenstein, Moshe Mevarech