Comparative genomics

Seminar series Fall 2006

Vera van Noort

Announcements

Please ask questions! Take an assignment sheet with you.

Genomics

Functional associations: metabolic pathways, transcription regulation, signaling pathways, protein complexes, cellular processes

Comparative genomics: comparing genomes (gene fusions, gene neighborhood conservation, gene presence/absence); comparing genomics data (horizontal comparative genomics, vertical comparative genomics)

Contents

- Genomics
- Functional associations: metabolic pathways, transcription regulation, signaling pathways, protein complexes, cellular processes
- Comparative genomics
  - Comparing genomes: gene fusions, gene neighborhood conservation, gene presence/absence
  - Comparing genomics data
    - Horizontal comparative genomics: conserved co-expression, conserved yeast-2-hybrid
    - Vertical comparative genomics: evidence from multiple data sources, Bayesian integration


Sequencing of genes and genomes

http://www.ncbi.nlm.nih.gov/genbank


Genome sequence of E. coli

ORIGIN

1 agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc

61 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg

121 tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac

181 acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt

241 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg

301 cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt

361 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc

421 aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg

481 gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa

541 cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg

601 caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt

661 agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa

721 atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc

781 gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct

841 gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca

901 ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac

961 tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac
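The ORIGIN block above follows the GenBank flat-file layout: each line starts with the 1-based position of its first base, followed by blocks of ten bases. A minimal Python sketch for turning such lines into a single sequence string is shown below; `parse_origin` and `gc_content` are hypothetical helper names, not part of any library.

```python
def parse_origin(lines):
    """Join the lines of a GenBank ORIGIN section into one upper-case sequence.

    Each sequence line starts with the 1-based position of its first base,
    followed by blocks of (up to) ten bases separated by spaces.
    """
    bases = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, the 'ORIGIN' keyword and the '//' record terminator.
        if not fields or not fields[0].isdigit():
            continue
        bases.extend(fields[1:])
    return "".join(bases).upper()


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


origin = [
    "ORIGIN",
    "        1 agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc",
    "       61 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg",
    "//",
]
seq = parse_origin(origin)
print(len(seq))  # 120
print(round(gc_content(seq), 3))
```

For real work, an established parser (for example Biopython's `SeqIO` with the "genbank" format) is the safer choice; the sketch only shows what the layout encodes.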


Complete genomes

What do we need them for? What can we use them for?

Visualization

How do genes make a complete cell?

Functions


For most genes in any genome we need function prediction

For many genes no function has been described

Even in a well-studied organism like E. coli, only 43% of the genes have been characterized experimentally


Protein function

Predicting protein function:
- Levels of description
- Homology for determining molecular function
- Other aspects of function?


“Beyond” homology and molecular function

Homology based function prediction works very well, yet:

a large fraction of genes are poorly described (no homologs, uncharacterized homologs; this holds for ~60% of the human genes)

There are other aspects of function: functional associations, e.g. the target of a protein kinase or of a transcriptional regulator. In other words, to understand the cell we need to know how the genes interact.

Thus: predicting associations


There are many types of functional associations (also known as functional interactions, interactions, functional links or functional relations) in molecular biology:

Metabolic Pathways

Protein complexes

Transcription regulation

Cellular processes

Signaling


Types of functional associations

Filling gaps in metabolic pathways

Types of functional associations

Transcription regulation

Signaling pathways

Types of functional associations

Protein complexes

Cellular processes:

“DNA repair”

“Apoptosis”


Functionally associated proteins leave evolutionary traces of their relation in genomes


Gene fusion

If two genes that are separate in one organism are fused into a single polypeptide in another organism, this is a very reliable indicator of physical interaction between their products.


How to detect gene fusions?

Compare predicted protein sequences with each other using homology searches:

1. Find orthologs, and match two complete orthologs to otherwise unmatched genes (requires an orthology definition).
2. Find two complete homologs that match your gene (requires more complicated rules).
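The second rule (two homologs matching one gene) can be sketched in Python as follows. This assumes homology hits have already been computed (e.g. with BLAST) and filtered so that each hit covers its query gene completely; the `Hit` record, the function names and the `max_overlap` cut-off of 20 residues are illustrative assumptions, not the exact rules used in the seminar.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str    # gene in the genome of interest
    target: str   # protein in another genome
    t_start: int  # first aligned residue on the target (1-based)
    t_end: int    # last aligned residue on the target (inclusive)


def overlap(a, b):
    """Number of target residues covered by both hits."""
    return max(0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start) + 1)


def fusion_candidates(hits, max_overlap=20):
    """Return sorted (gene_a, gene_b, target) triples in which two different
    genes align to (nearly) non-overlapping parts of the same target protein."""
    by_target = {}
    for h in hits:
        by_target.setdefault(h.target, []).append(h)
    found = set()
    for target, target_hits in by_target.items():
        for i, a in enumerate(target_hits):
            for b in target_hits[i + 1:]:
                if a.query != b.query and overlap(a, b) <= max_overlap:
                    gene_a, gene_b = sorted((a.query, b.query))
                    found.add((gene_a, gene_b, target))
    return sorted(found)


hits = [
    Hit("geneA", "fusedAB", 1, 300),
    Hit("geneB", "fusedAB", 310, 600),
    Hit("geneC", "otherX", 1, 250),
]
print(fusion_candidates(hits))  # [('geneA', 'geneB', 'fusedAB')]
```

Requiring the two hits to cover different parts of the target is what separates a fusion from two paralogs hitting the same domain.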


Orthology

Orthologs: genes that descend from one common gene in the last common ancestor.

Problems:
- Duplication and loss
- Horizontal gene transfer

Methods:
- Bidirectional best hit
- Phylogenetic reconstructions

[Figure: gene trees for species A and B with genes A1, A2, B1 and B2, showing duplication and speciation events]
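The bidirectional best hit method listed above can be sketched in a few lines of Python. The input format (a dict of similarity scores per query/target pair, e.g. BLAST bit scores) and all names are assumptions for illustration.

```python
def best_hits(scores):
    """scores: dict (query, target) -> similarity score, e.g. a BLAST bit score.
    Returns a dict query -> its highest-scoring target (first one wins on ties)."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = {("A1", "B1"): 250, ("A1", "B2"): 120, ("A2", "B2"): 300,
          ("A2", "B1"): 90, ("A3", "B2"): 200}
b_vs_a = {("B1", "A1"): 245, ("B1", "A2"): 95, ("B2", "A2"): 310,
          ("B2", "A1"): 118, ("B2", "A3"): 190}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1'), ('A2', 'B2')]
```

A3's best hit is B2, but B2's best hit is A2, so A3 is not called an ortholog. This is exactly where duplication and loss make best-hit methods fragile, and why phylogenetic reconstruction is the alternative.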


Gene order evolves rapidly [Figure: positional mapping of orthologs between two species]

Conservation of gene neighbors

Conserved operons (transcriptional units of more than one gene)
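Conserved gene neighborhood can be detected by mapping genes to orthologous groups and counting in how many genomes two groups sit next to each other. A minimal Python sketch, assuming each genome is given as an ordered list of gene ids and that strand and circular chromosomes can be ignored (both simplifications); names and the `min_genomes` parameter are hypothetical:

```python
from collections import Counter


def conserved_neighbors(genomes, ortho_group, min_genomes=2):
    """genomes: list of genomes, each a list of gene ids in chromosomal order.
    ortho_group: gene id -> orthologous group id (e.g. a COG).
    Returns {(group1, group2): n_genomes} for group pairs that are adjacent
    in at least min_genomes genomes. Strand and circularity are ignored."""
    counts = Counter()
    for genome in genomes:
        adjacent = set()
        for g1, g2 in zip(genome, genome[1:]):
            c1, c2 = ortho_group.get(g1), ortho_group.get(g2)
            if c1 and c2 and c1 != c2:
                adjacent.add(frozenset((c1, c2)))
        counts.update(adjacent)  # count each pair at most once per genome
    return {tuple(sorted(p)): n for p, n in counts.items() if n >= min_genomes}


genomes = [["a1", "a2", "a3"], ["b3", "b1", "b2"], ["c3", "c1", "c2"]]
ortho = {g: "COG" + g[1] for genome in genomes for g in genome}  # a1 -> COG1, ...
print(conserved_neighbors(genomes, ortho, min_genomes=3))  # {('COG1', 'COG2'): 3}
```

Because gene order evolves rapidly, a pair that stays adjacent across many (distantly related) genomes is unlikely to be a chance arrangement.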


Conserved gene neighborhood

Comparison to associations in pathways: conservation implies a functional association

[Figure: number of COGs (log scale, 1 to 10,000) and their average metabolic distance (0 to 6), plotted against the number of co-occurrences in operons (0 to 30)]
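The "average metabolic distance" in the figure can be made concrete as a shortest-path length in a graph of enzymes that are linked when they catalyse consecutive reactions. The Python sketch below assumes that definition (the graph, enzyme names and function names are hypothetical) and computes it with breadth-first search.

```python
from collections import deque


def metabolic_distance(graph, start, goal):
    """Number of steps between two enzymes in an undirected pathway graph
    (adjacency dict: enzyme -> set of neighbouring enzymes), or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def average_distance(graph, pairs):
    """Mean metabolic distance over the pairs that are connected in the graph."""
    dists = []
    for a, b in pairs:
        d = metabolic_distance(graph, a, b)
        if d is not None:
            dists.append(d)
    return sum(dists) / len(dists) if dists else None


# Linear pathway E1 - E2 - E3 - E4, plus an unconnected enzyme E5.
graph = {"E1": {"E2"}, "E2": {"E1", "E3"}, "E3": {"E2", "E4"}, "E4": {"E3"}, "E5": set()}
print(metabolic_distance(graph, "E1", "E4"))                                # 3
print(average_distance(graph, [("E1", "E2"), ("E1", "E4"), ("E1", "E5")]))  # 2.0
```

Binning gene pairs by how often they co-occur in operons and averaging this distance per bin gives the kind of curve described in the figure: more co-occurrence, shorter distance.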


Presence / absence of genes

Listeria innocua (non-pathogen) vs. Listeria monocytogenes (pathogen)


Presence / absence of genes

Differences in gene content: differences in metabolic capacities?

Shared genes: shared metabolic capacities? It does not make sense to have just one member of a pathway.


Presence / absence of genes

Listeria innocua (non-pathogen) vs. Listeria monocytogenes (pathogen)

Pathogenicity genes?

A difference may not be significant in a single comparison, but may become significant when generalized over many genomes.
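Generalizing a pairwise comparison over many genomes can be sketched with set operations on orthologous-group content. The genome names and group ids below are hypothetical placeholders, not real Listeria data; the sketch assumes gene content is given as Python sets.

```python
def pathogen_specific(gene_content, pathogens, nonpathogens):
    """gene_content: genome name -> set of orthologous group ids.
    Returns the groups present in every pathogen and absent from every non-pathogen."""
    in_all_pathogens = set.intersection(*(gene_content[g] for g in pathogens))
    in_any_nonpathogen = set().union(*(gene_content[g] for g in nonpathogens))
    return in_all_pathogens - in_any_nonpathogen


gene_content = {
    "pathogen_1": {"G1", "G2", "G3", "G7"},
    "pathogen_2": {"G1", "G2", "G3", "G8"},
    "nonpathogen_1": {"G1", "G2", "G9"},
    "nonpathogen_2": {"G1", "G4"},
}
# A single pairwise comparison yields two candidates ...
print(sorted(gene_content["pathogen_1"] - gene_content["nonpathogen_1"]))  # ['G3', 'G7']
# ... generalizing over more genomes keeps only the consistent one.
print(pathogen_specific(gene_content, ["pathogen_1", "pathogen_2"],
                        ["nonpathogen_1", "nonpathogen_2"]))  # {'G3'}
```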


Phylogenetic profiles
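A phylogenetic profile records, for each gene or orthologous group, in which genomes it is present; genes with (nearly) identical profiles are predicted to be functionally associated. A minimal Python sketch using the Hamming distance between 0/1 profiles (the distance measure, the names and `max_distance` are assumptions; published methods use various similarity scores):

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def linked_by_profile(profiles, max_distance=0):
    """profiles: gene (or orthologous group) -> tuple of 0/1, one entry per genome,
    all in the same genome order. Returns gene pairs whose profiles differ
    in at most max_distance genomes."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if hamming(profiles[a], profiles[b]) <= max_distance]


profiles = {
    "X": (1, 0, 1, 1, 0),
    "Y": (1, 0, 1, 1, 0),
    "Z": (1, 1, 1, 1, 1),
    "W": (0, 0, 1, 1, 0),
}
print(linked_by_profile(profiles))                  # [('X', 'Y')]
print(linked_by_profile(profiles, max_distance=1))  # [('W', 'X'), ('W', 'Y'), ('X', 'Y')]
```

A profile like Z, present in every genome, carries no information about association and would typically be excluded before comparing.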

Context methods for prediction of functional associations

Benchmark with KEGG metabolic pathways

Integration into one score

[Figure: fraction of gene pairs in the same KEGG map (S_i) as a function of the score (0 to 1), for fusion, gene order and co-occurrence]
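The benchmark calibrates each method: S_i is the fraction of gene pairs at a given raw score that lie in the same KEGG map. A minimal Python sketch of that calibration, plus one common way to integrate the calibrated values into one score, is shown below. The combination rule 1 - prod(1 - S_i) assumes the methods provide independent evidence; it is the kind of rule used by the STRING database, shown here as an illustration rather than as the exact procedure behind the figure. All names are hypothetical.

```python
def fraction_same_map(benchmark, lo, hi):
    """benchmark: list of (score, same_kegg_map) for gene pairs with a KEGG annotation.
    Returns the fraction of pairs with lo <= score < hi that share a KEGG map
    (an estimate of S_i for that score bin), or None if the bin is empty."""
    in_bin = [same for score, same in benchmark if lo <= score < hi]
    return sum(in_bin) / len(in_bin) if in_bin else None


def combine(probabilities):
    """Integrate per-method probabilities into one score, assuming the
    methods are independent: 1 - prod(1 - S_i)."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none


benchmark = [(0.1, False), (0.2, True), (0.9, True), (0.95, True)]
print(fraction_same_map(benchmark, 0.0, 0.5))  # 0.5
print(combine([0.6, 0.5]))                     # 0.8
```

Calibrating first puts fusion, gene order and co-occurrence scores on the same probability scale, which is what makes them combinable at all.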


How can you use this?

STRING database: no skills needed (http://string.embl.de//)

Parse the data yourself: some programming skills needed
- Orthologs: COG database (http://www.ncbi.nlm.nih.gov/COG/)
- Genomes: GenBank (http://www.ncbi.nlm.nih.gov/Genbank), Genome Atlas Database (http://www.cbs.dtu.dk/services)


Correlated expression

There are no operons in eukaryotes, but genes can be regulated by the same transcription factors.

Similarity in expression patterns in high-throughput (HTP) expression data: high correlation between expression vectors.


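$$r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \quad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$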

Correlated mRNA expression

Genes with correlated RNA expression often function in the same pathway.

Not reliable enough for function prediction on its own.
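Co-expression between two genes is usually scored with the Pearson correlation coefficient r between their expression vectors (X_i and Y_i are the expression values in condition i, N the number of conditions). A plain-Python sketch; the function name and the toy vectors are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation r between two expression vectors of equal length N."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length N >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)  # raises ZeroDivisionError for a constant vector


gene_a = [0.5, 1.2, -0.3, -1.0, 0.8]
gene_b = [0.4, 1.0, -0.5, -1.2, 0.9]
print(round(pearson(gene_a, gene_b), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.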

Orthology prediction

Gene pairs with an orthologous gene pair with co-expression correlation > 0.6:

| Species | Total no. of pairs | No. of pairs > 0.6 | Observed fraction > 0.6 | Expected fraction > 0.6 | Observed/expected |
|---|---|---|---|---|---|
| Worm | 18161 | 803 | 0.0442* | 0.00379 | 12 |
| Yeast | 36548 | 1215 | 0.0332* | 0.00216 | 15 |

Gene pairs with a paralogous gene pair with co-expression correlation > 0.6:

| Species | Total no. of pairs | No. of pairs > 0.6 | Observed fraction > 0.6 | Expected fraction > 0.6 | Observed/expected |
|---|---|---|---|---|---|
| Worm | 207214 | 29031 | 0.1401* | 0.00379 | 37 |
| Yeast | 38253 | 2167 | 0.0566* | 0.00216 | 26 |
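The derived columns of this table follow from the counts: the observed fraction is the number of pairs above 0.6 divided by the total number of pairs, and the last column is that fraction divided by the expected fraction. The Python sketch below recomputes them; it takes the expected fractions from the table as given (presumably the background fraction of all gene pairs in that species with r > 0.6, which is an assumption).

```python
def observed_vs_expected(total_pairs, pairs_above, expected_fraction):
    """Observed fraction of pairs above the threshold and its enrichment over expectation."""
    observed = pairs_above / total_pairs
    return observed, observed / expected_fraction


table = {
    ("orthologous", "worm"): (18161, 803, 0.00379),
    ("orthologous", "yeast"): (36548, 1215, 0.00216),
    ("paralogous", "worm"): (207214, 29031, 0.00379),
    ("paralogous", "yeast"): (38253, 2167, 0.00216),
}
for (kind, species), row in table.items():
    observed, ratio = observed_vs_expected(*row)
    print(f"{kind:12s} {species:6s} observed={observed:.4f} obs/exp={ratio:.0f}")
```

Run as is, this reproduces the observed fractions (0.0442, 0.0332, 0.1401, 0.0566) and the enrichments (12, 15, 37, 26) in the table.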

 

Low but significant levels of conservation of co-expression (see Teichmann et al., TIBS 2002; Stuart et al., Science 2003)

van Noort et al., TIG, 2003

Is the low level of conservation of co-expression between S. cerevisiae and C. elegans (< 5%) “real”, reflecting evolution and species-specific interactions, or are we just comparing noisy datasets?

Species-specific (idiosyncratic) co-regulation:

“Efficient expression of the Saccharomyces cerevisiae glycolytic gene ADH1 is dependent upon a cis-acting regulatory element UASRPG found initially in genes encoding ribosomal proteins.” Tornow and Santangelo, Gene, 1990

Conservation of co-expression

High level of conservation of co-regulation after speciation

[Figure: frequency distribution of the co-expression correlation (r, from -0.5 to 0.9) for worm orthologous gene pairs of yeast gene pairs with r > 0.6 and sharing TFBS, compared with all worm gene pairs; annotated value: 76%]


Conservation between orthologous pairs or paralogous pairs increases the likelihood of functional interaction

van Noort et al., Trends Genet 2003
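Finding conserved co-expression amounts to mapping each co-expressed pair in one species onto its orthologous pair in the other and checking the correlation there. A minimal Python sketch, assuming one-to-one orthologs and pre-computed correlations per gene pair; the gene ids and the 0.6 threshold default are illustrative, not data from the study:

```python
def conserved_coexpression(pairs_a, pairs_b, orthologs, threshold=0.6):
    """pairs_a / pairs_b: dict frozenset({g1, g2}) -> correlation r in species A / B.
    orthologs: gene in species A -> its (one-to-one) ortholog in species B.
    Returns (g1, g2, r_a, r_b) for species-A pairs with r > threshold whose
    orthologous pair in species B also has r > threshold."""
    conserved = []
    for pair, r_a in pairs_a.items():
        if r_a <= threshold:
            continue
        g1, g2 = sorted(pair)
        if g1 in orthologs and g2 in orthologs:
            r_b = pairs_b.get(frozenset((orthologs[g1], orthologs[g2])))
            if r_b is not None and r_b > threshold:
                conserved.append((g1, g2, r_a, r_b))
    return conserved


pairs_yeast = {frozenset(("YA", "YB")): 0.8, frozenset(("YA", "YC")): 0.7,
               frozenset(("YB", "YC")): 0.1}
pairs_worm = {frozenset(("wa", "wb")): 0.75, frozenset(("wa", "wc")): 0.2,
              frozenset(("wb", "wc")): 0.05}
orth = {"YA": "wa", "YB": "wb", "YC": "wc"}
print(conserved_coexpression(pairs_yeast, pairs_worm, orth))  # [('YA', 'YB', 0.8, 0.75)]
```

The same function works for paralogous pairs within one species if `pairs_b` is the same table and `orthologs` maps each gene to its paralog.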


Phylogenetic distribution (all archaea + all eukaryotes)

Orthologous groups with that distribution:

RNase L inhibitor


Domain composition:

Conserved co-expression:


Combined homology and conserved co-expression: a role for the RNase L inhibitor in rRNA processing

Predictions passed directly to experimental groups (Ger Pruijn)

The yeast-2-hybrid technique

Conservation of physical interactions

The overlap between yeast and fly is about the same size as the overlap between two different yeast datasets.


| Datasets | Comparison | Interactions with both proteins in the other dataset | Conserved interactions | Fraction conserved interactions | Average fraction conserved interactions |
|---|---|---|---|---|---|
| Ito / Uetz | Yeast vs. yeast | 858 / 697 | 201 | 23.4% / 28.8% | 26.1% |
| Ito / Giot | Yeast vs. fly | 229 / 394 | 45 | 19.6% / 11.4% | 15.5% |
| Uetz / Giot | Yeast vs. fly | 120 / 168 | 33 | 27.5% / 19.6% | 23.5% |
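The comparison in this table can be sketched in Python: map each interaction of one screen onto the proteins of the other screen (identity within yeast, orthologs between yeast and fly), count the interactions that are testable because both proteins occur in the other screen, and count those that are also found there. The data structures and names are assumptions for illustration.

```python
def compare_interaction_sets(inter_a, inter_b, map_a_to_b):
    """inter_a, inter_b: sets of frozenset({p1, p2}) protein pairs from two screens.
    map_a_to_b: protein in screen A -> corresponding protein in screen B
    (identity for two screens of the same species, orthologs across species).
    Returns (testable, conserved)."""
    proteins_b = set().union(*inter_b)
    testable = conserved = 0
    for pair in inter_a:
        mapped = frozenset(map_a_to_b.get(p) for p in pair)
        # Skip self-interactions, unmapped proteins and proteins absent from B.
        if len(pair) != 2 or len(mapped) != 2 or not mapped <= proteins_b:
            continue
        testable += 1
        conserved += mapped in inter_b
    return testable, conserved


inter_a = {frozenset(p) for p in [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]}
inter_b = {frozenset(p) for p in [("b1", "b2"), ("b3", "b5")]}
orth = {"a1": "b1", "a2": "b2", "a3": "b3"}
print(compare_interaction_sets(inter_a, inter_b, orth))  # (2, 1)

# Fractions conserved from the counts in the table (percent, in each direction).
for name, (n_a, n_b, n_cons) in {"Ito / Uetz": (858, 697, 201),
                                  "Ito / Giot": (229, 394, 45),
                                  "Uetz / Giot": (120, 168, 33)}.items():
    f_a, f_b = 100 * n_cons / n_a, 100 * n_cons / n_b
    print(f"{name}: {f_a:.1f}% / {f_b:.1f}%, average {(f_a + f_b) / 2:.1f}%")
```

Recomputed from the counts, two values differ from the table in the last digit (19.7% instead of 19.6% for Ito / Giot, and an average of 23.6% instead of 23.5% for Uetz / Giot), presumably because of rounding or truncation in the original.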

 

Physical interaction is reasonably well conserved between species (…compared to the “conservation” within a species…)

Huynen et al., TIG, 2004

Conservation of protein-protein interaction between species

Conservation of protein-protein interaction measured by yeast-2-hybrid increases the likelihood of interaction

Comparison of Giot (fly) and Ito (yeast), Uetz (yeast) y-2-h interactions


A “new”, conserved interaction: GTPase XAB1/CG3704 with hypothetical GTPase YOR262/CG10222

XAB1 interacts with the DNA repair protein XPA1 and is inferred to be required for XPA1’s import into the nucleus.

The fraction of hypothetical proteins in conserved Y2H interactions is relatively low:

| Hypotheticals | Number | Fraction |
|---|---|---|
| In conserved interactions | 13 | 5% |
| In complete genome | ~1600 | 27% |


Two types of comparative genomics

- Horizontal comparative genomics (HGT): comparing orthologs; comparing genomics data between species
- Vertical comparative genomics: comparing genomics data within the same species; integration of scores; Bayesian methods


[Figure: accuracy (fraction of data confirmed by reference set) versus coverage (fraction of reference set covered by data) for purified complexes (TAP, HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality, combined evidence, and interactions supported by two or three methods; filtered data, raw data and alternative parameter choices are marked]

Performance of genomic context compared to high-throughput interaction data
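The two axes of this comparison can be computed directly from sets of protein pairs. A simplified Python sketch, assuming predictions and the reference set are both given as sets of unordered pairs (real benchmarks often restrict the predicted set to pairs whose proteins both appear in the reference, which this sketch does not do):

```python
def accuracy_and_coverage(predicted, reference):
    """predicted, reference: sets of frozenset({p1, p2}) protein pairs.
    accuracy: fraction of predicted pairs confirmed by the reference set.
    coverage: fraction of reference pairs present in the predicted set."""
    confirmed = predicted & reference
    accuracy = len(confirmed) / len(predicted) if predicted else 0.0
    coverage = len(confirmed) / len(reference) if reference else 0.0
    return accuracy, coverage


reference = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]}
predicted = {frozenset(p) for p in [("a", "b"), ("c", "d"), ("a", "f"), ("b", "e")]}
print(accuracy_and_coverage(predicted, reference))  # (0.5, 0.4)
```

Filtering a dataset or intersecting two methods typically moves its point up (higher accuracy) and to the left (lower coverage).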


Trusted co-regulated gene pairs have similar functions

Correlation of co-regulation with functional interactions

| Data set of gene pairs | Percent in same pathway | Number of gene pairs |
|---|---|---|
| r > 0.5 | 43 | 169,768 |
| r > 0.6 | 52 | 65,430 |
| r > 0.7 | 51 | 22,459 |
| Sharing 1 TFBS | 50 | 356,947 |
| Sharing 2 TFBS | 77 | 39,818 |
| Sharing 1 TFBS and r > 0.3 | 86 | 19,386 |
| Sharing 1 TFBS and r > 0.4 | 88 | 11,434 |
| Sharing 1 TFBS and r > 0.5 | 90 | 6,687 |
| Sharing 1 TFBS and r > 0.6 | 90 | 3,382 |
| Sharing 1 TFBS and r > 0.7 | 86 | 1,156 |
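Each row of this table is a filter on gene pairs followed by a benchmark count. A minimal Python sketch, assuming each pair carries its co-expression correlation, its number of shared transcription factor binding sites and a pathway annotation flag; the `GenePair` record and the toy data are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    r: float            # co-expression correlation
    shared_tfbs: int    # number of shared transcription factor binding sites
    same_pathway: bool  # both genes annotated to a common reference pathway


def percent_same_pathway(pairs, min_r=None, min_tfbs=0):
    """Select pairs with r > min_r (if given) and at least min_tfbs shared TFBS.
    Returns (percent in same pathway, number of selected pairs)."""
    selected = [p for p in pairs
                if (min_r is None or p.r > min_r) and p.shared_tfbs >= min_tfbs]
    if not selected:
        return None, 0
    return 100.0 * sum(p.same_pathway for p in selected) / len(selected), len(selected)


pairs = [GenePair(0.8, 1, True), GenePair(0.65, 0, False), GenePair(0.3, 2, True),
         GenePair(0.55, 1, True), GenePair(-0.2, 0, False)]
print(percent_same_pathway(pairs, min_r=0.5))              # (66.66666666666667, 3)
print(percent_same_pathway(pairs, min_r=0.5, min_tfbs=1))  # (100.0, 2)
```

Combining the two filters trades the number of pairs for reliability, which is the pattern the table shows.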


Conservation between different datasets: the Bayesian approach

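[Figure: overlap between two co-expressed mRNA datasets (co-expressed mRNA 1 and co-expressed mRNA 2)]

| Set 1 | Set 2 |
|---|---|
| - | - |
| - | + |
| + | - |
| + | + |

- Apply a threshold to each dataset to obtain sets of ‘interactions’ (gene pairs)
- Interactions present in all datasets
- Interactions present in specific combinations of datasets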
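One way to read this approach is to give every gene pair a +/- pattern recording in which thresholded datasets it occurs, and then estimate, for each combination, the fraction of pairs that share a reference pathway (an empirical estimate of the probability of a functional association given that combination). The Python sketch below follows that reading; the gene names, pathways and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def presence_pattern(pair, datasets):
    """'+'/'-' string recording in which thresholded datasets the pair occurs."""
    return "".join("+" if pair in d else "-" for d in datasets)


def association_by_pattern(pairs, datasets, same_pathway):
    """Fraction of pairs sharing a pathway, for each combination of datasets."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [n_same_pathway, n_total]
    for pair in pairs:
        c = counts[presence_pattern(pair, datasets)]
        c[0] += same_pathway(pair)
        c[1] += 1
    return {pat: same / total for pat, (same, total) in counts.items()}


pathway = {"g1": "P1", "g2": "P1", "g3": "P2", "g4": "P2"}
pairs = [frozenset(c) for c in combinations(sorted(pathway), 2)]
set1 = {frozenset(("g1", "g2")), frozenset(("g1", "g3")), frozenset(("g3", "g4"))}
set2 = {frozenset(("g1", "g2")), frozenset(("g2", "g4"))}
same = lambda pair: len({pathway[g] for g in pair}) == 1
print(association_by_pattern(pairs, [set1, set2], same))
```

Estimating each combination separately avoids assuming that the datasets are independent, at the price of needing enough benchmark pairs per combination.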


Conservation between different datasets: the Bayesian approach

| Historical (red circles) | Consistent (blue diamonds) |
|---|---|
| + ? ? ? | + - - - |
| + + ? ? | + + - - |
| + + + ? | + + + - |
| + + + + | + + + + |
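The legend's patterns can be read as templates over an ordered list of datasets: "historical" rows require a pair to be present in the first k datasets and ignore the rest ('?'), "consistent" rows require it to be present in the first k and absent from the rest. The Python sketch below encodes that reading; the interpretation and all names are assumptions, not taken from the original analysis.

```python
def matches(pattern, template):
    """True if a '+'/'-' presence pattern fits a template in which '?' means 'either'."""
    return len(pattern) == len(template) and all(
        t == "?" or t == p for p, t in zip(pattern, template))


def templates(n, kind):
    """Legend rows for n datasets: k leading '+' followed by '?' (historical)
    or by '-' (consistent), for k = 1..n."""
    fill = "?" if kind == "historical" else "-"
    return ["+" * k + fill * (n - k) for k in range(1, n + 1)]


def count_by_template(patterns, kind):
    """Number of gene-pair patterns that fit each legend row."""
    n = len(patterns[0])
    return {t: sum(matches(p, t) for p in patterns) for t in templates(n, kind)}


patterns = ["++-+", "+---", "++++", "+-+-"]
print(count_by_template(patterns, "historical"))  # {'+???': 4, '++??': 2, '+++?': 1, '++++': 1}
print(count_by_template(patterns, "consistent"))  # {'+---': 1, '++--': 0, '+++-': 0, '++++': 1}
```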


Function prediction in P. falciparum

PFI0895c
- Homologous to subunit 5 of eukaryotic translation initiation factor 3 (eIF-3 epsilon), which interacts with the ribosome.
- Correlated expression with ribosomal proteins L27, L21e and Sa.
- Annotation of PFI0895c as eIF-3 epsilon is most likely.

PFI0555c
- Co-expressed with two proteins involved in protein degradation: the aspartic proteinase and drug target PF14_0075 (plasmepsin IV), and the ornithine aminotransferase MAL6P1.91.
- This suggests a role for PFI0555c in protein degradation, which is important for host-parasite interaction.


Summary

- Wealth of data to be explored: genomes, genomics data
- Comparisons within and between species
- Study of evolution
- Prediction of gene functions
