Jonathan Eisen talk for #SCS2012 at #ISMB "Networks in genomics and bioinformatics: from...


Talk as part of http://www.iscb.org/ismb2012-program/ismb2012-scs "Networks in genomics and bioinformatics: from phylogeny to Twitter"


Networks in genomics and bioinformatics: from phylogeny to Twitter

ISCB2012

July 12, 2012

Jonathan A. Eisen

University of California, Davis

@phylogenomics

Friday, July 13, 12


A meandering path and lessons “learned”


Social Networking in Science


Bacteria evolve



Phylogenomics of Novelty

Origin of New Functions and Processes

• New genes

• Changes in old genes

• Changes in pathways

Species Evolution

• Phylogenetic history

• Vertical vs. horizontal descent

• Needed to track gain/loss of processes, infer convergence

Genome Dynamics

• Evolvability

• Repair and recombination processes

• Intragenomic variation

Undergrad Lesson 1: Be prepared for random events

• Gould’s class b/c planned on not majoring in Biology

• RMBL via backpacking trip

• Geology library job w/ Nabokov collection b/c went to wrong building

• Discovering Colleen Cavanaugh’s lab via street encounter

Undergrad Lesson 2: Phylogeny Matters

• “MacClade”

• Phylogenetic ecology

• Phylotyping

Phylogeny Matters

Eisen et al. 1992


Grad school lesson I: find right people to work with

• Went to work on butterfly population biology and phylogeny

• Advisor and I did not see eye to eye

• Despite great subject for me (combined phylogeny, molecular evolution, RMBL, etc.), chose not to join lab

• Did many rotations …

• Picked final lab in part b/c advisor was right match


Grad school lesson II: never too late to change

• Wanted to combine DNA repair studies and molecular evolution

• I: Thymineless death

• II: Adaptive mutation

• III: Repair in archaea

• IV: Bioinformatics and genome analysis …

Grad school lesson III: Get others to do your work

• Interested in RecA structure-function relationships

• Using phylogeny to look for correlated substitutions in RecA structure, as done with rRNA

• But not enough sequences …

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004

Grad school lesson IV: Stealing is good

• Phylogenetic perspective in bioinformatics missing

“Nothing in biology makes sense except in the light of evolution.”

T. H. Dobzhansky (1973)

Evolutionary Perspective and Comparative Biology

• Comparative biology is the analysis of differences and similarities between species.

• An evolutionary perspective is useful in such studies because this allows one to focus not just on the levels and degrees of similarity or difference but on how and why similarities and differences came to be.


Phylogenomics

• Lots of sequences being produced with no functions associated with them

• Much debate in community about how to predict functions

Predicting Function

• Identification of motifs

• Homology/similarity based methods

• Highest hit

• Top hits

• Clusters of orthologous groups

• HMM models

• Structural threading and modeling

• Evolutionary reconstructions


Evolutionary Functional Prediction

[Figure: outline of phylogenomic functional prediction, with panels METHOD, EXAMPLE A, EXAMPLE B and ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN). The trees show genes 1 to 6 and gene copies 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2 and Species 3; nodes are labeled "Duplication?", "Duplication" and "Ambiguous".]

Method steps shown in the figure:

• Choose gene(s) of interest

• Identify homologs

• Align sequences

• Calculate gene tree

• Overlay known functions onto tree

• Infer likely function of gene(s) of interest

Based on Eisen, 1998 Genome Res 8: 163-167.
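
The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched in a few lines of Python. This is only an illustrative simplification, not the procedure from Eisen 1998: the nested-tuple tree, the gene names, the labels "function A" and "function B", and the rule "take the smallest enclosing clade whose annotated genes all agree" are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set


def leaves(tree) -> Set[str]:
    """Collect the gene names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    names: Set[str] = set()
    for child in tree:
        names |= leaves(child)
    return names


def path_to(tree, query: str) -> Optional[List]:
    """Return the clades from the root down to the query leaf, or None."""
    if isinstance(tree, str):
        return [tree] if tree == query else None
    for child in tree:
        sub = path_to(child, query)
        if sub is not None:
            return [tree] + sub
    return None


def infer_function(tree, known: Dict[str, str], query: str) -> Optional[str]:
    """Predict a function from the closest annotated relatives on the tree.

    Walks outward from the query gene. The first enclosing clade that holds
    any annotated gene decides the result: one shared function is returned,
    conflicting functions give None (an ambiguous call).
    """
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query} is not in the tree")
    for clade in reversed(path[:-1]):  # smallest enclosing clade first
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if len(functions) == 1:
            return next(iter(functions))
        if len(functions) > 1:
            return None
    return None


# Hypothetical gene tree: an ancient duplication produced the A and B copies.
gene_tree = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known_functions = {
    "1A": "function A", "3A": "function A",
    "1B": "function B", "3B": "function B",
}

prediction_2a = infer_function(gene_tree, known_functions, "2A")
prediction_2b = infer_function(gene_tree, known_functions, "2B")
# A gene branching off before the duplication has conflicting relatives.
prediction_x = infer_function(("X", gene_tree), known_functions, "X")
```

Under these assumptions the uncharacterized gene 2A inherits the annotation of its subfamily rather than that of whichever sequence happens to score highest in a similarity search, and a gene that sits outside both subfamilies is left unassigned.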

Similarity ≠ Relatedness

Evolutionary Rate Variation

Phylogenetic Prediction of Function

• Many powerful and automated similarity based methods for assigning genes to protein families

  • COGs

  • PFAM HMM searches

• Some limitations of similarity based methods can be overcome by phylogenetic approaches

• Automated methods now available

  • Sean Eddy

  • Steven Brenner

  • Kimmen Sjölander

• But …

Grad school lesson V: Teaching helps you learn

Grad school lesson VI: There are no career rules

Career Lesson I: Build on what you know

• Phylogenetic approaches to genomics

• Genomics of endosymbionts

• Genomic studies of communities

• Analysis of DNA repair genes in genome sequences

• Phylogenomics of halophilic archaea

• GEBA

• Phylogenetic metagenomics

• ...

Career Lesson II: Don’t Only Use What You Know

What We Don’t Know Can Hurt Us

D. radiodurans genome

DNA Repair Genes in D. radiodurans

Process                        Genes in D. radiodurans
Nucleotide Excision Repair     UvrABCD, UvrA2
Base Excision Repair           AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease                Xth
Mismatch Excision Repair       MutS, MutL
Recombination
  Initiation                   RecFJNRQ, SbcCD, RecD
  Recombinase                  RecA
  Migration and resolution     RuvABC, RecG
Replication                    PolA, PolC, PolX, phage Pol
Ligation                       DnlJ
dNTP pools, cleanup            MutTs, RRase
Other                          LexA, RadA, HepA, UVDE, MutS2

Problem ...

• List of DNA repair gene homologs in D. radiodurans genome is not significantly different from other bacterial genomes of similar size

Repair Studies in Different Species (via Medline searches as of 1998)

Species          Studies
Humans           7028
E. coli          3926
S. cerevisiae     988
Drosophila        387
B. subtilis       284
S. pombe          116
Xenopus            56
C. elegans         25
A. thaliana        20
Methanogens        16
Haloferax           5
Giardia             0

[Figure: phylogenetic tree of bacterial phyla (scale bar 0.1). Labeled groups: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter.]

Tree based on Hugenholtz (2002) with some modifications.

~40 Phyla of Bacteria

[Figure: the same tree of bacterial phyla (scale bar 0.1).]

Tree based on Hugenholtz (2002) with some modifications.

Most DNA metabolism studies in two Phyla

[Figure: the same tree of bacterial phyla (scale bar 0.1).]

Tree based on Hugenholtz (2002) with some modifications.

Deinococcus is very distant from well studied groups

Gain and Loss of Repair Genes

[Figure: DNA repair gene gain and loss events mapped onto a tree spanning BACTERIA, ARCHAEA and EUKARYOTES. Tips: Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp, Metjn, Arcfu, Metth, Human, Yeast. Branches are annotated with gene names carrying +, -, d or t prefixes (for example +UvrABCD, -PhrII, dRecA/SMS, tRad25); one group of genes is marked "from mitochondria".]

Eisen and Hanawalt, 1999 Mut Res 435: 171-213

Solution - Experiments

What We Don’t Know Can Hurt Us

[Figure: tree of bacterial phyla, as of 2002. Based on Hugenholtz, 2002.]

• At least 40 phyla of bacteria

• Most genomes from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

• Same trend in Viruses

GEBA

http://www.jgi.doe.gov/programs/GEBA/pilot.html


rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.

Based on tree from Pace 1997 Science 276:734-740

Archaea

Eukaryotes

Bacteria

??????

Wu et al. (2011) PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

[Figure labels: ????, Phage, Phage, ????, Thaumarchaeot]

Number of SAGs from Candidate Phyla

Site                                     OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent                  4      1     -        -
Site B: Gold Mine                          6     13     2        -
Site C: Tropical gyres (Mesopelagic)       -      -     -        2
Site D: Tropical gyres (Photic zone)       1      -     -        -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz

GEBA uncultured

Uncharacterized genes

Non homology functional

• Many genes have homologs in other species but no homologs have ever been studied experimentally

• Non-homology methods can make functional predictions for these

Phylogenetic profiling basis

• Microbial genes are lost rapidly when not maintained by selection

• Genes can be acquired by lateral transfer

• Frequently gain and loss occurs for entire pathways/processes

• Thus might be able to use correlated presence/absence information to identify genes with similar functions

Non-Homology Predictions: Phylogenetic Profiling

• Step 1: Search all genes in organisms of interest against all other genomes

• Ask: Yes or No, is each gene found in each other species

• Cluster genes by distribution patterns (profiles)
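
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the search step is assumed to have already produced, for each gene, the set of genomes in which a hit was found, and the gene and genome names are hypothetical. Genes are grouped only when their profiles are identical, which is the simplest possible clustering rule.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def build_profiles(hits: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, Profile]:
    """Turn per-gene hit sets into yes/no (1/0) vectors over a fixed genome order."""
    return {
        gene: tuple(int(genome in present) for genome in genomes)
        for gene, present in hits.items()
    }


def cluster_by_profile(profiles: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Group genes that share exactly the same presence/absence profile."""
    clusters: Dict[Profile, List[str]] = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical search results: gene -> genomes in which a homolog was found.
genomes = ["genome1", "genome2", "genome3", "genome4"]
hits = {
    "geneA": {"genome1", "genome3"},
    "geneB": {"genome1", "genome3"},
    "geneC": {"genome2"},
    "geneD": {"genome1", "genome2", "genome3", "genome4"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
```

Here geneA and geneB end up in the same cluster because they are present and absent in the same genomes, which is the signal used to suggest a shared pathway or process. A real analysis would usually replace exact matching with a distance measure and a clustering algorithm.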

Carboxydothermus hydrogenoformans

• Isolated from a Russian hot spring

• Thermophile (grows at 80°C)

• Anaerobic

• Grows very efficiently on CO (Carbon Monoxide)

• Produces hydrogen gas

• Low GC Gram positive (Firmicute)

• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)

PG Profiling Works Better Using Orthology

PG Profiling Works Better Using Independent Contrasts

Career Lesson III: Networks Matter

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
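
A rough sketch of how such a curve could be computed is given below. It assumes the MCL clustering step has already been run and that each genome is represented by the set of protein family identifiers found in it; the genome names, family identifiers, number of permutations and the choice to average over random genome orderings are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    families_by_genome: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of distinct protein families seen after adding 1..N genomes.

    Genomes are added in random order; the cumulative family count is
    averaged over n_permutations orderings.
    """
    rng = random.Random(seed)
    genomes = list(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen: Set[str] = set()
        for i, genome in enumerate(genomes):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: genome -> protein family IDs (e.g. from an MCL run).
example = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam1", "fam2", "fam3"},
}

curve = rarefaction_curve(example)
# curve[k - 1] is the mean number of families found in k genomes; plotting it
# against k = 1..N gives the "# of genomes vs. # of protein families" curve.
```

A curve that keeps climbing as genomes are added suggests that new protein families are still being discovered, while a plateau suggests the sampled genomes have largely covered the family diversity.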

Metagenomics

Binning challenge

[Figure: panels A, B and C]

Sharpton et al. submitted

Career Lesson IV: Openness Helps