TIGRTIGRTIGR
More Questions Than Answers:
Comparative genomics, evolutionary analysis, and knowing what we do not
know about DNA repair.
Jonathan A. Eisen
September 20, 2005
More Questions Outline
• Background
• More questions
  – I. Case study - H. pylori
  – II. New subfamilies and domain patterns
  – III. Recent duplications
  – IV. Missing pathways
  – V. Unusual patterns
  – VI. Even more questions
• Answers
  – Better informatics
  – Experiments
• The end
Background
Genome sequencing and evolutionary analysis
Fleischmann et al. 1995
Prokaryotic Genomes
From C. M. Fraser
Why Completeness is Important
• Improves characterization of genome features
• Better comparative genomics
• Presence/absence is less subjective
• Missing sequence might be important (e.g., centromere)
• Allows researchers to focus on biology not sequencing
• Facilitates large scale correlation studies
• Controls for contamination
see Fraser et al. (2002). J. Bacteriol. 184: 6403-5
Analysis of Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
• Comparative genomics
“Nothing in biology makes sense except in the light of evolution.”
T. H. Dobzhansky (1973)
Evolutionary Perspective and Comparative Biology
• Comparative biology is the analysis of differences and similarities between species.
• An evolutionary perspective is useful in such studies because this allows one to focus not just on the levels and degrees of similarity or difference but on how and why similarities and differences came to be.
More Questions I: A Case Study
The Unbearable Lightness of Helicobacter pylori’s DNA Repair Gene List
Tomb et al. 1997
H. pylori and MutS
• H. pylori has a MutS-like gene but not MutL
• Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
• Experimental studies have shown MutS and MutL always work together in mismatch repair
• Problem: what do we conclude about H. pylori mismatch repair?
See Eisen et al., 1997
Two Prokaryotic MutS Subfamilies
[Figure: A. MutS gene tree with MutS1 (MutS-I lineage) and MutS2 (MutS-II lineage) subfamilies. B. Species tree marking the gene duplication (*) and gene losses (1-5). Taxa shown: E. coli, H. influenzae, N. gonorrhoeae, H. pylori, Syn. sp., B. subtilis, S. pyogenes, M. pneumoniae, M. genitalium, A. aeolicus, D. radiodurans, T. pallidum, B. burgdorferi.]
Eisen, 1998, NAR 26: 4291-4300.
H. pylori No MMR
PHYLOGENETIC PREDICTION OF GENE FUNCTION
METHOD
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
[Figure: the method applied in EXAMPLE A (genes 1-6) and EXAMPLE B (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2 and Species 3), next to the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN). Inferred events are marked "Duplication?" and some function assignments are marked "Ambiguous".]
Eisen, 1998, Genome Res. 8: 163-167.
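The last two steps of the method (overlaying known functions onto the tree and inferring a function for the gene of interest) can be illustrated with a minimal sketch. It is a deliberate simplification, not the published procedure: the gene tree is assumed to be given as nested tuples, the function labels are hypothetical, and the rule used here (take the function shared by the annotated genes in the smallest clade that contains the query and has any annotation) ignores explicit duplication inference. The names `leaves`, `predict_function`, `gene_tree` and `known` are made up for this example.

```python
def leaves(tree):
    """Return the leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Infer a function for `query` from its closest annotated relatives.

    Walks from the root down to the query, collecting every clade that
    contains it, then checks those clades from smallest to largest.
    The first clade with any annotated gene decides the answer:
    one shared function is transferred, conflicting functions are
    reported as "ambiguous".
    """
    clades = []
    node = tree
    while not isinstance(node, str):
        clades.append(node)
        # Descend into the child that contains the query gene.
        node = next(child for child in node if query in leaves(child))
    for clade in reversed(clades):
        functions = {known[g] for g in leaves(clade) if g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Hypothetical gene tree with two subfamilies (A and B), three species each.
gene_tree = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
# Hypothetical experimental annotations; 2A and 2B are uncharacterized.
known = {
    "1A": "function A",
    "3A": "function A",
    "1B": "function B",
    "3B": "function B",
}

prediction_2a = predict_function(gene_tree, "2A", known)
prediction_2b = predict_function(gene_tree, "2B", known)
```

In this toy tree each uncharacterized gene takes the function of its own subfamily rather than that of the other paralog, which is the point of using the tree instead of the top database hit.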
More Questions II:
Long Lost Cousins
Two UvrA Subfamilies
• Some species have a second UvrA-like gene
• Some results on UvrA2 in D. radiodurans
• Unclear what UvrA2s do
Eisen and Wu, 2002, Theor. Pop. Biol. 61: 481-487.
[Figure: A. ABC Transporters: tree of ABC transporter families (OppD/F, UUP, NodI, LivF, XylG, NrtD/C, PstB, MDR, HlyB, TAP1, CFTR, SUR) with the UvrA1 and UvrA2 branches. B. UvrA Subfamily: tree of bacterial UvrA proteins showing a duplication in the UvrA family that separates UvrA1 from UvrA2 (UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans).]
Evolution of UvrA Family
Van Houten et al., 2002, PNAS 99: 2581-2583.
UvrC vs. Cho
• Cho in E. coli is involved in an alternative NER pathway (see Moolenaar et al. 2002)
• What do the other relatives of Cho do?
[Figure: phylogenetic tree of photolyase (Phr) homologs from bacteria, archaea, fungi, plants and animals, with clades labeled MTHF type CPD Photolyases, 6-4 Photolyases, Blue Light Receptors, and 8-HDF type Photolyase. Vibrio cholerae sequences (including ORF02295.Vibch and ORF01792.Vibch) fall in different clades.]
Three V. cholerae Phr Homologs
AlkA Domain (O6-Me-G glycosylase)
Ogt Domain (O6-Me-G alkyltransferase)
Ada Domain (transcriptional regulator)
[Figure: domain architecture of Ada (E. coli, H. influenzae), Ogt (E. coli, H. influenzae, Gram+, D. radiodurans), AlkA (Gram+, E. coli) and MGMT (eukaryotes).]
Eisen and Hanawalt, 1999, Mut. Res. 435: 171-213.
Alkylation Repair Genes
More Questions III:
Does 1+1 = 1? Or 2?
Wu et al., 2004, PLoS Biology 2: 327-341
Wolbachia pipientis wMel
MutL Duplication in Wolbachia
Eisen, unpublished.
Wood et al., 2001, Science 294: 2317-2323
Agrobacterium tumefaciens
Chromosome 1: DnaE, LigA, Xth
Chromosome 2: DnaE, LigA
Plasmid 1: DnaE, LigA, Xth
Plasmid 2: DnaE?, LigA
Wood et al., 2001, Science 294: 2317-2323
Agrobacterium tumefaciens has multiple duplications
[Figure: UVDE tree with Schizosaccharomyces pombe, Neurospora crassa, Clostridium perfringens, Bacillus subtilis, Bacillus cereus, Bacillus anthracis UVDE2, Bacillus halodurans, Bacillus anthracis UVDE1, Deinococcus radiodurans and Nostoc sp. PCC 7120.]
Duplication of UVDE in Anthrax
Eisen, unpublished.
More Questions IV:
When a good repair pathway is hard to find
• P. falciparum initial annotation showed the presence of a DNA Ligase IV homolog but not a DNA Ligase I.
• Yet Ligase I has more of a core function in other eukaryotes, while Ligase IV is involved in NHEJ
P. falciparum DNA Repair
[Figure: phylogenetic tree of DNA ligases from eukaryotes, poxviruses and other viruses, archaea and bacteria, with clades labeled Ligase IV, Viral Ligases, Ligase I and Archaeal Ligases. The Plasmodium falciparum sequence groups with the Ligase I sequences.]
DNA Ligase Tree
Eisen, unpublished
[Figure: subtree with Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens, Gallus gallus, Xenopus laevis, Candida albicans, Saccharomyces cerevisiae and Schizosaccharomyces pombe.]
DNA Ligase IV SubTree
Eisen, unpublished
[Figure: subtree with Arabidopsis thaliana, Oryza sativa, Crithidia fasciculata, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, Xenopus laevis, Plasmodium falciparum, Schizosaccharomyces pombe and Saccharomyces cerevisiae.]
DNA Ligase I SubTree
Eisen, unpublished
No NHEJ in P. falciparum?
• No Ligase IV
• None of the other “normal” NHEJ genes for eukaryotes
• Does it use only homologous recombination or is there another NHEJ pathway?
See Gardner et al 2002
Wu et al., 2004, PLoS Biology 2: 327-341
Wolbachia pipientis wMel
Wu et al., 2004
Wolbachia Evolves Rapidly
Insect Symbiont Tree
Wu et al. In preparation
Intracellular Organisms and Evolutionary Rate
• Many intracellular bacteria have higher evolutionary rates than free-living relatives
• Two main theories to explain this:
  – loss of DNA repair
  – altered population genetics (e.g., bottlenecks)
Repair Category   Gene Name
MMR               mutS1, mutL1, mutL2
NER               uvrABCD
BER               fpg, mpg, nth
AP Endo           xth
Rec               recA, ruvABC, recG, recFJR
Other             ligA, ssb, radA
Wolbachia Has Similar Repair Genes to More Slowly Evolving Species
Wu et al., 2004
Insect Symbiont Repair

Gene   Buchnera (APS)  Buchnera (BP)  Buchnera (SG)  Wigglesworthia  Blochmannia  Baumannia

Base Excision Repair
Ung    gi15616802      gi27904674     -              gi32491032      gi33519990   GEBCORF00107
MutY   gi15617145      gi27904974     gi21672797     -               gi33519714   -
Fpg    -               -              -              -               -            GEBCORF00321
Nfo    gi15616757      gi27904631     gi21672420     -               -            GEBCORF00528
Xth    -               -              -              gi32491120      gi33519889   -

Nucleotide Excision Repair
UvrD   -               -              -              gi32491021      -            GEBCORF00274

Transcription-Repair Coupling Factor
Mfd    gi15616905      gi27904769     -              gi32490848      -            -

Mismatch Repair
MutL   gi15617161      gi27904987     gi21672810     -               -            GEBCORF00079
MutS   gi15617028      gi27904861     gi21672681     -               -            GEBCORF00349
MutH   -               gi27904531     -              -               -            GEBCORF00038

8-oxo-dGTP Hydrolysis
MutT   gi15616821      -              gi21672482     -               -            -

Pyrimidine Dimer Repair
Phr    gi15616910      gi27904772     -              gi32490931      -            -

Homologous Recombination
RecG   -               -              -              -               -            GEBCORF00259
RuvA   -               -              -              -               -            GEBCORF00452
RuvB   -               -              -              -               -            GEBCORF00451
RuvC   -               -              -              gi32490863      -            GEBCORF00453
RecB   gi15617053      gi27904885     gi21672704     gi32491015      gi33519733   GEBCORF00043
RecC   gi15617052      gi27904884     gi21672703     gi32491014      gi33519731   GEBCORF00042
RecD   gi15617054      gi27904886     gi21672705     gi32491016      gi33519732   GEBCORF00044
RecA   -               -              -              gi32490984      -            GEBCORF00346
RecJ   -               -              -              gi32491188      -            GEBCORF00298
RmuC   -               -              -              -               gi33520059   GEBCORF00163
Wu et al. In preparation
Insect Nutritional Symbiont Tree
[Figure: tree of insect nutritional symbionts annotated with presence (+) or absence (-) of RecA and MutLS.]
Wu et al. In preparation
Organelle Derived Genes in the A. thaliana Nuclear Genome
Pathway                      Genes
Mismatch Repair              MutS2
Base Excision Repair         Fpg, Tag?
Nucleotide Excision Repair   Mfd
Recombination                RecA, RecG
Direct Repair                Phr
Other                        Lon
Eisen and Britt, 2000
More Questions V:
One of these things just does not belong here
Other Unusual Patterns
• RecBCD but no RecA
• Mfd but no UvrABCD
• No Ung in thermophiles
• Rad25 in some bacteria
• No MSH4, 5 in many eukaryotes
• No CSB in some eukaryotes
• Tons of repair genes in Mimivirus
Archaea
• NER
  – UvrABC in some
  – Rad1, Rad2, Rad25 etc in others
  – Some species with both
• MMR
  – Many species with MutS2 only
  – Some with multiple MutLs, MutS1s
  – Three MutSs in Halophiles
Even More Questions?
You ain’t seen nothing yet
Hugenholtz 2002
Biased Sampling of Genomes
• Most bacterial genomes come from three phyla
• Most phyla have no genome sequences
• Even worse sampling for eukaryotes and archaea
[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter.]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: sequence more phyla
Solution II: Environmental Sequencing
shotgun sequence
Warner Brothers, Inc.
[Figure: UVDE tree with Schizosaccharomyces pombe, Neurospora crassa, Clostridium perfringens, Bacillus subtilis, Bacillus cereus, Bacillus anthracis UVDE2, Bacillus halodurans, Bacillus anthracis UVDE1, Deinococcus radiodurans and Nostoc sp. PCC 7120; scale bar 0.1.]
Duplication and Loss of UVDE
Eisen, unpublished
[Figure: phylogenetic tree of homologs from bacteriophage Aeh1, Bacillus halodurans, Bacillus cereus (ATCC 14579, ATCC 10987, ZK), Bacillus anthracis (Sterne, Ames and a third strain), Bacillus thuringiensis, Clostridium perfringens (SM101, 13), Thermus thermophilus HB27, Schizosaccharomyces pombe, Cryptococcus neoformans, Mimivirus, Desulfotalea psychrophila LSv5, Parachlamydia sp. UWE25, Deinococcus radiodurans, Nostoc sp. PCC 7120 and Haloarcula marismortui. The Bacillus cereus group strains each appear twice. Scale bar 0.1.]
Eisen, unpublished
[Figure: the same tree with many additional sequences added, labeled only by 13-digit numeric identifiers; scale bar 0.1.]
Eisen, unpublished
Answers I:
Better Bioinformatics
Non-Homology Based Functional Prediction: Phylogenetic Profiles
Non-Homology Prediction: Phylogenetic Profiles
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: Yes or No, is each gene found in each other species?
• Cluster genes by distribution patterns (profiles)
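The three steps above can be sketched in a few lines, assuming the searches have already been run and reduced to yes/no calls. Everything in the example is hypothetical: the gene names, the four-genome profiles, and the choice of single-linkage grouping on Hamming distance, which is only one of several reasonable ways to cluster profiles.

```python
# Hypothetical presence (1) / absence (0) calls for each gene
# across four other genomes, in a fixed genome order.
profiles = {
    "geneA": (1, 1, 0, 1),
    "geneB": (1, 1, 0, 1),
    "geneC": (0, 1, 1, 0),
    "geneD": (0, 1, 1, 0),
    "geneE": (1, 0, 0, 0),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def cluster_by_profile(profiles, max_distance=0):
    """Group genes whose profiles differ in at most `max_distance` genomes.

    Single-linkage: a gene joins (and merges) every existing cluster
    that contains at least one sufficiently similar gene.
    """
    clusters = []
    for gene, profile in profiles.items():
        matches = [
            c for c in clusters
            if any(hamming(profile, profiles[g]) <= max_distance for g in c)
        ]
        merged = [gene]
        for c in matches:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


clusters = cluster_by_profile(profiles)
```

With `max_distance=0` only genes with identical distributions are grouped; raising it tolerates a few gene losses or missed hits at the cost of looser clusters.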
Table 3. Presence of MutS Homologs in Complete Genome Sequences

Species                                   # of MutS Homologs   Which Subfamilies?   MutL Homologs

Bacteria
Escherichia coli K12                      1                    MutS1                1
Haemophilus influenzae Rd KW20            1                    MutS1                1
Neisseria gonorrhoeae                     1                    MutS1                1
Helicobacter pylori 26695                 1                    MutS2                -
Mycoplasma genitalium G-37                -                    -                    -
Mycoplasma pneumoniae M129                -                    -                    -
Bacillus subtilis 168                     2                    MutS1, MutS2         1
Streptococcus pyogenes                    2                    MutS1, MutS2         1
Mycobacterium tuberculosis                -                    -                    -
Synechocystis sp. PCC6803                 2                    MutS1, MutS2         1
Treponema pallidum Nichols                1                    MutS1                1
Borrelia burgdorferi B31                  2                    MutS1, MutS2         1
Aquifex aeolicus                          2                    MutS1, MutS2         1
Deinococcus radiodurans R1                2                    MutS1, MutS2         1

Archaea
Archaeoglobus fulgidus VC-16, DSM4304     -                    -                    -
Methanococcus jannaschii DSM 2661         -                    -                    -
Methanobacterium thermoautotrophicum ΔH   1                    MutS2                -

Eukaryotes
Saccharomyces cerevisiae                  6                    MSH1-6               3+
Homo sapiens                              5                    MSH2-6               3+
MutL and MutS1 Always Together
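The co-occurrence claim can be checked directly against the prokaryotic rows of Table 3. The sketch below transcribes those rows by hand (strain designations dropped, eukaryotes left out because their homologs carry MSH names) and tests whether a MutL homolog is present exactly when a MutS1 homolog is. The variable names are illustrative only.

```python
# (species, MutS subfamilies present, number of MutL homologs),
# transcribed from the bacterial and archaeal rows of Table 3.
TABLE_3 = [
    ("Escherichia coli", {"MutS1"}, 1),
    ("Haemophilus influenzae", {"MutS1"}, 1),
    ("Neisseria gonorrhoeae", {"MutS1"}, 1),
    ("Helicobacter pylori", {"MutS2"}, 0),
    ("Mycoplasma genitalium", set(), 0),
    ("Mycoplasma pneumoniae", set(), 0),
    ("Bacillus subtilis", {"MutS1", "MutS2"}, 1),
    ("Streptococcus pyogenes", {"MutS1", "MutS2"}, 1),
    ("Mycobacterium tuberculosis", set(), 0),
    ("Synechocystis sp.", {"MutS1", "MutS2"}, 1),
    ("Treponema pallidum", {"MutS1"}, 1),
    ("Borrelia burgdorferi", {"MutS1", "MutS2"}, 1),
    ("Aquifex aeolicus", {"MutS1", "MutS2"}, 1),
    ("Deinococcus radiodurans", {"MutS1", "MutS2"}, 1),
    ("Archaeoglobus fulgidus", set(), 0),
    ("Methanococcus jannaschii", set(), 0),
    ("Methanobacterium thermoautotrophicum", {"MutS2"}, 0),
]

# Species where MutS1 presence and MutL presence disagree.
mismatches = [
    name for name, subfamilies, mutl in TABLE_3
    if ("MutS1" in subfamilies) != (mutl > 0)
]

# Species that encode MutS2 but no MutS1.
muts2_only = [
    name for name, subfamilies, mutl in TABLE_3
    if subfamilies == {"MutS2"}
]
```

In this table the two profiles agree in every genome, and the species with a MutS homolog but no MutL are exactly the ones that carry MutS2 alone.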
Co-Occurrence of Spo Genes
Co-Occurrence of UvrABC
Co-Occurrence of Muk, SeqA, MutH
Answers II:
Novelty is good and bad
Deinococcus radiodurans
DNA Repair Genes in D. radiodurans Complete Genome

Process                        Genes in D. radiodurans
Nucleotide Excision Repair     UvrABCD, UvrA2
Base Excision Repair           AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease                Xth
Mismatch Excision Repair       MutS, MutL
Recombination
  Initiation                   RecFJNRQ, SbcCD, RecD
  Recombinase                  RecA
  Migration and resolution     RuvABC, RecG
Replication                    PolA, PolC, PolX, phage Pol
Ligation                       DnlJ
dNTP pools, cleanup            MutTs, RRase
Other                          LexA, RadA, HepA, UVDE, MutS2
White et al, 1999
Problem:
List of DNA repair gene homologs in D. radiodurans genome is not significantly different from other bacterial genomes of similar size
[Figure: tree of bacterial phyla, as on the "Biased Sampling of Genomes" slides.]
• At least 40 phyla of bacteria
• Most DNA metabolism studies in two phyla
• Deinococcus is distant from any well studied organism
• Solution: more studies
TIGR
Other people
Mom and Dad
H. Ochman
W. Martin
F. Robb
J. Battista
E. Orias
D. Bryant
S. O’Neill
M. Eisen
N. Moran
R. Myers
C. M. Cavanaugh
P. Hanawalt
NSF
J. Heidelberg
T. Read
N. Ward
M-I Benito
J. C. Venter
C. Fraser
S. Salzberg
O. White
I. Paulsen
$$$
ONR
DOE
NIH
H. Tettelin