Slides from a Comparative Genomics and Visualisation course (part 2) presented at the University of Dundee, 11th March 2014. Other materials are available at GitHub (https://github.com/widdowquinn/Teaching)
Comparative Genomics and Visualisation – Part 2
Leighton Pritchard
Part 2
• Part 1
  ◦ Experimental Comparative Genomics
  ◦ Bulk and Whole Genome Comparisons
• Genome Features
• Who let the -logues out?
• Finishing The Hat
Genome Features
• Genes:
  ◦ translation start
  ◦ introns
  ◦ exons
  ◦ translation stop
  ◦ translation terminator
• ncRNA:
  ◦ tRNA – transfer RNA
  ◦ rRNA – ribosomal RNA
  ◦ CRISPRs – bacterial and archaeal defence (genome editing)
  ◦ many other classes (including enhancers)
Genome Features
• Regulatory sites
  ◦ Transcription start site (TSS)
  ◦ RNA polymerase binding sites
  ◦ Transcription Factor Binding Sites (TFBS)
  ◦ Core, proximal and distal promoter regions
• Repetitive Regions and Mobile Elements
  ◦ Tandem repeats
  ◦ (retro-)transposable elements
    ▪ Alu has ≈50,000 active copies in human genome
  ◦ Phage inclusion (bacteria/archaea)
Pennacchio & Rubin (2001) Nat. Rev. Genet. doi:10.1038/35052548
[Figure: human v mouse comparison]
Genome Feature Identification
• Gene Finding:
  1. Empirical (evidence-based) methods:
    ◦ Inference from known protein/cDNA/mRNA/EST sequence
    ◦ Inference from mapped RNA reads
  2. Ab initio methods:
    ◦ Identification of sequences associated with gene features:
      ▪ TSS, CpG islands, Shine-Dalgarno sequence, stop codons, etc.
  3. Inference from genome comparisons/conservation
Liang et al. (2009) Genome Res. doi:10.1101/gr.088997.108
Brent (2007) Nat. Biotech. doi:10.1038/nbt0807-883
Korf (2004) BMC Bioinf. doi:10.1186/1471-2105-5-59
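To make the ab initio idea concrete, here is a minimal sketch that scans a DNA sequence for one kind of signal named above: an ATG start codon followed in-frame by a stop codon. It is a toy illustration only, not how Glimmer, GeneMarkS or any real gene caller works. The function name `find_orfs`, the `min_codons` parameter and the example sequence are assumptions made for this example, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs in `seq`.

    An ORF here is an ATG followed in-frame by the first stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` is a
    hypothetical length filter (counted including start and stop).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence: ATG AAA TTT TAG embedded at offset 2
orfs = find_orfs("CCATGAAATTTTAGGG")
print(orfs)
```

Real gene finders combine many such signals (and codon-usage statistics) in a probabilistic model; a bare ORF scan like this over-predicts heavily, which is one reason all prediction methods result in errors.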
Genome Feature Identification
• Finding Regulatory Elements (short, degenerate):
  1. Empirical (evidence-based) methods:
    ◦ Inference from protein-DNA binding experiments
    ◦ Inference from coexpression
  2. Ab initio methods:
    ◦ Identification of regulatory motifs (profile/other methods):
      ▪ TATA, sigma-factor binding sites, etc.
    ◦ statistical overrepresentation
    ◦ Identification from sequence properties
  3. Inference from sequence conservation/genome comparisons
Zhang et al. (2011) BMC Bioinf. doi:10.1186/1471-2105-12-238
Kilic et al. (2013) Nucl. Acids Res. doi:10.1093/nar/gkt1123
Vavouri & Elgar (2005) Curr. Op. Genet. Devel. doi:10.1016/j.gde.2005.05.002
Genome Feature Identification
• All prediction methods result in errors
• All experiments have error
• Genome comparisons can help correct errors
• [OPTIONAL ACTIVITY] – useful for exercise
  ◦ predict_CDS.md Markdown
• Other options for prokaryotic genecalling:
  ◦ Glimmer (http://ccb.jhu.edu/software/glimmer/index.shtml)
  ◦ GeneMarkS (http://opal.biology.gatech.edu/)
  ◦ RAST (http://rast.nmpdr.org/)
  ◦ BASys (https://www.basys.ca/), etc.
• Options for eukaryotic genecalling:
  ◦ GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/)
  ◦ GeneMarkES (http://opal.biology.gatech.edu/gmseuk.html)
  ◦ Augustus (http://augustus.gobics.de/), etc.
Who Let The -logues Out?
Evolutionary relationships of genome features can be complex. We require precise terms to describe relationships between genome features.
Comparing Gene Features
• Given gene annotations for more than one genome, how can we organise and understand relationships?
  ◦ Functional similarity (analogy)
  ◦ Evolutionary common origin (homology, orthology, etc.)
  ◦ Evolutionary/functional/family relationships (paralogy)
Terms first suggested by Fitch (1970) Syst. Zool. doi:10.2307/2412448
Attack of the -logues
• Technical terms describing evolutionary relationships
  ◦ Homologues: elements that are similar because they share a common ancestor (NOTE: There are NOT degrees of homology!)
  ◦ Analogues: elements that are (functionally?) similar, possibly through convergent evolution and not by sharing common ancestry
  ◦ Orthologues: homologues that diverged through speciation
  ◦ Paralogues: homologues that diverged through duplication within the same genome
  ◦ (also co-orthologues, xenologues, etc.)
Attack of the -logues
[Figure: an ancestral genome feature within a genome, drawn against a time axis]
Attack of the -logues
[Figure: time axis on which a speciation event splits an ancestral gene into a species1 copy and a species2 copy; the two copies are labelled as orthologues]
• Orthologues: homologues that diverged through speciation
Attack of the -logues
[Figure: time axis on which a duplication gives two copies (copy 1 and copy 2) of an ancestral gene within one genome; the copies are labelled as paralogues]
• Paralogues: homologues that diverged through duplication within the same genome
Attack of the -logues
[Figure: time axis showing a speciation event and a duplication in species1; the resulting genes are labelled as orthologues, in-paralogues and out-paralogues]
Attack of the -logues
[Figure: time axis showing a speciation event and a duplication in each of species1 and species2; the resulting genes are labelled as in-paralogues, out-paralogues and orthologues]
Attack of the -logues
• BUT: biology is not well-behaved: relationships can be difficult to infer
  ◦ Gene loss occurs
  ◦ Homologues can diverge – sometimes very widely: hard to recognise
  ◦ Reconstructed evolutionary trees for speciation events may not be robust
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
Attack of the -logues
[Figure: time axis of historical events (speciation, duplication, extensive divergence) leading to the contemporary sequences; whether those sequences are in-paralogues, out-paralogues or (co-)orthologues is marked as uncertain]
Current classifications of orthology/paralogy are inferences
Attack of the -logues
• BUT: biology is not well-behaved: relationships can be difficult to infer
  ◦ Gene loss occurs
  ◦ Homologues can diverge – sometimes very widely: hard to recognise
  ◦ Reconstructed evolutionary trees for speciation events may not be robust
  ◦ Some resources and tools ‘bend’ definitions, e.g. Ensembl Compara and OrthoMCL.
http://www.ensembl.org/info/genome/compara/homology_method.html
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
Note on “Orthology”
• Frequently abused/misused as a term
• “Orthology” is an evolutionary relationship, often bent into service as a functional descriptor
• Strictly defined only for two species or clades!
  ◦ (cf. OrthoMCL, etc.)
• Orthology is not transitive (A is orthologue of C and B is orthologue of C does not imply A is an orthologue of B)
  ◦ (cf. EnsemblCompara definitions)
Storm & Sonnhammer (2002) Bioinformatics. doi:10.1093/bioinformatics/18.1.92
Ensembl Compara definitions
• within_species_paralog: same-species paralogue (in-paralogue)
• ortholog_one2one: orthologue
• ortholog_one2many: orthologue/paralogue relationship
• ortholog_many2many: orthologue/paralogue relationship
Vilella et al. (2009) Genome Res. doi:10.1101/gr.073585.107
NOTE: the taxonomy may not always be correct…
“The Ortholog Conjecture”
Without duplication, a gene is unlikely to change its basic function, because this would lead to loss of the original function, and this would be harmful.
Problems with the Ortholog Conjecture
• Nehrt et al. (2011) say:
  ◦ Paralogues better predictor of function than orthologues
    ▪ ∴ conjecture is false!
  ◦ Cellular context better for protein function inference
  ◦ Function defined from Gene Ontology (GO)
Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
Problems with the Ortholog Conjecture
• But do we understand function well enough to test the conjecture?
• Chen et al. (2012) say: “No”
  ◦ “examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems […] make the current GO inappropriate for testing the ortholog conjecture”
• Expression levels are more similar between orthologues than between paralogues (but is this “function”…?)
Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
Finding “Orthologues”
The process of finding evolutionary (and/or functional) equivalents of genes across two or more organisms’ genomes.
Why are “orthologues” so important?
• Orthology formalises the concept of corresponding genes across multiple organisms.
  ◦ Evolutionary
  ◦ Functional? (“The Ortholog Conjecture”)
• Applications in:
  ◦ Comparative genomics
  ◦ Functional genomics
  ◦ Phylogenetics, …
• Many (>35) databases attempt to describe orthologous relationships
  ◦ http://questfororthologs.org/orthology_databases
Dessimoz (2011) Brief. Bioinf. doi:10.1093/bib/bbr057
How to find orthologues?
• Many published methods and databases:
  ◦ Pairwise between two genomes:
    ▪ RBBH (aka BBH, RBH, etc.), RSD, InParanoid, RoundUp
  ◦ Multi-genome
    ▪ Graph-based: COG, eggNOG, OrthoDB, OrthoMCL, OMA, MultiParanoid
    ▪ Tree-based: TreeFam, Ensembl Compara, PhylomeDB, LOFT
• Methods may apply different - or refined - definitions of orthology, paralogy, etc.
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
Trachana et al. (2011) Bioessays doi:10.1002/bies.201100062
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
Pairwise approaches
• S1, S2 are the gene sequence sets from two organisms
• Compare S1 to S2, and identify the most similar pairs of sequences: these are “orthologues” (or “putative orthologues”).
• Many similarity measures possible (which threshold: E-value, bit score, coverage…?):
  ◦ Reciprocal best BLAST hit (RBBH) – used by e.g. InParanoid
  ◦ Reciprocal smallest difference (RSD) – used by e.g. RoundUp
  ◦ and so on…
• Can be extended to multi-organism clusters by graph-based approaches
Östlund et al. (2009) Nuc. Acids Res. doi:10.1093/nar/gkp931
DeLuca et al. (2012) Bioinf. doi:10.1093/bioinformatics/bts006
Reciprocal Best BLAST Hits
• S1, S2 are the gene sequence sets from two organisms
• BLASTP:
  ◦ Query=S1, Subject=S2
  ◦ Query=S2, Subject=S1
• Optionally filter BLAST hits (e.g. on %identity and %coverage)
• Find all pairs of sequences {GS1n, GS2n} in S1, S2 where GS1n is the best BLAST match to GS2n and GS2n is the best BLAST match to GS1n.
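[Figure: best hits between sequences in S1 and S2. A pair in which each sequence is the other's best hit is accepted (✔); a pair in which one sequence is only the other's 2nd best hit is rejected (✘)]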
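The RBBH procedure above can be sketched in a few lines. This is a minimal illustration rather than the course's find_rbbh.ipynb notebook: it assumes the two BLASTP searches have already been run and parsed into `(query, subject, bit_score)` tuples, and the function names, gene names and scores are all hypothetical. Ties are broken by keeping the first hit seen, which is a simplifying assumption.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, e.g.
    parsed from tabular BLASTP output (an assumed input format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_s1_vs_s2, hits_s2_vs_s1):
    """Return sorted (s1_gene, s2_gene) pairs that are each other's best hit."""
    forward = best_hits(hits_s1_vs_s2)
    reverse = best_hits(hits_s2_vs_s1)
    return sorted(
        (g1, g2) for g1, g2 in forward.items() if reverse.get(g2) == g1
    )


# Hypothetical toy data: a1 and b1 are mutual best hits; a2's best hit is
# b1, but b1 prefers a1, so a2 has no reciprocal partner.
s1_vs_s2 = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
            ("a2", "b1", 180.0), ("a2", "b3", 90.0)]
s2_vs_s1 = [("b1", "a1", 198.0), ("b1", "a2", 175.0),
            ("b2", "a1", 150.0), ("b3", "a2", 90.0)]

pairs = reciprocal_best_hits(s1_vs_s2, s2_vs_s1)
print(pairs)
```

Any filtering on %identity or %coverage would be applied to the hit lists before calling `reciprocal_best_hits`. Note that b3's best hit is a2, yet the pair is dropped because the relationship is not reciprocal.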
Reciprocal Best BLAST Hits
• Advantages:
  ◦ quick
  ◦ easy
  ◦ performs surprisingly well (see later…)
• Disadvantages:
  ◦ misses paralogues
  ◦ not good at identifying gene families or *-to-many relationships without more detailed analysis.
  ◦ no strong theoretical/phylogenetic basis.
COG
• COG (Clusters of Orthologous Groups; now POG, KOG, eggNOG etc.)
• Graph extension of RBBH to clusters of mutual RBBH
• “Any group of at least three proteins from different genomes, more similar to each other than any other proteins from those genomes, are an orthologous family.”
  ◦ Conduct RBBH
  ◦ Collapse paralogues
  ◦ Detect “triangles”
  ◦ Merge triangles having common side
  ◦ Manual curation
• Databases have many outparalogues
Tatusov et al. (2000) Nucl. Acids Res. doi:10.1093/nar/28.1.33
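The "detect triangles" and "merge triangles having common side" steps can be illustrated with a small graph sketch. This is not the COG pipeline itself: paralogue collapsing and manual curation are omitted, and the data layout (nodes as `(genome, gene)` tuples, edges as mutual best-hit pairs) and function names are assumptions made for the example.

```python
from itertools import combinations


def find_triangles(edges):
    """Find triangles of mutual best hits spanning three different genomes.

    `edges` is a set of frozenset({u, v}); each node is a (genome, gene) tuple.
    """
    neighbours = {}
    for edge in edges:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in (tuple(edge) for edge in edges):
        for w in neighbours[u] & neighbours[v]:
            if len({u[0], v[0], w[0]}) == 3:  # three distinct genomes
                triangles.add(frozenset((u, v, w)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two nodes), using union-find."""
    triangles = list(triangles)
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    side_owner = {}
    for idx, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            if side in side_owner:
                parent[find(idx)] = find(side_owner[side])
            else:
                side_owner[side] = idx
    clusters = {}
    for idx, tri in enumerate(triangles):
        clusters.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical mutual best-hit pairs across genomes A, B, C and D
a1, b1, c1, d1 = ("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1")
a2, b2 = ("A", "a2"), ("B", "b2")
edges = {frozenset(p) for p in [(a1, b1), (b1, c1), (a1, c1),
                                (b1, d1), (c1, d1), (a2, b2)]}

clusters = merge_triangles(find_triangles(edges))
print(clusters)
```

Here the triangles a1-b1-c1 and b1-c1-d1 share the side b1-c1 and merge into one four-member group, while the isolated pair a2-b2 forms no triangle and is left out.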
MCL
• MCL constructs a network from all-vs-all BLAST results
• Then applies matrix operations: expansion and inflation
• Iterative expansion and inflation until network convergence
Enright et al. (2002) Nucl. Acids Res. doi:10.1093/nar/30.7.1575
MCL
[Figure: alternating expansion and inflation steps, repeated, take the input network to the final clustering]
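The expansion and inflation loop can be sketched in pure Python. This is a minimal illustration, not the `mcl` program or OrthoMCL: the added self-loops, the inflation value of 2.0, the convergence tolerance and the way clusters are read off the converged matrix are assumptions stated here for the example, and the input is a small hand-made adjacency matrix rather than real BLAST scores.

```python
def normalise_columns(m):
    """Scale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion: square the matrix (flow along two-step random walks)."""
    n = len(m)
    return [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m, power):
    """Inflation: raise each entry to `power`, then renormalise columns."""
    return normalise_columns([[x ** power for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    n = len(adjacency)
    # Assumption: add self-loops before normalising
    m = normalise_columns(
        [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[r][c] - m[r][c])
                    for r in range(n) for c in range(n))
        m = new
        if delta < tol:
            break
    # Rows that retain flow act as attractors; the columns they attract
    # form one cluster.
    clusters = set()
    for row in m:
        members = frozenset(c for c, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical network: two fully connected groups with no cross-links
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
print(clusters)
```

In real all-vs-all BLAST networks the groups are joined by weak cross-links; inflation progressively strengthens strong flows and suppresses weak ones, and raising the inflation value generally gives finer-grained clusters.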
OrthoMCL
• http://orthomcl.org/orthomcl/
  1. Defines potential inparalogue, orthologue and co-orthologue pairs (using RBBH! – see algorithm description in papers directory)
  2. Applies MCL to cluster inparalogue, orthologue, co-orthologue pairs
• Output clusters include both orthologues and paralogues
Li et al. (2003) Genome Res. doi:10.1101/gr.1224503
Notes of Caution
• BLAST-based orthology methods (e.g. RBBH, InParanoid, COG) are fast!
• But they have some drawbacks:
  ◦ No guarantee that sequence matches are transitive (A may match B at a domain differently than B matches C)
  ◦ No evolutionary distance model
  ◦ Multiple domain matches are not accounted for
  ◦ These methods find similar sequences, then make assumptions based on similarity and number of matches. They do not detect orthologues directly!
• Tree-based methods incorporate:
  ◦ Evolutionary distance
  ◦ Direct orthologue detection
Finding “Orthologues”
• Pairwise analysis: RBBH
  ◦ [ACTIVITY]
    ▪ find_rbbh.ipynb iPython notebook
• Multi-organism analysis: MCL
  ◦ [ACTIVITY]
    ▪ mcl_orthologues/README.md Markdown
    ▪ mcl_orthologues.ipynb iPython notebook
Other Methods
• Synteny-based:
  ◦ Homologene (NCBI):
    ▪ http://www.ncbi.nlm.nih.gov/homologene
• Manual curation:
  ◦ Mouse Genome Database (MGD):
    ▪ http://www.informatics.jax.org/homology.shtml
• Tree-based:
  ◦ EnsemblCompara (EMBL-EBI):
    ▪ http://www.ensembl.org/info/genome/compara/index.html
  ◦ TreeFam (EMBL-EBI):
    ▪ http://www.treefam.org/
  ◦ OrthologID:
    ▪ http://nypg.bio.nyu.edu/orthologid/
Evaluating Orthologue Predictions
Which method works best? (and what do we mean by “best” anyway?)
Evaluating Predictions
• Works the same way for all prediction tools
  1. Define a “validation set” (gold standard), unseen by the prediction tool
  2. Make predictions with the tool
  3. Evaluate confusion matrix and performance statistics
    ◦ Sensitivity
    ◦ Specificity
    ◦ Accuracy

                 Standard +ve    Standard -ve
  Predict +ve    TP              FP
  Predict -ve    FN              TN

  False positive rate           FP/(FP+TN)
  False negative rate           FN/(TP+FN)
  Sensitivity                   TP/(TP+FN)
  Specificity                   TN/(FP+TN)
  False discovery rate (FDR)    FP/(FP+TP)
  Accuracy                      (TP+TN)/(TP+TN+FP+FN)
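The statistics in the table follow directly from the four confusion-matrix counts. A minimal sketch, in which the function name `performance` and the example counts are hypothetical:

```python
def performance(tp, fp, fn, tn):
    """Compute the performance statistics tabulated above from raw counts."""
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "false_discovery_rate": fp / (fp + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 true orthologue pairs and 100 non-orthologue
# pairs in the validation set
stats = performance(tp=80, fp=10, fn=20, tn=90)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and false negative rate sum to 1, as do specificity and false positive rate, so reporting one of each pair is enough. The sketch does not guard against empty classes, where a denominator would be zero.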
Evaluating Orthologue Predictions
• Take advantage of prokaryotic operon structure: conserved syntenic triplets likely to be orthologous
  ◦ Idea: If the outer pair in a syntenic triplet are orthologous, the middle gene is likely to be, too.
  ◦ Middle genes are orthologue “gold standard”
  ◦ Do RBBH reliably identify middle genes from syntenic triplets?
Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
Evaluating Orthologue Predictions
• Two well-characterised genomes compared against 573 prokaryotes
• Identified RBBH (with permissive BLAST settings)
• “Overwhelming majority” of middle genes (counterparts) are BBH
• 88-99% of BBH are in syntenic triplets
• Therefore, RBBH reliably finds orthologues
Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
Evaluating Orthologue Predictions
• Four orthologue prediction algorithms:
  ◦ RBBH (and cRBH)
  ◦ RSD (and cRSD)
  ◦ MultiParanoid
  ◦ OrthoMCL
• Tested against 2,723 curated orthologues from six Saccharomycetes
• Rated by:
  ◦ Sensitivity: TP/(TP+FN) – what proportion of orthologues are found
  ◦ Specificity: TN/(TN+FP) – how well are non-orthologues excluded
  ◦ Accuracy: (TP+TN)/(TP+TN+FP+FN) – general measure of performance
  ◦ FDR: FP/(FP+TP) – what proportion of predictions are incorrect
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
Evaluating Orthologue Predictions
• Four orthologue prediction algorithms:
  ◦ RBBH (cRBH)
  ◦ RSD (cRSD)
  ◦ MultiParanoid
  ◦ OrthoMCL
• cRBH most accurate, and specific, with lowest FDR
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
Evaluating Orthologue Predictions
• Tests of several methods on a number of literature-based benchmarks for:
  ◦ Correct branching of phylogeny
  ◦ Grouping by function
    ▪ GO similarity
    ▪ EC number
    ▪ Expression level
    ▪ Gene Neighbourhood
Altenhoff & Dessimoz (2009) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000262
Evaluating Orthologue Predictions
• 70 gene family test, multiple evolutionary scenarios
• Tested databases with associated algorithms:
Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
Evaluating Orthologue Predictions
• 70 gene family test set, multiple evolutionary scenarios
• All methods/dbs have strong scope for improvement.
• OrthoMCL poor performer, TreeFam & eggNOG do best
Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
Orthologue Prediction Performance
• Performance varies by choice of method and interpretation of “orthology”
• Biggest influence is genome annotation quality
• Relative performance varies with benchmark choice
• (clustering) RBBH outperforms more complex algorithms under many circumstances
Selection Pressures
Signs of selection pressure identifiable by comparative genomics
Selection Pressures
• Defining core groups of genes by “orthology” allows analysis of those groups:
  ◦ Synteny/collocation
  ◦ Gene neighbourhood changes (e.g. genome expansion)
  ◦ The pangenome: core and accessory genomes
• and sequences in those groups:
  ◦ Multiple alignment
  ◦ Domain detection
  ◦ Identification of functional sites
  ◦ Inference of evolutionary pressures
Synteny
• Selective pressures depend on gene (product) function
• Genes involving physically or functionally-interacting proteins tend to evolve under similar selective constraints
• Particularly in bacteria, this leads to co-expression as regulons and collocation in operons
• Collocation (and coregulation) may be identified by comparative genomics
• (This is also true when considering regulatory or metabolic networks, similarly to genome organisation)
Alvarez-Ponce et al. (2011) Genome Biol. Evol. doi:10.1093/gbe/evq084
Synteny
• Many tools/packages/services for synteny detection, e.g.
  ◦ SyMAP
    ▪ http://www.agcol.arizona.edu/software/symap/
  ◦ i-ADHoRe
    ▪ http://bioinformatics.psb.ugent.be/software/details/i--ADHoRe
  ◦ MCScan, Cyntenator, etc
Soderlund et al. (2011) Nucl. Acids. Res. doi:10.1093/nar/gkr123
Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
i-ADHoRe
• Algorithm:
  1. Combine tandem repeats of genes/gene sets
  2. Make gene homology matrix (GHM): identify collinear regions (diagonals) for first genome pair
  3. Convert these to profiles
  4. Use GG2 algorithm to align profiles
  5. Search next genome with profiles, splitting them where necessary
  6. Iterate until complete
• Gives genome-scale multiple alignments of blocks of genes
Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
i-ADHoRe
• [ACTIVITY]
  ◦ i-ADHoRe/README.md Markdown
  ◦ i-ADHoRe.ipynb iPython notebook
Genome Expansion
• Mobile/repeat elements reproduce and expand during evolution
• Generates sequence “laboratory” for variation and experiment
• e.g. Phytophthora infestans effector protein expansion and arms race
Haas et al. (2009) Nature. doi:10.1038/nature08358
Genome Expansion
• Mobile elements (MEs) are large, and carry genes with them.
• Regions rich in MEs have larger gaps between consecutive genes
• Effector proteins are found preferentially in regions with large gaps, and also show increased rates of evolutionary divergence.
• “Two-speed genome” associated with adaptability to new hosts/escape from evolutionary “bottleneck”
Haas et al. (2009) Nature. doi:10.1038/nature08358
The Pangenome
• The gene complement of a set of organisms (e.g. species group) is the pangenome, defined by the union of two gene sets:
  ◦ Core genes: genes present in all examples (define common species characteristics)
  ◦ Accessory genes: genes only present in a subset of examples (relevant to adaptation of individuals)
• Definition depends on composition of organism set
• Core genome hypothesis:
  ◦ “The core genome is the primary cohesive unit defining a bacterial species.”
• Online tools available, e.g.
  ◦ Panseq (http://lfz.corefacility.ca/panseq/)
Laing et al. (2010) BMC Bioinf. doi:10.1186/1471-2105-11-461
Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
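The core/accessory definitions above map directly onto set operations. A minimal sketch, assuming each genome has already been reduced to a set of orthologue-group identifiers; the function name, strain names and group IDs are hypothetical:

```python
def pangenome(gene_sets):
    """Split the pangenome of `gene_sets` into core and accessory genes.

    `gene_sets` maps a genome name to the set of orthologue groups it carries.
    """
    genomes = list(gene_sets.values())
    pan = set().union(*genomes)          # every gene seen in any genome
    core = set.intersection(*genomes)    # genes present in all genomes
    accessory = pan - core               # genes present in only a subset
    return pan, core, accessory


# Hypothetical orthologue-group presence in three strains
gene_sets = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og1", "og2", "og4"},
    "strain3": {"og1", "og2", "og3", "og5"},
}
pan, core, accessory = pangenome(gene_sets)
print(sorted(core), sorted(accessory))
```

Adding or removing a strain changes all three sets, which is the sense in which the definition depends on the composition of the organism set.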
Defining a species’ core genome
• “Orthologue groups” with a representative in (nearly) every member of the set
• But we only have a sample of the species, not every member…
• …so use rarefaction curves to estimate core genome size.
  1. Randomly order organisms, and count number of ‘core’ and ‘new’ genes seen with each new genome addition.
  2. Repeat until you have a reasonable estimate of error/no new genes found
Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
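The two rarefaction steps can be sketched as repeated random orderings of the genomes. This is an illustration under stated assumptions, not the method of Lefébure et al.: it uses a fixed number of orderings (`n_orderings`, a hypothetical parameter) instead of repeating until the error estimate is acceptable, requires exact presence in every genome rather than "nearly" every genome, and reports only means.

```python
import random


def rarefaction(gene_sets, n_orderings=200, seed=1):
    """Return (genomes_added, mean_core, mean_new) for each addition step.

    `gene_sets` maps a genome name to its set of orthologue groups.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    names = sorted(gene_sets)
    core_totals = [0] * len(names)
    new_totals = [0] * len(names)
    for _ in range(n_orderings):
        order = names[:]
        rng.shuffle(order)                 # step 1: random genome order
        core = set(gene_sets[order[0]])
        seen = set()
        for step, name in enumerate(order):
            genes = gene_sets[name]
            core &= genes                  # genes still present in all so far
            new_totals[step] += len(genes - seen)  # genes not seen before
            seen |= genes
            core_totals[step] += len(core)
    return [
        (step + 1, core_totals[step] / n_orderings,
         new_totals[step] / n_orderings)
        for step in range(len(names))
    ]


# Hypothetical orthologue-group presence in three strains
gene_sets = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og1", "og2", "og4"},
    "strain3": {"og1", "og2", "og3", "og5"},
}
curve = rarefaction(gene_sets)
for n_genomes, mean_core, mean_new in curve:
    print(n_genomes, round(mean_core, 2), round(mean_new, 2))
```

Plotting mean core size against the number of genomes gives the rarefaction curve; the value it levels off at is the estimate of core genome size, and a "new genes" curve that approaches zero suggests the sample has captured most of the pangenome.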
Directional Selection
• Several statistical tests for directional selection, e.g.
  ◦ QTL sign
  ◦ Ka/Ks (dN/dS) ratio test – most commonly applied
  ◦ Relative rate test
• Ka/Ks ratio:
  ◦ Ka (or dN): number of non-synonymous substitutions per non-synonymous site
  ◦ Ks (or dS): number of synonymous substitutions per synonymous site
  ◦ Ka/Ks > 1 ⇒ positive selection; Ka/Ks < 1 ⇒ stabilising selection
  ◦ Several methods/tools for calculation
    ▪ PAML (http://abacus.gene.ucl.ac.uk/software/paml.html)
    ▪ SeqinR (http://cran.r-project.org/web/packages/seqinr/index.html)
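As a worked illustration of the ratio itself, the sketch below takes the final step of a counting approach: it assumes the numbers of substitutions and sites have already been counted from a codon alignment (the hard part, which tools such as PAML and SeqinR handle), applies a Jukes-Cantor correction for multiple hits, and interprets the ratio. The function names and the example counts are hypothetical, the choice of Jukes-Cantor is an assumption, and this is not PAML's likelihood method.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differences `p` for multiple hits.

    Assumes p < 0.75; larger values are saturated and raise a math error.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted substitutions and sites."""
    ka = jukes_cantor(nonsyn_subs / nonsyn_sites)
    ks = jukes_cantor(syn_subs / syn_sites)
    return ka, ks, ka / ks  # undefined (division by zero) when Ks is 0


def interpret(ratio):
    """Interpret Ka/Ks using the thresholds given on the slide."""
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "stabilising selection"
    # A ratio of exactly 1 is conventionally read as neutral evolution
    # (not stated on the slide).
    return "neutral"


# Hypothetical counts for one pair of aligned coding sequences
ka, ks, ratio = ka_ks(nonsyn_subs=5, nonsyn_sites=300,
                      syn_subs=20, syn_sites=100)
print(round(ka, 4), round(ks, 4), round(ratio, 4), interpret(ratio))
```

With these counts the non-synonymous rate is far below the synonymous rate, so the ratio falls well under 1. Ratios from very short or very similar sequences are unstable because Ks is close to zero.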
Genome-Wide Positive Selection
Lefébure & Stanhope (2009) Genome Res. doi:10.1101/gr.089250.108
An Analysis Output
• Class comparison: animal-pathogenic (APE) vs plant-associated bacteria (PAB)
• Presence of horizontally-acquired islands (HAI)
• Genes with greater similarity to PAB than APE
Toth et al. (2006) Annu. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
Things I Didn’t Get To
• Genome-Wide Association Studies (GWAS):
  ◦ Try http://genenetwork.org/ to play with some data
• Prediction of regulatory elements, e.g.
  ◦ Kellis et al. (2003) Nature doi:10.1038/nature01644
  ◦ King et al. (2007) Genome Res. doi:10.1101/gr.5592107
  ◦ Chaivorapol et al. (2008) BMC Bioinf. doi:10.1186/1471-2105-9-455
  ◦ CompMOBY: http://genome.ucsf.edu/compmoby/
• Detection of Horizontal/Lateral Gene Transfer (HGT/LGT), e.g.
  ◦ Tsirigos & Rigoutsos (2005) Nucl. Acids. Res. doi:10.1093/nar/gki187
• Phylogenomics, e.g.
  ◦ Delsuc et al. (2005) Nat. Rev. Genet. doi:10.1038/nrg1603
Finishing The Hat
Some of the things I hope you have taken away from the lectures/activities
Take-Home Messages
• Comparative genomics is a powerful set of techniques for:
  ◦ Understanding and identifying evolutionary processes and mechanisms
  ◦ Reconstructing detailed evolutionary history of a set of organisms
  ◦ Identifying and understanding common genomic features of organisms
  ◦ Providing hypotheses about gene function for experimental investigation
• A huge amount of data is available to work with
  ◦ And it’s only going to get much, much larger
• Results feed into many areas of study:
  ◦ Medicine and health
  ◦ Agriculture and food security
  ◦ Basic biology in all fields
  ◦ Systems and synthetic biology
Take-Home Messages
• Comparative genomics is essentially based around comparisons
  ◦ What is similar between two genomes? What is different?
• Comparative genomics is evolutionary genomics
• Large datasets benefit from visualisation for effective interpretation
  ◦ Much scope for improvement in visualisation
• Tools with the same purpose give different output
  ◦ BLAST vs MUMmer
  ◦ RBBH vs MCL
• Choice of application matters for correctness and interpretation! – understand what the application does, and its limits.
Take-Home Messages
• Comparative genomics is
  ◦ Fun
  ◦ Indoor work, in the warm and dry
  ◦ Not a job that involves heavy lifting
Credits
• This slideshow is shared under a Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/)
• Copyright is held by The James Hutton Institute http://www.hutton.ac.uk
• You may freely use this material in research, papers, and talks so long as acknowledgement is made.