High-throughput comparative genomics
24th October 2013
Joe Parker, Queen Mary University of London
Topics
1. Introduction
2. Background: why phylogenomics?
3. Examples
4. Practice
5. Case study
6. On the horizon
7. Over the horizon
Aims
• Context of phylogenomics: Next-generation sequencing (NGS)
• Why phylogenomics?
• Practical analyses
• Future developments
1. Our Research
Lab Interests
• Ecology and evolution of traits
• Echolocation, sociality
• NGS data for population genetics and phylogenomics
Activities
• Phylogeny estimation/comparison
• Molecular correlates of evolution:
– site substitutions, dN/dS, composition
• Simulation
• Dataset limitations
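As a rough illustration of the site-substitution and dN/dS items above, the sketch below counts synonymous and nonsynonymous codon differences between two aligned coding sequences. It is a minimal sketch, not the lab's pipeline: the function name `count_codon_differences` and the example sequences are hypothetical, and the standard genetic code is assumed. Raw counts like these are not a dN/dS estimate; dedicated tools such as PAML also normalise by the number of synonymous and nonsynonymous sites and correct for multiple hits.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: standard nuclear code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) codon differences in two aligned CDS."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = 0
    nonsynonymous = 0
    # Walk the alignment one complete codon at a time; a trailing partial codon is ignored.
    for i in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        aa_a = CODON_TABLE.get(codon_a)
        aa_b = CODON_TABLE.get(codon_b)
        if aa_a is None or aa_b is None:
            continue  # skip codons containing gaps or ambiguity codes
        if aa_a == aa_b:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy sequences: TTT->TTC is synonymous (Phe), GAT->GAA is not (Asp->Glu).
print(count_codon_differences("ATGTTTGAT", "ATGTTCGAA"))
```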
(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
2. Background
Next-generation sequencing
Why phylogenomics, not -genetics?
• Causes of discordant signal
– Incomplete lineage sorting
– Lateral transfer
– Recombination
– Introgression
Quantitative biology
• Multiple configurations
• Hyperparameters empirically investigated
• Determine sensitivity of results
Distributions
• Genome-scale data provides context
• Identify outliers
– Genes / taxa / trees
• Compare values across biological systems
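One way to picture "identify outliers" against a genome-wide distribution is a robust score based on the median and the median absolute deviation (MAD). This is a minimal sketch under stated assumptions: the function name `flag_outliers`, the gene names, the per-gene values and the 3.5 cut-off are all hypothetical choices for illustration, not values from the talk.

```python
import statistics


def flag_outliers(scores, threshold=3.5):
    """Return names whose score is far from the median, measured in scaled MADs."""
    values = list(scores.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad
    return sorted(
        name for name, value in scores.items()
        if abs(value - median) / scale > threshold
    )


# Hypothetical per-gene statistic (for example a dN/dS-like value) for six genes.
per_gene = {
    "gene_a": 0.09, "gene_b": 0.10, "gene_c": 0.11,
    "gene_d": 0.12, "gene_e": 0.13, "gene_f": 0.95,
}
print(flag_outliers(per_gene))
```

The same function works for taxa or trees, as long as each one is summarised by a single number.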
Integration with ‘Omics
• Multiple databases
• Functional data
• Bibliographic information
3. Example studies
Tsagkogeorga et al. (in press)
Salichos & Rokas (2013)
Backström et al. (2013)
Lindblad-Toh et al. (2011)
4. Practice
Source material
• Samples
• Storage
• Purification
• Library prep
Sequencing
• Genome
– Sanger
– Illumina
– Pyro / 454
– SOLiD
– PacBio
• Transcriptome / RNA-seq
– MyBAITS
• HiSeq / MiSeq
• IonTorrent
Infrastructure
• Desktop machines
• Computing clusters
• Grid systems
• Cloud-based computation
Assembly, Annotation
• Assembly
– To reference (mapping)
– De novo
• Annotation
– By homology
– De novo
• SOAPdenovo
• MAKER
• Velvet
• Bowtie / Cufflinks / Tophat
• Trinity
Alignment
• PRANK
• MUSCLE
• MAFFT
• Clustal
Phylogeny inference
• MrBayes
• RAxML
• BEAST
• MP-EST
• STAR
Phylogenetic analysis
• BEAST
• HYPHY
• PAML
• Pipelines
• LRT
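The LRT (likelihood-ratio test) in the list above compares two nested models by their maximised log-likelihoods. The sketch below shows the arithmetic only, assuming the usual chi-square approximation; the function name `likelihood_ratio_test` and the log-likelihood values are hypothetical. To stay within the Python standard library it handles only 1 or 2 degrees of freedom; `scipy.stats.chi2.sf` covers the general case.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt, df=1):
    """Return (statistic, p_value) for nested models given their log-likelihoods."""
    # The statistic is twice the log-likelihood gain of the richer model.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return statistic, p_value


# Hypothetical log-likelihoods, for example from a null and an alternative codon model.
statistic, p_value = likelihood_ratio_test(-1000.0, -998.0, df=2)
print(f"2*deltaLnL = {statistic:.2f}, p = {p_value:.4f}")
```

Run across thousands of loci, the resulting p-values still need a multiple-testing correction, which this sketch leaves out.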
5. Case study
Parker et al. (2013)
• De novo genomes:
– four taxa
– 2,321 protein-coding loci
– 801,301 codons
• Published:
– 18 genomes
• ~69,000 simulated datasets
• ~3,500 cluster cores
Our pipeline for detecting genome-wide convergence
[Figure panels: mean = 0.05; mean = -0.01; mean = -0.08]
Development cycle
• Design
• Wireframe & specify tests
• Implement
– Alignment: loadSequences(), getSubstitutions()
– Phylogeny: trimTaxa(), getMRCA()
– DataSeries: calculateECDF(), randomise()
– Regression: getResiduals(), predictInterval()
• Review, refine & refactor
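To make the DataSeries methods listed above concrete, here is a minimal Python sketch of an empirical cumulative distribution function and a randomisation step. Only the method names come from the slide; the signatures, the tie handling and the choice of Python are assumptions, and the original implementation may differ.

```python
import random


def calculate_ecdf(values):
    """Return [(value, fraction of observations <= value)] for each distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    ecdf = []
    for rank, value in enumerate(ordered, start=1):
        if ecdf and ecdf[-1][0] == value:
            # Tied values share one step, set to the highest rank in the tie.
            ecdf[-1] = (value, rank / n)
        else:
            ecdf.append((value, rank / n))
    return ecdf


def randomise(values, seed=None):
    """Return a shuffled copy of the series, leaving the input untouched."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


print(calculate_ecdf([3, 1, 2, 2]))  # [(1, 0.25), (2, 0.75), (3, 1.0)]
```

Writing the expected ECDF for a tiny input first, then implementing against it, mirrors the "specify tests, then implement" step of the cycle.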
Parker et al. (2013)
6. On the horizon
Environmental metagenomics
Models of computation
• Cloud resources: Unlimited flexibility, finite time
• Development trade-off
– Off-the-shelf
– Bespoke
• Exploratory work
– Real-time genomic transects?
• Essential fundamental data missing from nearly every system:
– Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
Serialisation
• Process data remotely
• Freeze-dry objects, download to desktop
• Implement new methods directly on previously-analysed data
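The "freeze-dry" idea above can be sketched with Python's built-in `pickle` module: write an analysed object to disk on the remote machine, copy the file to the desktop, and reload it to run a new calculation. This is an illustration under assumptions, not the talk's own implementation; the helper names `freeze` and `thaw`, the locus name and the values are hypothetical.

```python
import os
import pickle
import tempfile


def freeze(obj, path):
    """Serialise an analysed object to a file (the remote-side step)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Reload a previously frozen object (the desktop-side step)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical result of an earlier analysis.
results = {"locus": "hypothetical_locus_1", "site_scores": [0.2, -0.1, 0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    frozen_path = os.path.join(tmp_dir, "results.pkl")
    freeze(results, frozen_path)
    restored = thaw(frozen_path)

# A new method applied directly to the previously analysed data.
mean_score = sum(restored["site_scores"]) / len(restored["site_scores"])
print(restored["locus"], round(mean_score, 3))
```

Pickle files can execute arbitrary code when loaded, so only thaw files from a trusted source.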
7. Over the horizon
• Real-time phylogenetics
• Field phylogenetics
• Alignment-free analyses
Conclusions
• Why phylogenomics?
• Practice
• Comparative approach
• Statistical context
Thanks
Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹
¹ School of Biological and Chemical Sciences, Queen Mary, University of London
² Wellcome Trust Sanger Institute
³ Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan
Chris Walker & Dan Traynor, Queen Mary GridPP High-throughput Cluster
Chaz Mein & Anna Terry, Barts and The London Genome Centre
Mahesh Pancholi, School of Biological and Chemical Sciences
BBSRC (UK); Queen Mary, University of London
Resources
• My email: Joe Parker (Queen Mary University of London): [email protected]
• Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231 doi:10.1038/nature12511.
• Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol., in the press.
• Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130
• Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033
• Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476–482 doi:10.1038/nature10530
• Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340 doi:10.1016/j.tree.2009.01.009
• The Tree Of Life: http://phylogenomics.blogspot.co.uk/
• RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html
• Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/
• OpenHelix: http://blog.openhelix.eu/
• Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)