This is the first presentation of the BITS training on 'Comparative genomics'. It reviews the basic concepts of sequence homology at different levels. Thanks to Klaas Vandepoele of the PSB department.
Comparative genomics in eukaryotes
Klaas Vandepoele, PhD
Professor, Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium
http://www.bits.vib.be
Outline
Introduction
Gene family analysis
Genome analysis
ConTra: promoter alignment analysis
What is comparative genomics?
Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can have application in other, even distantly related, organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provides a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies.
http://genomics.ucdavis.edu/what.html
Principles
DNA sequences encoding and regulating the expression of essential proteins and RNAs will be conserved
Consequently, the regulatory profiles of genes involved in similar processes among related species will be conserved
Conversely, sequences that encode or control the expression of proteins or RNAs responsible for differences between species will be divergent
Definition
“The combination of genomic data and comparative / evolutionary biology to address questions of genome structure, evolution and function”
Hardison, PLoS Biology 2003
What can we learn from cross-species comparisons?
Genome conservation: transfer knowledge gained from model organisms to non-model organisms
Genome variation: understand how genomes change over time in order to identify evolutionary processes and constraints
Detection of functional elements:
  Coding elements (e.g. exons)
  Conserved non-coding sequences / elements
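One simple way to picture the detection of conserved elements is a sliding-window scan over a pairwise alignment. The sketch below is only an illustration under stated assumptions: the function name `conserved_windows`, the window size, the identity threshold and the toy sequences are all hypothetical and do not come from the slides. Real analyses use dedicated tools and statistical models rather than a fixed cutoff.

```python
def conserved_windows(aligned_a, aligned_b, window=5, min_identity=0.8):
    """Return start positions of windows whose identity meets the threshold.

    Both inputs are rows of the same pairwise alignment, so they must have
    equal length; '-' marks a gap and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Toy aligned sequences (hypothetical): the first five columns are identical,
# the rest differ completely.
toy_a = "ACGTAGGGGG"
toy_b = "ACGTATTTTT"
hits = conserved_windows(toy_a, toy_b)
print(hits)
```

Windows flagged this way are candidates only. Whether a conserved window is functional (coding or non-coding) has to be established with additional evidence.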
Conservation of gene structure
Homology & sequence similarity
Homology = shared common ancestral origin
Inferred based on:
  Sequence similarity
  Similar (multi-)protein domain composition and organization
So sequence similarity means homology?
No, it depends!
"Orthologs, paralogs, and evolutionary genomics", Koonin 2005
Homology & sequence similarity
Sequence analysis aims at finding important sequence similarities that would allow one to infer homology. The latter term is extensively used in scientific literature, often without a clear understanding of its meaning, which is simply common origin.
Homologous organs are not necessarily similar (at least the similarity may not be obvious); similar organs are not necessarily homologous.
For some reason, this simple concept tends to get extremely muddled when applied to protein and DNA sequences. Phrases like “sequence (structural) homology”, “high homology”, “significant homology”, or even “35% homology” are as common, even in top scientific journals, as they are absurd, considering the definition.
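The distinction can be made concrete with a small calculation. The sketch below computes percent identity between two rows of a pairwise alignment. The function name `percent_identity` and the toy sequences are hypothetical, and skipping gapped columns is just one of several possible conventions. The number it returns is a measure of similarity. Homology is a yes-or-no conclusion about common origin that may be inferred from such evidence, which is why a phrase like "35% homology" does not make sense, whereas "35% identity" does.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned protein fragments (hypothetical, not real sequences).
toy_a = "MKT-AYIAKQR"
toy_b = "MKTLAYLAKHR"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")  # a similarity measure, not "homology"
```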
Multiple Sequence Alignments
Figure axes: sequences (~taxa) as rows; columns (~positions) in the alignment
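The row and column view of an alignment can be sketched in a few lines. The example below stores a toy alignment as one string per sequence and scores each column by the fraction of rows that share its most common residue. The species names, the sequences and the scoring rule in `column_conservation` are assumptions chosen for illustration and are not taken from the slides.

```python
from collections import Counter

# Toy alignment (hypothetical): one row per sequence (~taxon),
# one character per column (~position); '-' marks a gap.
alignment = {
    "species_A": "ATGCC-TA",
    "species_B": "ATGCCGTA",
    "species_C": "ATGTC-TA",
}


def column_conservation(rows):
    """Per column, the fraction of rows sharing the most common residue."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows of an alignment must have equal length")
    scores = []
    for column in zip(*rows):  # zip(*rows) turns rows into columns
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # a column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


scores = column_conservation(list(alignment.values()))
for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Gaps count against a column here because the score is divided by the number of rows, not by the number of residues present. Other conventions are equally defensible.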
Genome-wide sequence retrieval
Finding information from whole-genome sequencing projects:
  DNA sequence reads
  Assembled genomic DNA sequences
  Annotated genes (RNA genes + protein-encoding genes)
  Repeats, transposable elements
  Integrated platform providing both sequence data and functional genomics data
(Figure axis: information value, from low to high)
Genome databases
Species-specific databases:
  SGD
  TAIR
  Many others, e.g. WormBase, FlyBase, ...
General & integrative repositories:
  EBI Genomes & Integr8 / Ensembl
  NCBI Entrez Genome
  UCSC