Comparative genomics in eukaryotes Klaas Vandepoele, PhD Professor Ghent University Comparative & Integrative Genomics VIB – Ghent University, Belgium [email protected] http://www.bits.vib.be

BITS - Introduction to comparative genomics

DESCRIPTION

This is the first presentation of the BITS training on 'Comparative genomics'. It reviews the basic concepts of sequence homology on different levels. Thanks to Klaas Vandepoele of the PSB department.

Page 1: BITS - Introduction to comparative genomics

Comparative genomics in eukaryotes

Klaas Vandepoele, PhD

Professor, Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium

[email protected]

http://www.bits.vib.be

Page 2: BITS - Introduction to comparative genomics

Outline

Introduction

Gene family analysis

Genome analysis

ConTra: promoter alignment analysis

Page 3: BITS - Introduction to comparative genomics

What is comparative genomics?

Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can have application in other, even distantly related, organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies.

http://genomics.ucdavis.edu/what.html

Page 4: BITS - Introduction to comparative genomics

Principles

DNA sequences encoding and regulating the expression of essential proteins and RNAs will be conserved

Consequently, the regulatory profiles of genes involved in similar processes among related species will be conserved

Conversely, sequences that encode or control the expression of proteins or RNAs responsible for differences between species will be divergent

Page 5: BITS - Introduction to comparative genomics

Definition

“The combination of genomic data and comparative / evolutionary biology to address questions of genome structure, evolution and function”

Hardison, PLoS Biology 2003

Page 6: BITS - Introduction to comparative genomics

What can we learn from cross-species comparisons?

Genome conservation: transfer knowledge gained from model organisms to non-model organisms

Genome variation: understand how genomes change over time in order to identify evolutionary processes and constraints

Detection of functional elements
  • Coding elements (e.g. exons)
  • Conserved non-coding sequences / elements
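
Detecting conserved elements can be illustrated with a sliding-window scan over a pairwise alignment. The following is a minimal sketch, not a method taken from the slides: the function names, the window size, the threshold and the toy sequences are all assumptions chosen for illustration, and real tools use more elaborate scoring.

```python
def window_identity(aligned_a, aligned_b, window=10):
    """Fraction of identical, non-gap columns in each sliding window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same residue; gap columns never count as matches
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(aligned_a, aligned_b, window=10, threshold=0.8):
    """Start columns of windows whose identity meets the threshold."""
    scores = window_identity(aligned_a, aligned_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy pairwise alignment (invented for illustration, not real genomic data)
species_1 = "ACGTACGTACGGTTAACCGGAT"
species_2 = "ACGTACGTACTTAAGGTTCCAT"
print(conserved_windows(species_1, species_2, window=8, threshold=1.0))
```

Windows that stay above the threshold are candidates for conserved elements; whether they are coding or non-coding has to be decided from the annotation, not from the score.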

Page 7: BITS - Introduction to comparative genomics

Conservation of gene structure

Page 8: BITS - Introduction to comparative genomics

Homology & sequence similarity

Homology = shared common ancestral origin

Inferred based on:
  • Sequence similarity
  • Similar (multi-)protein domain composition and organization

So sequence similarity means homology? No, it depends!

“Orthologs, paralogs, and evolutionary genomics”, Koonin 2005

Page 9: BITS - Introduction to comparative genomics

Homology & sequence similarity

Sequence analysis aims at finding important sequence similarities that would allow one to infer homology. The latter term is extensively used in scientific literature, often without a clear understanding of its meaning, which is simply common origin.

Homologous organs are not necessarily similar (at least the similarity may not be obvious); similar organs are not necessarily homologous.

For some reason, this simple concept tends to get extremely muddled when applied to protein and DNA sequences. Phrases like “sequence (structural) homology”, “high homology”, “significant homology”, or even “35% homology” are as common, even in top scientific journals, as they are absurd, considering the definition.
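
The distinction can be made concrete: a percentage is a property of an alignment (identity or similarity), whereas homology is a yes/no inference drawn from it. The sketch below computes percent identity for two pre-aligned sequences. The function name and the toy sequences are hypothetical, and the choice of denominator (columns without gaps) is one convention among several.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over the columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator (an assumption)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned protein fragments (invented for illustration)
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```

A result such as "75% identity" is a meaningful statement; whether the two sequences are homologous is a separate conclusion that also depends on alignment length, statistical significance and domain organization.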

Page 10: BITS - Introduction to comparative genomics

Multiple Sequence Alignments

[Figure: alignment grid. Rows: sequences (~taxa). Columns: positions in the alignment.]
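
The row/column view of an alignment maps directly onto a simple data structure: one string per sequence, all of equal length, read column by column. The sketch below is illustrative only; the species labels, the sequences and the scoring rule (share of the most common residue per column) are assumptions, not content from the slides.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column share of the most common residue, with gaps counted in the denominator."""
    rows = list(alignment.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same, non-zero count and length")
    scores = []
    for column in zip(*rows):  # zip(*rows) walks the alignment column by column
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment: rows are sequences (~taxa), columns are positions
toy_alignment = {
    "species_A": "ACG-T",
    "species_B": "ACGAT",
    "species_C": "TCG-T",
}
print(column_conservation(toy_alignment))
```

Columns scoring 1.0 are fully conserved across the rows; low-scoring columns mark substitutions or insertions/deletions.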

Page 11: BITS - Introduction to comparative genomics

Genome-wide sequence retrieval

Finding information from whole-genome sequencing projects
  • DNA sequence reads
  • Assembled genomic DNA sequences
  • Annotated genes (RNA genes + protein-encoding genes)
  • Repeats, transposable elements
  • Integrated platform providing both sequence data and functional genomics data

[Figure: arrow labelled "Information value", running from low to high]
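
Sequence data from genome projects (reads, assembled sequences, annotated genes) is commonly distributed in FASTA format. As a minimal sketch of what retrieval looks like once a file has been downloaded, the parser below turns FASTA text into a dictionary. The function name and the record identifiers are hypothetical; in practice an established library would normally be used instead of a hand-written parser.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {identifier: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'; the rest is a description
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Sequences may be wrapped over several lines, so join the pieces
    return {name: "".join(chunks) for name, chunks in records.items()}


# Toy FASTA text with made-up gene identifiers
example = """>gene_1 hypothetical protein
ATGGCC
TTAA
>gene_2
ATGTGA
"""
genes = parse_fasta(example)
print({name: len(seq) for name, seq in genes.items()})
```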

Page 12: BITS - Introduction to comparative genomics

Genome databases

Species-specific databases
  • SGD
  • TAIR
  • Many others, e.g. WormBase, FlyBase, ...

General & integrative repositories
  • EBI Genomes & Integr8 / Ensembl
  • NCBI Entrez Genome
  • UCSC

Page 13: BITS - Introduction to comparative genomics

Page 14: BITS - Introduction to comparative genomics
