
Comparative Genomics


Page 1: Comparative Genomics

Comparative Genomics

Page 2: Comparative Genomics

Bioinformatic Tools for Comparative Genomics of Vectors

Overview

• Comparing Genomes
• Homologies and Families
• Sequence Alignments

Page 3: Comparative Genomics

Comparative Genomics

Allows us to achieve a greater understanding of vertebrate evolution

Tells us what is common and what is unique between different species at the genome level

The function of human genes and other regions may be revealed by studying their counterparts in lower organisms

Helps identify both coding and non-coding genes and regulatory elements

Page 4: Comparative Genomics

Sequence Conservation Over Time

Page 5: Comparative Genomics

Non-Coding Regions

Large stretches of non-coding regions in vertebrates

Regulatory regions of:
• Developmental genes
• Transcription factors
• miRNA

Kikuta et al., Genome Research, May 2007

Page 6: Comparative Genomics

Methods of Alignment: Ensembl

BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human – mouse

Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human – zebrafish

PECAN global alignment is used for multispecies alignments

Page 7: Comparative Genomics

Why Compare Genomes?

• We can better understand evolution and speciation
• We can find important, functional regions of the sequence (codons, promoters, regulatory regions)
• It can help us locate genes in other species that are missing or not well defined (also through comparison and alignments)
• Quality control!

Page 8: Comparative Genomics

Evolution at the DNA Level

…ACTGACATGTACCA…
…AC----CATGCACCA…

Sequence edits: Mutation, Deletion

Rearrangements: Inversion, Translocation, Duplication
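The sequence edits named on this slide can be counted column by column once two sequences are aligned. The following is a minimal sketch, not part of the original slides: the function name `classify_columns` is hypothetical, and the second toy sequence is adapted from the slide (one base dropped) so that both rows have the same length, which a gapped alignment requires.

```python
from collections import Counter

def classify_columns(ref: str, other: str) -> Counter:
    """Count matches, substitutions and indels in a gapped pairwise alignment."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    counts = Counter()
    for a, b in zip(ref, other):
        if a == "-" and b == "-":
            continue                      # a gap-only column carries no information
        if b == "-":
            counts["deletion"] += 1       # base present in ref, absent in other
        elif a == "-":
            counts["insertion"] += 1      # base present in other, absent in ref
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1   # point mutation
    return counts

# Toy alignment adapted from the slide (assumption: equal-length rows).
ref = "ACTGACATGTACCA"
other = "AC----ATGCACCA"
summary = classify_columns(ref, other)
print(dict(summary))
```

Rearrangements (inversion, translocation, duplication) cannot be detected this way; they only show up when the order and orientation of whole aligned blocks are compared.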

Page 9: Comparative Genomics

Comparing Genomes

• Mammals have roughly 3 billion base pairs in their genomes
• Over 98% of human genes are shared with other primates, with 95-98% similarity between genes
• Even the fruit fly shares 60% of its genes with humans! (March 2000)
• Comparing human and mouse:
    • 40% of the human genome aligns with mouse
    • 24% of the human genome is missing in mouse (there are also mouse-specific sequences)

Page 10: Comparative Genomics

Improving Gene Quality

Comparative genomics predicts one long transcript.

Page 11: Comparative Genomics

Pseudogene recovery

[Figure labels: chr 3, chr X; human, mouse, rat, dog, cow]

We find 67 confident cases where a human protein is closer to the ancestor than any extant species in the alignment

Page 12: Comparative Genomics

How Does Ensembl Predict Homology?

• Uses all the species
• Prediction pipeline: begins with BLAST and sequence clustering
• Compares gene relationships to species relationships

Page 13: Comparative Genomics

BSR: BLAST Score Ratio. When two proteins P1 and P2 are compared, BSR = score(P1, P2) / max(self-score(P1), self-score(P2)). The default threshold used in the initial clustering step is 0.33.
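As a worked example of the ratio defined above, here is a minimal sketch. The function names and the example scores are hypothetical, and treating a BSR equal to the threshold as passing is an assumption; the slide only gives the threshold value.

```python
BSR_THRESHOLD = 0.33  # default threshold quoted on the slide

def blast_score_ratio(score_p1_p2: float, self_score_p1: float, self_score_p2: float) -> float:
    """BSR = pairwise score divided by the larger of the two self-scores."""
    return score_p1_p2 / max(self_score_p1, self_score_p2)

def passes_threshold(bsr: float, threshold: float = BSR_THRESHOLD) -> bool:
    # Assumption: a pair is kept when its BSR is at least the threshold.
    return bsr >= threshold

# Hypothetical bit scores for two protein pairs.
weak = blast_score_ratio(120.0, 300.0, 400.0)    # 120 / 400
strong = blast_score_ratio(150.0, 300.0, 400.0)  # 150 / 400
print(weak, passes_threshold(weak))
print(strong, passes_threshold(strong))
```

Dividing by the larger self-score keeps the ratio between 0 and 1 and penalises hits that cover only a small part of the longer protein.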

Page 14: Comparative Genomics

Orthologue / Paralogue Prediction Algorithm

(1) Load the longest translation of each gene from all species used in Ensembl.

(2) Run WUBLASTp+SW of every gene against every other (both self and non-self species) in a genome-wide manner.

(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.

(4) Extract the connected components (=single linkage clusters), each cluster representing a gene family.

(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.

(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.

(7) Reconcile each gene tree with the species tree to call duplication events on internal nodes and root the tree (TreeBeST).

(8) From each gene tree, infer gene pairwise relations of orthology and paralogy types.
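Steps (3) and (4) above amount to finding connected components in a graph of gene relations. The sketch below shows that idea with a small union-find structure; it is an illustration, not Ensembl's code. The gene names are hypothetical, and the edge list is assumed to contain only the pairs that already passed the BRH or BSR filter.

```python
def single_linkage_clusters(genes, edges):
    """Group genes into connected components (single linkage clusters)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a   # merge the two components

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical genes from three species and the relations kept in step (3).
genes = ["hsA", "mmA", "rnA", "hsB", "mmB", "hsC"]
edges = [("hsA", "mmA"), ("mmA", "rnA"), ("hsB", "mmB")]
families = single_linkage_clusters(genes, edges)
print(families)
```

Single linkage means one retained edge is enough to pull a gene into a family, so hsA and rnA end up together even though they are only linked through mmA.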

Page 15: Comparative Genomics

Species Tree

Anopheles gambiae
Aedes aegypti
Drosophila melanogaster
Dasypus novemcinctus
Loxodonta africana
Echinops telfairi
Tupaia belangeri
Homo sapiens
Pan troglodytes
Macaca mulatta
Otolemur garnettii
Mus musculus
Rattus norvegicus
Spermophilus tridecemlineatus
Cavia porcellus
Oryctolagus cuniculus
Erinaceus europaeus
Myotis lucifugus
Canis familiaris
Felis catus
Bos taurus
Monodelphis domestica
Ornithorhynchus anatinus
Gallus gallus
Xenopus tropicalis
Gasterosteus aculeatus
Oryzias latipes
Takifugu rubripes
Tetraodon nigroviridis
Danio rerio
Ciona intestinalis
Ciona savignyi
Caenorhabditis elegans
Saccharomyces cerevisiae

Page 16: Comparative Genomics

Species and Gene Trees

Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem. Dufayard et al., ERCIM News No. 43, October 2000

Page 17: Comparative Genomics

Genes/Species Tree Reconciliation: TreeBeST

Page 18: Comparative Genomics

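Reconciliation

[Figure: a species tree and an unrooted gene tree, both with leaves M, R and H, and the reconciled gene tree with leaves M, R, H and M’, R’, H’. Internal nodes are marked as duplication nodes or speciation nodes, and three gene losses are indicated.]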
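The reconciliation pictured on this slide can be illustrated with the standard LCA-mapping rule: each gene-tree node is mapped to the smallest species-tree clade that contains its species, and a node is a duplication when one of its children maps to the same clade. This is a minimal sketch under stated assumptions, not TreeBeST's implementation: trees are nested tuples, the gene tree is assumed to be rooted already, gene losses are not counted, and all names (`clades`, `lca`, `reconcile`, M1, M2 and so on) are hypothetical.

```python
def clades(tree):
    """List the species set under every node of a nested-tuple species tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, below = [], frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        below |= sub[-1]          # the last entry is the child's own clade
    result.append(below)
    return result

def lca(species_clades, species):
    """Smallest clade of the species tree containing all given species."""
    return min((c for c in species_clades if species <= c), key=len)

def reconcile(gene_tree, gene_to_species, species_clades, events):
    """Return (mapped clade, leaf genes); record the event at internal nodes."""
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]]), frozenset([gene_tree])
    child_maps, leaves, species = [], frozenset(), frozenset()
    for child in gene_tree:
        mapped, child_leaves = reconcile(child, gene_to_species, species_clades, events)
        child_maps.append(mapped)
        leaves |= child_leaves
        species |= mapped
    here = lca(species_clades, species)
    # Duplication if a child maps to the same species-tree node as its parent.
    events[leaves] = "duplication" if here in child_maps else "speciation"
    return here, leaves

species_tree = (("M", "R"), "H")
gene_tree = ((("M1", "R1"), "H1"), (("M2", "R2"), "H2"))
gene_to_species = {"M1": "M", "R1": "R", "H1": "H", "M2": "M", "R2": "R", "H2": "H"}

events = {}
reconcile(gene_tree, gene_to_species, clades(species_tree), events)
for leaves, event in events.items():
    print(sorted(leaves), event)
```

In this toy tree only the root is a duplication: both of its subtrees already span all three species, so the two copies must predate the speciation events below them.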

Page 19: Comparative Genomics

Viewing Trees in Ensembl

GeneView page

GeneTreeView

Page 20: Comparative Genomics

Types of Homologues

Orthologs: any pairwise gene relation where the ancestor node is a speciation event

Paralogs: any pairwise gene relation where the ancestor node is a duplication event
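These two definitions translate directly into a tree walk: every pair of genes is labelled by the event at their most recent common ancestor. The sketch below assumes a gene tree whose internal nodes are already labelled, written as hypothetical `(event, left, right)` tuples; the gene names are invented for illustration.

```python
from itertools import product

def homology_pairs(node):
    """Return (leaves, pairs); pairs maps frozenset({g1, g2}) to its relation."""
    if isinstance(node, str):
        return [node], {}
    event, left, right = node
    left_leaves, pairs = homology_pairs(left)
    right_leaves, right_pairs = homology_pairs(right)
    pairs.update(right_pairs)
    # Genes on opposite sides of this node have this node as common ancestor.
    relation = "ortholog" if event == "speciation" else "paralog"
    for a, b in product(left_leaves, right_leaves):
        pairs[frozenset((a, b))] = relation
    return left_leaves + right_leaves, pairs

# Hypothetical tree: a duplication followed by a human/mouse speciation in each copy.
tree = ("duplication", ("speciation", "H1", "M1"), ("speciation", "H2", "M2"))
_, relations = homology_pairs(tree)
for pair, relation in relations.items():
    print(sorted(pair), relation)
```

Note that H1 and M2 come out as paralogs even though they are in different species, because their common ancestor is the duplication node.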

Page 21: Comparative Genomics

Orthologue and Paralogue Types

• ortholog_one2one
• ortholog_one2many
• ortholog_many2many
• apparent_ortholog_one2one
• within_species_paralog
• between_species_paralog

Page 22: Comparative Genomics

Ortholog and Paralog types

Page 23: Comparative Genomics

Ortholog and Paralog types

Page 24: Comparative Genomics

Orthologues on GeneView

What is ‘1 to 1’?

What is ‘1 to many’?
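One way to read '1 to 1' and '1 to many' is to count how many orthologue partners each gene has in the other species. The sketch below does exactly that. It is a counting approximation and an assumption on our part, not the tree-based procedure Ensembl uses; the function and gene names are hypothetical, and only the type labels come from the slides.

```python
from collections import defaultdict

def classify_orthologues(pairs):
    """Label each (gene_a, gene_b) orthologue pair by partner counts."""
    partners_of_a = defaultdict(set)
    partners_of_b = defaultdict(set)
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    labels = {}
    for a, b in pairs:
        n_b = len(partners_of_a[a])   # partners of a in species B
        n_a = len(partners_of_b[b])   # partners of b in species A
        if n_a == 1 and n_b == 1:
            labels[(a, b)] = "ortholog_one2one"
        elif n_a > 1 and n_b > 1:
            labels[(a, b)] = "ortholog_many2many"
        else:
            labels[(a, b)] = "ortholog_one2many"
    return labels

# Hypothetical human (hs) and mouse (mm) orthologue pairs.
pairs = [("hsA", "mmA"), ("hsB", "mmB1"), ("hsB", "mmB2")]
labels = classify_orthologues(pairs)
for pair, label in labels.items():
    print(pair, label)
```

Here hsB has two mouse partners, which is what a lineage-specific duplication in mouse after the human/mouse split would produce.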

Page 25: Comparative Genomics

Protein Families

How: Cluster proteins for every isoform (transcript) in every species.

Why: Predict a function for ‘novel’ genes/proteins

Understand gene relationships

Page 26: Comparative Genomics

Protein Dataset

More than 1,800,000 proteins clustered:

• All Ensembl protein predictions from all supported species: 895,070 protein predictions
• All metazoan (animal) proteins in UniProt: 96,030 UniProtKB/Swiss-Prot and 892,0208 UniProtKB/TrEMBL

Page 27: Comparative Genomics

Clustering Strategy

• BLASTP all-versus-all comparison
• Markov clustering
• For each cluster:
    • Calculation of multiple sequence alignments with ClustalW
    • Assignment of a consensus description
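The Markov clustering step can be sketched in a few lines: a similarity graph is turned into a column-stochastic matrix, then expansion (matrix squaring) and inflation (element-wise powering and renormalising) alternate until flow concentrates inside clusters. This is a simplified pure-Python illustration, not the MCL program itself. The self-loop weight of 1.0, the inflation value of 2.0, the fixed iteration count and the protein names are all assumptions.

```python
def markov_cluster(nodes, weights, inflation=2.0, iterations=30, eps=1e-6):
    """Cluster nodes from symmetric similarity weights {(a, b): score}."""
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0                       # self loops (assumed weight)
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w

    def normalise(mat):
        # Make every column sum to 1.
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalise(m)
    for _ in range(iterations):
        # Expansion: square the matrix (flow spreads along paths).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flow, weaken weak flow.
        m = normalise([[x ** inflation for x in row] for row in m])

    clusters = set()
    for i in range(n):
        if m[i][i] > eps:                   # row i belongs to an attractor
            clusters.add(frozenset(nodes[j] for j in range(n) if m[i][j] > eps))
    return clusters

# Hypothetical proteins forming two families with equal similarity scores.
nodes = ["a", "b", "c", "d", "e", "f"]
weights = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
           ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0}
found = markov_cluster(nodes, weights)
print(sorted(sorted(c) for c in found))
```

A higher inflation value generally gives smaller, tighter clusters; in practice the weights would be derived from the BLASTP all-versus-all scores.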

Page 28: Comparative Genomics

Where are Families shown? ProtView

Link to FamilyView

Page 29: Comparative Genomics

Where are Families shown? FamilyView

• Ensembl family members within human
• Ensembl family members in other species
• JalView multiple alignments

Page 30: Comparative Genomics

• Comparing Genomes
• Homologies and Families
• Sequence Alignments

Page 31: Comparative Genomics

Aligning Whole Genomes: Why?

• To identify homologous regions
• To spot problematic gene predictions
• Conserved regions could be functional
• To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)

Page 32: Comparative Genomics

Aligning large genomic sequences

• Should find all highly similar regions between two sequences
• Should allow for segments without similarity, rearrangements etc.

Issues:
• Heavy process
• Scalability, as more and more genomes are sequenced
• Time constraint

Page 33: Comparative Genomics

Whole Genome Multiple Alignments

Enredo
• Defines the orthology map (co-linear regions)
• Supports segmental duplications

Pecan
• Consistency-based multiple aligner
• Optimized to cope with long DNA sequences

Ortheus
• Reconstructs ancestral sequences
• Infers the history of insertions and deletions

Page 34: Comparative Genomics

In ContigView...

Page 35: Comparative Genomics

Multiple Alignments using PECAN

Currently 2 sets:
• 10 amniota vertebrates
• 7 eutherian mammals

To come… the fish!

Page 36: Comparative Genomics

Alignment Strategy

• Use all coding exons
• Get sets of best reciprocal hits
• Create orthology maps
• Build multiple global alignments
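The "best reciprocal hits" step of this strategy can be illustrated as follows: an exon in one genome and an exon in the other form a pair only when each is the other's top-scoring hit. This is a minimal sketch; the exon identifiers and scores are hypothetical, and ties are resolved by keeping the first hit seen, which is an assumption.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target from {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical exon-versus-exon scores between human (hs) and mouse (mm).
human_vs_mouse = {("hsE1", "mmE1"): 200, ("hsE1", "mmE2"): 90,
                  ("hsE2", "mmE2"): 150, ("hsE3", "mmE2"): 120}
mouse_vs_human = {("mmE1", "hsE1"): 198, ("mmE2", "hsE2"): 149,
                  ("mmE2", "hsE3"): 118}
brh = best_reciprocal_hits(human_vs_mouse, mouse_vs_human)
print(brh)
```

hsE3 is dropped because its best mouse hit, mmE2, prefers hsE2. The surviving pairs are the kind of unambiguous landmarks from which an orthology map can then be assembled.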

Page 37: Comparative Genomics

View Alignments: ContigView

In the Detailed View Panel:

Page 38: Comparative Genomics

View Conservation: ContigView

Click on a pink bar for AlignSliceView… export alignments

Page 39: Comparative Genomics

AlignSliceView

Page 40: Comparative Genomics

GeneSeqalignView

Page 41: Comparative Genomics

GeneSeqalignView

Page 42: Comparative Genomics

MultiContigView

Comparison of chromosomes in multiple species.

(Links from SyntenyView, ContigView, CytoView)

Page 43: Comparative Genomics

Export Alignments in BioMart

Choose ‘Compara pairwise alignments’

Page 44: Comparative Genomics

Syntenic Regions

Genome alignments are compiled into larger syntenic regions

Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent

Any clusters shorter than 100 kb are discarded
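The rules on this slide can be sketched as a single pass over sorted alignments. This is an illustration under stated assumptions, not Ensembl's implementation: alignments are hypothetical tuples `(ref_start, ref_end, other_start, other_end, strand)` on one pair of chromosomes, block length is measured on the reference genome, overlapping alignments start a new block, and a block of exactly 100 kb is kept.

```python
MAX_GAP = 100_000    # alignments closer than this are clustered together
MIN_BLOCK = 100_000  # clusters shorter than this are discarded

def build_synteny_blocks(alignments):
    """Chain nearby, consistently ordered alignments into syntenic blocks."""
    blocks = []
    for r_start, r_end, o_start, o_end, strand in sorted(alignments):
        if blocks:
            b = blocks[-1]
            ref_gap = r_start - b["ref_end"]
            # Forward strand: the other genome advances as well.
            # Reverse strand: the other genome runs backwards.
            if strand == 1:
                other_gap = o_start - b["other_end"]
            else:
                other_gap = b["other_start"] - o_end
            if (b["strand"] == strand
                    and 0 <= ref_gap < MAX_GAP
                    and 0 <= other_gap < MAX_GAP):
                b["ref_end"] = r_end
                b["other_start"] = min(b["other_start"], o_start)
                b["other_end"] = max(b["other_end"], o_end)
                continue
        blocks.append({"ref_start": r_start, "ref_end": r_end,
                       "other_start": o_start, "other_end": o_end,
                       "strand": strand})
    return [b for b in blocks if b["ref_end"] - b["ref_start"] >= MIN_BLOCK]

# Hypothetical pairwise alignments (coordinates in base pairs).
alignments = [
    (0, 40_000, 1_000_000, 1_040_000, 1),
    (60_000, 130_000, 1_050_000, 1_120_000, 1),
    (500_000, 520_000, 3_000_000, 3_020_000, 1),
    (900_000, 960_000, 5_100_000, 5_160_000, -1),
    (980_000, 1_050_000, 5_000_000, 5_070_000, -1),
]
blocks = build_synteny_blocks(alignments)
for block in blocks:
    print(block)
```

The isolated 20 kb alignment at 500 kb forms its own cluster and is then discarded by the length filter, while the two reverse-strand alignments are chained into one inverted block.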

Page 45: Comparative Genomics

Enredo

Anchors
• 500,000 anchors for mammals (more than 1 anchor per 10 kb)
• Supports segmental duplications!
• Covers 90% of the human protein-coding genes

(Hsap-Mmus-Rnor-Cfam-Btau)

Page 46: Comparative Genomics

SyntenyView

[Figure labels: Human chromosome; Mouse chromosomes; Orthologues]

Page 47: Comparative Genomics

CytoView

Syntenic blocks

Page 48: Comparative Genomics

Summary

View Homology in pages such as GeneView, ProtView, SyntenyView, GeneTreeView, or BioMart

View Protein Family information in FamilyView

View Alignments in ContigView and GeneSeqalignView, or through BioMart