The impact of whole genome duplications: insights from Paramecium tetraurelia

Page 1: The impact of whole genome duplications: insights from  Paramecium tetraurelia

The impact of whole genome duplications: insights from Paramecium tetraurelia

Page 2: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Genome Annotation

• Ab initio gene predictions

• Comparative approach

• 90,000 ESTs

Page 3: The impact of whole genome duplications: insights from  Paramecium tetraurelia

A compact Mac genome

• Protein-coding regions: 78% of the genome

• Short intergenic regions (average = 352 bp)

• Introns: short (average = 25 bp) but numerous: 80% of genes contain introns (average = 2.9 introns / gene)

Page 4: The impact of whole genome duplications: insights from  Paramecium tetraurelia

39642 annotated genes

Gene content

[Figure: bar chart of the number of genes per species, for E. cuniculi, S. cerevisiae, N. crassa, D. discoideum, T. brucei, T. pseudonana, P. falciparum, P. tetraurelia, C. intestinalis, D. melanogaster, C. elegans, X. tropicalis, T. nigroviridis, H. sapiens, M. musculus, A. thaliana and O. sativa. Y-axis: number of genes (0 to 45,000). P. tetraurelia: 39,642 genes.]

Not due to annotation artefacts (controls: cDNA data, distribution of protein length, manual curation on chrom. 1, …)

Page 5: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Many genes belong to multigenic families

Computing Best Reciprocal Hits (BRH) within Paramecium proteins

39 642 proteins → SW comparisons + filtering → 13 085 pairs of proteins in BRH
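A minimal Python sketch of the BRH step, assuming the SW comparisons have already been filtered into (query, subject, score) tuples. The function name, the tuple layout and the toy gene ids are illustrative assumptions, not the pipeline used for the slides.

```python
def best_reciprocal_hits(hits):
    """Return the set of BRH pairs from (query, subject, score) tuples.

    Self-hits are ignored; on a score tie the first hit seen is kept.
    """
    best = {}  # query -> (best subject, best score)
    for query, subject, score in hits:
        if query == subject:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)

    pairs = set()
    for query, (subject, _) in best.items():
        # A pair is reciprocal when each protein is the other's best hit.
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Hypothetical toy scores: g1 and g2 are each other's best hit;
# g3 points to g1, but g1 does not point back.
hits = [
    ("g1", "g2", 950), ("g1", "g3", 400),
    ("g2", "g1", 940), ("g2", "g3", 300),
    ("g3", "g1", 410),
    ("g4", "g4", 999),
]
brh = best_reciprocal_hits(hits)
print(sorted(brh))
```

In this toy input only g1 and g2 form a BRH pair: a one-way best hit (g3 to g1) is not enough.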

Page 6: The impact of whole genome duplications: insights from  Paramecium tetraurelia

BRH are found in large duplicated blocks (paralogons). Example: scaffolds 1 and 8

Page 7: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Building paralogons

• Using a sliding window of w genes

• For each window:
  – Select a paralogous region if at least p % of the w genes are in BRH with the sequence

• Merge overlapping windows

• Add syntenic genes that do not have a BRH
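The window procedure above can be sketched in Python under simplifying assumptions: genes are given in order along one scaffold, each gene maps to the scaffold that carries its BRH partner, and a window is selected when at least a fraction p of its w genes point to the same target scaffold. All names and the toy data are hypothetical, and the sketch ignores gene order on the target scaffold.

```python
from collections import Counter


def find_paralogous_windows(genes, brh_scaffold, w, p):
    """Return merged (start, end, target) gene-index intervals, end exclusive.

    genes: gene ids in order along one scaffold.
    brh_scaffold: gene id -> scaffold carrying its BRH partner (absent if none).
    w: window size in genes; p: minimum fraction of window genes with a BRH
    on the same target scaffold.
    """
    selected = []
    for start in range(len(genes) - w + 1):
        window = genes[start:start + w]
        counts = Counter(brh_scaffold[g] for g in window if g in brh_scaffold)
        for target, n in counts.items():
            if n / w >= p:
                selected.append((start, start + w, target))

    # Merge overlapping windows that point to the same target scaffold.
    merged = []
    for start, end, target in sorted(selected, key=lambda t: (t[2], t[0])):
        if merged and merged[-1][2] == target and start < merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), target)
        else:
            merged.append((start, end, target))
    return merged


# Hypothetical scaffold of 8 genes; g2, g5 and g7 have no BRH.
genes = [f"g{i}" for i in range(8)]
brh_scaffold = {"g0": "s8", "g1": "s8", "g3": "s8", "g4": "s8", "g6": "s2"}
blocks = find_paralogous_windows(genes, brh_scaffold, w=4, p=0.5)
print(blocks)
```

The merged interval also spans genes without a BRH on the target scaffold, which is where syntenic genes lacking a BRH would be added to the paralogon.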

Page 8: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Whole genome duplication (WGD)

Settings:

w = 10, p = 61%

Coverage:

61.3 Mb (85%)
35 503 genes (90%)

Results:

24 052 genes in 2 copies (68%)

11 451 genes in 1 copy (32%)

51% of ancestral genes are still in 2 copies
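The 51% figure follows from the counts on this slide if each pair of retained duplicates is counted as one ancestral gene. A short check in Python (variable names are illustrative):

```python
two_copy_genes = 24052     # present-day genes that still have their duplicate
single_copy_genes = 11451  # present-day genes whose duplicate was lost

covered_genes = two_copy_genes + single_copy_genes
retained_pairs = two_copy_genes // 2  # one ancestral gene per retained pair
ancestral_genes = retained_pairs + single_copy_genes

share_two_copy = two_copy_genes / covered_genes
retained_fraction = retained_pairs / ancestral_genes

print(covered_genes)                    # genes inside paralogons
print(round(100 * share_two_copy))      # % of present-day genes in 2 copies
print(ancestral_genes)                  # inferred pre-WGD gene count
print(round(100 * retained_fraction))   # % of ancestral genes still in 2 copies
```

The two percentages differ because the first counts present-day genes and the second counts ancestral genes.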

Page 9: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Progressive loss of gene duplicates

• ~1500 recent pseudogenes (recognizable)

• Length distribution of genic and intergenic sequences: relics of more ancient pseudogenes in intergenic regions

[Figure: length distributions for single-copy genes, intergenic regions encompassing a gene loss, and other intergenic regions. X-axis: sequence length (bp); y-axis: frequency (%).]

Page 10: The impact of whole genome duplications: insights from  Paramecium tetraurelia

BRH from supercontig 8

A number of BRH (>3000) remain outside of paralogons

Page 11: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Inferring ancestral blocks

[Figure labels: paralogous genes; arbitrary order; ancestral blocks.]

Building paralogons with 131 ancestral blocks

Page 12: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Intermediary WGD

Settings:

w = 10, p = 40%

Coverage:

31,129 genes (79%)

Content before WGD: 20,578 genes

7 996 genes in 2 copies (39%)

12 582 genes in 1 copy (61%)

Page 13: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Old WGD

Settings:

w = 20, p = 30%

Coverage:

18,792 genes (47%)

Content before WGD: 9,999 genes

1 530 genes in 2 copies (15%)

8 469 genes in 1 copy (85%)

Page 14: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Gene content at each WGD

19 552 genes → Old WGD (x 1.1) → 21 172 genes → Intermediary WGD (x 1.2) → 26 214 genes → Recent WGD (x 1.5) → 39 642 genes

Overall: x 2 (not x 8)
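The fold changes can be recomputed from the four gene counts on this slide. The sketch below also shows the x 8 that three duplications would give if no duplicate were ever lost (variable names are illustrative):

```python
# Gene counts before the old WGD and after each successive WGD (from the slide).
gene_counts = [19552, 21172, 26214, 39642]
wgd_names = ["Old WGD", "Intermediary WGD", "Recent WGD"]

fold_changes = [after / before
                for before, after in zip(gene_counts, gene_counts[1:])]
for name, fold in zip(wgd_names, fold_changes):
    print(f"{name}: x {fold:.1f}")

overall = gene_counts[-1] / gene_counts[0]
no_loss = 2 ** len(wgd_names)  # every duplicate kept at every WGD
print(f"Overall: x {overall:.1f} (x {no_loss} without any gene loss)")
```

The net effect of each WGD is far below a doubling, and smallest for the oldest event, which is what progressive loss of duplicates would produce.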

Page 15: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Protein sequence similarity between duplicates (ohnologs)

[Figure: one panel per WGD: Old WGD, Intermediary WGD, Recent WGD.]

Page 16: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Distribution of the rate of synonymous substitution (dS) between ohnologs

[Figure: dS distributions in three panels: Old WGD, Intermediary WGD, Recent WGD. dS computed with PAML. Annotations on the plots: saturation; recent gene conversion.]

Page 17: The impact of whole genome duplications: insights from  Paramecium tetraurelia

[Figure: distribution of dN/dS for the recent WGD. X-axis: dN/dS; y-axis: frequency (%).]

• => both ohnologs are under strong negative selective pressure

• Yet … the fate of most ohnologs is to be pseudogenized!

• => gene-silencing mutations can be tolerated …

• … but deleterious mutations affecting the coding sequence of one copy are counterselected (i.e. dominant effect of mutations, despite the presence of a duplicate)

• Once a gene has been silenced (e.g. by mutation of regulatory elements), mutations can accumulate in coding regions

Page 18: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Gene duplicates are evolutionarily unstable

[Figure: timeline from a gene duplication to either a pseudogene or ancient paralogs (selective pressure to maintain 2 copies).]

Page 19: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Retention of gene duplicates

• Different (non-exclusive) models have been proposed for the retention of gene duplicates:
  – Robustness against mutations
  – Functional changes: neo- or sub-functionalization
  – Dosage constraints

• Which genes are preferentially retained after a WGD?

• How does the pattern of gene retention vary with time?
  – Compare the pattern of retention after a recent WGD and a more ancient WGD
  – Paramecium: 3 successive WGDs!

Page 20: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Mutational robustness

• Under certain conditions (high mutation rate and very large population size) redundant genes may be maintained by selection acting against double null alleles (Force et al. 1999)

• Essential genes (e.g. ribosomal proteins) are retained more often than average

• … but most of them are present in more than 2 copies !

• … their high rate of retention may be due to other factors (see later)

Page 21: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Functional changes

[Figure: two outcomes over time after duplication. Neofunctionalization (adaptation): a gene with function F gives copies with functions F and F’. Subfunctionalization (neutral evolution): a gene with functions F1F2 gives copies with functions F1 and F2.]

Functional changes:
  – changes in gene expression pattern
  – changes in the encoded protein

Force et al. (1999)

Page 22: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Prediction of the subfunctionalization model

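• A gene that has been preserved by subfunctionalization at a given WGD is less likely to be retained in two copies at a subsequent WGD (Force et al. 1999)

[Figure: a gene with subfunctions F1 and F2 followed through WGD1 and WGD2, with copies either keeping both subfunctions (F1F2) or keeping one each (F1, F2).]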

Page 23: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Test of the subfunctionalization model (1)

• Apparent contradiction with the subfunctionalization model

• Due to variations in retention rate between different functional classes?

Intermediate WGD

Retained: 47% Retained: 57%

Retention at the recent WGD ?

N=7,996 N=12,582

Page 24: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Test of the subfunctionalization model (2)

• A gene that has been preserved at a given WGD is less likely to be retained in two copies at a subsequent WGD

• Difference significant (p<5%), but not very strong

• Subfunctionalization is an unlikely evolutionary pathway in species with large population sizes (Lynch 2005)

Old WGD

Intermediate WGD

Retention at the recent WGD ?

N = 343 gene families

Retained: 67% Retained: 60%

Page 25: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Test of the neofunctionalization model

• Analysis of gene expression (work in progress)

• Analysis of the rate of protein evolution:

[Figure: tree with an outgroup (function F), ohnolog 1 (function F) and ohnolog 2 (function F’).]

• Relative rate test (PAML); correction for multiple tests

• Frequency of ohnologs with asymmetric substitution rates:
  – Recent WGD (N=2297): 11%
  – Intermediate WGD (N=293): 16%

• More functional redundancy among recent duplicates

• Functional changes account for retention in the long term

Page 26: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Fate of neofunctionalized genes at subsequent WGD

Intermediate WGD

Slow copy: 66% retained Fast copy: 26% retained

Retention at the recent WGD ?

Neofunctionalized genes are more prone to pseudogenization at subsequent WGD

N = 62

Page 27: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Retention for dosage constraints (1): high expression level

• Genes that have to be expressed at very high levels are often present in multiple copies (e.g. histones)

• The loss of one copy is counterselected because it cannot be compensated for by the upregulation of other copies

• => More retention among highly expressed genes

Page 28: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Retention rates

For each WGD, the retention rate for a given gene category is:

Ratio = (proportion of genes retained in duplicates in this category) / (proportion of total genes retained in duplicates)

Ratio = 1: no specific retention above the mean value for all genes

Ratio > 1: over-retained category

Ratio < 1: under-retained category
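As a sketch, the ratio defined above can be computed as follows. The function name and the counts are hypothetical and chosen only to illustrate the three cases; they are not taken from the Paramecium data.

```python
def retention_ratio(cat_retained, cat_total, all_retained, all_total):
    """Retention rate of a category relative to the genome-wide rate."""
    category_rate = cat_retained / cat_total
    overall_rate = all_retained / all_total
    return category_rate / overall_rate


# Hypothetical counts: 50% of all genes are retained in duplicate.
all_retained, all_total = 500, 1000

over = retention_ratio(80, 100, all_retained, all_total)   # 80% retained
same = retention_ratio(50, 100, all_retained, all_total)   # 50% retained
under = retention_ratio(20, 100, all_retained, all_total)  # 20% retained

for label, ratio in [("over-retained", over), ("average", same),
                     ("under-retained", under)]:
    print(f"{label}: ratio = {ratio:.2f}")
```

Dividing by the genome-wide rate makes categories comparable across WGDs whose overall retention differs.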

Page 29: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Expression versus Retention

Page 30: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Retention for dosage constraints (2): the balance hypothesis (Papp et al. 2003)

• The relative expression levels of proteins involved in the same functional network have to be controlled to ensure the proper stoichiometry of the network

• Initially, the loss of one copy is counterselected because it creates an imbalance within the network

• In the long term, gene losses may occur because they can be compensated for by the upregulation of other copies

Page 31: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Testing the balance hypothesis (1): genes involved in multi-protein complexes

• Protein complexes predicted by homology with yeast:
  – MIPS database (curation from the literature)
  – TAP / MS data (Gavin et al. Nature 2006)

Page 32: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Multi-protein complexes

Genes encoding proteins involved in complexes are initially over-retained

Page 33: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Additive effects of Expression and Inclusion in Complex

Page 34: The impact of whole genome duplications: insights from  Paramecium tetraurelia

• Proteins involved in complexes are over-retained at the recent WGD

• Does this mean that complex stoichiometry tends to be conserved ?

Page 35: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Constraint of stoichiometry and fate of duplicates

WGD                Complexes with conserved stoichiometry   p-value
Recent WGD         265 (44%)                                2.6 x 10^-2
                   74 (68%)                                 4.3 x 10^-4
Intermediary WGD   114 (20%)                                1.5 x 10^-3
                   43 (43%)                                 2.4 x 10^-4
Old WGD            106 (24%)                                1.2 x 10^-5
                   26 (43%)                                 2.5 x 10^-3

Two rows per WGD, one per complex set: MIPS complexes; complexes from Gavin et al. Nature 2006

[Figure: a complex formed by proteins A and B, comparing the number of copies of A with the number of copies of B.]

Page 36: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Testing the balance hypothesis (2): genes involved in central metabolism

Page 37: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Retention of central metabolism gene duplicates

Genes involved in the central metabolism are initially over-retained and then under-retained (less neofunctionalization ?)

Page 38: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Dating genome duplications

• Phylogenetic analyses of orthologous genes in other ciliate species => date WGDs relative to speciation events

Page 39: The impact of whole genome duplications: insights from  Paramecium tetraurelia

[Figure: phylogenetic tree of Tetrahymena thermophila, P. putrinum, P. bursaria, P. polycaryum, P. nephridiatum, P. duboscqui, P. multimicronucleatum, P. caudatum, P. tetraurelia, P. pentaurelia, P. primaurelia, P. sexaurelia, P. jenningsi, P. octaurelia, P. novaurelia, P. tredecaurelia and P. quadecaurelia, with the Paramecium aurelia complex marked and the Old, Intermediate and Recent WGDs placed on the tree.]

Aurelia complex: 15 sibling species (same kind of habitat, initially thought to correspond to a single species)

Page 40: The impact of whole genome duplications: insights from  Paramecium tetraurelia

How does WGD relate to speciation?

Page 41: The impact of whole genome duplications: insights from  Paramecium tetraurelia

[Figure: polyploid paramecia, with labels Ptetra and Pprim. With the kind permission of K. Wolfe.]

Page 42: The impact of whole genome duplications: insights from  Paramecium tetraurelia

[Figure: polyploid paramecia, with labels Ptetra and Pprim; mating, meiosis.]

Page 43: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Dobzhansky-Muller incompatibility by reciprocal gene loss

For 1 locus, 1/4 of the offspring is inviable.

For n loci, offspring viability is (3/4)^n

Reproductive isolation
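A short Python sketch of the viability formula on this slide, assuming the loci are independent and that each reciprocally lost locus makes 1/4 of the offspring inviable, as stated above. The function name and the chosen values of n are illustrative.

```python
def hybrid_viability(n_loci, inviable_per_locus=0.25):
    """Expected fraction of viable offspring for n independent loci."""
    return (1 - inviable_per_locus) ** n_loci


for n in (1, 5, 10, 20, 50):
    print(f"n = {n:2d} loci: viability = {hybrid_viability(n):.6f}")
```

Viability falls geometrically with the number of loci, so a few dozen reciprocal losses are enough to leave almost no viable offspring.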

Page 44: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Conclusions (1)

• At least 3 WGDs in Paramecium (probably 4)

• WGDs are rare events … that occurred recurrently in the evolution of eukaryotes (fungi, animals, plants, ciliates …)

• Major impact on the evolution of the gene repertoire

Page 45: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Conclusions (2)

• Dosage constraints appear as an essential force shaping the gene repertoire after WGD

• Functional changes contribute to gene retention in the long term …

• … but the fate of the vast majority of genes is to get pseudogenized

Page 46: The impact of whole genome duplications: insights from  Paramecium tetraurelia

Conclusions (3)

• Relationship between the number of genes and organism complexity
  – The number of genes is driven by selection …
  – … and contingency (time since the last WGD)

• WGDs may be responsible for (non-adaptive) explosive radiation of species (Dobzhansky-Muller incompatibility by reciprocal gene loss)

Page 47: The impact of whole genome duplications: insights from  Paramecium tetraurelia

• CNRS-UPR2167 - CGM - Gif-sur-Yvette
  – Jean Cohen
  – Linda Sperling

• CNRS-UMR8541 - ENS - Paris
  – Eric Meyer
  – Mireille Bétermier

• CNRS-UMR8125 - IGR - Villejuif
  – Philippe Dessen

• CNRS-UMR5558 - PBIL - Lyon
  – Laurent Duret
  – Vincent Daubin

• Genoscope - CNRS UMR 8030
  – Jean-Marc Aury
  – Olivier Jaillon
  – Benjamin Noel
  – Betina Porcel
  – Vincent Schachter
  – Patrick Wincker
  – Jean Weissenbach