Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia October 2006


Page 1: Alternative splicing:  A playground of evolution

Alternative splicing: A playground of evolution

Mikhail Gelfand

Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS,

Moscow, Russia

October 2006

Page 2: Alternative splicing:  A playground of evolution

% of alternatively spliced human and mouse genes by year of publication

[Figure legend. Data series: human (genome / random sample); human (individual chromosomes); mouse (genome / random sample). Gene sets: all genes; only multiexon genes; genes with high EST coverage]

Page 3: Alternative splicing:  A playground of evolution

Plan

• Evolution of alternative exon-intron structure
– mammals: human, mouse, dog
– dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
• Evolutionary rate in constitutive and alternative regions
– human / mouse
– D. melanogaster / D. pseudoobscura
– human-chimpanzee / human SNPs

Page 4: Alternative splicing:  A playground of evolution

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Page 5: Alternative splicing:  A playground of evolution

Alternative exon-intron structure in the human, mouse and dog genomes

• EDAS: a database of human alternative splicing (human genome + GenBank + EST data from RefSeq)
– consider cassette exons and alternative splicing sites
– functionality: potentially translated vs. NMD-inducing elementary alternatives
• Human-mouse-dog triples of orthologous genes
• We follow the fate of human alternative sites and exons in the mouse and dog genomes
• Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation:
– conservation of the corresponding region (a homologous exon is actually present in the considered genome);
– conservation of splicing sites (GT and AG)
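The two conservation criteria above can be sketched in code. This is a minimal illustration under stated assumptions, not the EDAS pipeline: it assumes the spliced alignment has already mapped the human exon to 0-based, half-open, plus-strand coordinates in the target genome, and the function name `exon_conserved` and its `hit` argument are hypothetical.

```python
from typing import Optional, Tuple


def exon_conserved(genome: str, hit: Optional[Tuple[int, int]]) -> bool:
    """Apply the two conservation criteria to one internal exon.

    genome -- target (mouse or dog) sequence, plus strand (assumed input)
    hit    -- (start, end) of the homologous exon, 0-based half-open,
              or None if the spliced alignment found no homologous region
    """
    if hit is None:
        return False  # criterion 1 fails: no homologous exon in this genome
    start, end = hit
    if start < 2 or end + 2 > len(genome):
        return False  # flanking intron bases are not available
    acceptor = genome[start - 2:start].upper()  # last two intron bases before the exon
    donor = genome[end:end + 2].upper()         # first two intron bases after the exon
    return acceptor == "AG" and donor == "GT"   # criterion 2: canonical splicing sites


# Toy target sequence: intron ...ag | exon ATGGCC | gt... intron
target = "ccccagATGGCCgtaaaa"
print(exon_conserved(target, (6, 12)))  # both sites present
print(exon_conserved(target, (7, 12)))  # shifted start, no AG before the exon
print(exon_conserved(target, None))     # region not found in the target genome
```

As in the slide, this only tests whether alternative splicing is possible in the target genome; it says nothing about whether the isoform is actually expressed there.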

Page 6: Alternative splicing:  A playground of evolution

Caveats

• we consider only the possibility of AS in mouse and dog: we do not require the actual existence of the corresponding isoforms in known transcriptomes

• we do not consider situations in which an alternative human exon (or site) is constitutive in mouse or dog

• of course, functionality assignments (translated / NMD-inducing) are not very reliable

Page 7: Alternative splicing:  A playground of evolution

Translated cassette exons

constitutive

Page 8: Alternative splicing:  A playground of evolution

NMD-inducing cassette exons

Page 9: Alternative splicing:  A playground of evolution

Observations

• Predominantly included exons are highly conserved irrespective of function

• Predominantly skipped translated exons are more conserved than NMD-inducing ones

• Numerous lineage-specific losses

– more in mouse than in dog

• Still, ~40% of skipped (<1% inclusion) exons are conserved in at least one lineage

Page 10: Alternative splicing:  A playground of evolution

Alternative donor and acceptor sites: same trends

• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)

Page 11: Alternative splicing:  A playground of evolution

Alternative exon-intron structure in fruit flies and the malarial mosquito

• Same procedure (AS data from FlyBase)– cassette exons, splicing sites

– also mutually exclusive exons, retained introns

• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes

• Technically more difficult:

– incomplete genomes

– the quality of alignment with the Anopheles genome is lower

– frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Page 12: Alternative splicing:  A playground of evolution

Conservation of coding segments

Species pair                           Constitutive segments   Alternative segments
D. melanogaster – D. pseudoobscura     97%                     75-80%
D. melanogaster – Anopheles gambiae    77%                     ~45%

Page 13: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes

Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved

• retained introns are the least conserved (are all of them really functional?)

• mutually exclusive exons are as conserved as constitutive exons

[Bar chart, vertical axis 0% to 100%. Categories: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]

Page 14: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes

Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved

• ~30% joined, ~10% divided exons (fewer introns in Aga)

• mutually exclusive exons are conserved exactly

• cassette exons are the least conserved

[Bar chart, vertical axis 0% to 100%. Categories: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]

Page 15: Alternative splicing:  A playground of evolution

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles

[Figure: exon-intron structures in Dme and Dps compared with Aga]

Page 16: Alternative splicing:  A playground of evolution

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles

[Figure: exon-intron structures in Dme and Dps compared with Aga]

Page 17: Alternative splicing:  A playground of evolution

Evolutionary rate in constitutive and alternative regions

• Human and mouse orthologous genes

• Estimation of the dn/ds ratio: a higher fraction of non-synonymous (amino acid-changing) substitutions => weaker stabilizing (or stronger positive) selection
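For readers who want to see what goes into such an estimate, here is a rough sketch of a counting method in the style of Nei and Gojobori. The slides do not say which estimator was used, so this is an assumption for illustration only. It expects two aligned, in-frame coding sequences, uses the standard genetic code, skips codons with gaps or stops, and (as a simplification) does not exclude mutational paths that pass through stop codons. All function names are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """(synonymous, non-synonymous) differences, averaged over all paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    sd = nd = 0.0
    for order in paths:
        cur = c1
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
    return sd / len(paths), nd / len(paths)


def dn_ds(seq1, seq2):
    """Return (dN, dS) with the Jukes-Cantor correction."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # gap or ambiguous base
        if "*" in (CODE[c1], CODE[c2]):
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        d = codon_diffs(c1, c2)
        Sd += d[0]
        Nd += d[1]
    if S == 0 or N == 0:
        raise ValueError("no usable codons")
    # log() raises ValueError if a proportion reaches 0.75 (saturation)
    jc = lambda p: -0.75 * log(1 - 4 * p / 3)
    return jc(Nd / N), jc(Sd / S)


dn, ds = dn_ds("GGGGGGGGGGGG", "GGAGGGGGGGGG")  # one synonymous change
print(dn, ds)
```

The ratio dN/dS is then computed separately for the constitutive and the alternative parts of each gene; a production analysis would normally rely on an established package rather than a hand-written counter like this one.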

Page 18: Alternative splicing:  A playground of evolution

Concatenates of constitutive and alternative regions in all genes: different evolutionary rates

Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end

Region                  dN/dS    Amino-acid identity
Constitutive            0.176    0.886
N-end alternative       0.199    0.874
Internal alternative    0.187    0.878
C-end alternative       0.301    0.807

• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)

• Lower amino acid identity in alternative regions

Page 19: Alternative splicing:  A playground of evolution

Individual genes: the ratio of non-synonymous to synonymous substitution rates, dn/ds, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)

[Scatter plot, log-log axes from 0.001 to 10: dn/ds in constitutive regions (C, horizontal) vs. alternative regions (A, vertical)]

Page 20: Alternative splicing:  A playground of evolution

Non-symmetrical histogram of dn/ds(const)–dn/ds(alt)

[Histogram: number of genes (log scale, 1 to 1000) by C – A = dn/ds(const) – dn/ds(alt), binned from –1 to 1 in steps of 0.1]

Black: shadow of the left half. In a larger fraction of genes dn/ds(const) < dn/ds(alt), especially for larger values
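The comparison of the two halves of the histogram can be reproduced along the following lines. This is a sketch with made-up per-gene values. The mirrored binning follows the figure (bins of width 0.1 between –1 and 1), whereas the two-sided sign test is an added assumption, since the slide compares the halves only visually. The function names are hypothetical.

```python
from math import comb


def mirrored_bins(diffs, width=0.1, limit=1.0):
    """Count genes per |difference| bin, separately for negative and positive values."""
    nbins = round(limit / width)
    left = [0] * nbins   # dn/ds(const) < dn/ds(alt)
    right = [0] * nbins  # dn/ds(const) > dn/ds(alt)
    for d in diffs:
        if d == 0:
            continue
        k = min(int(abs(d) / width), nbins - 1)  # clip outliers into the last bin
        (left if d < 0 else right)[k] += 1
    return left, right


def sign_test(n_neg, n_pos):
    """Two-sided sign test p-value under a 50/50 null."""
    n = n_neg + n_pos
    tail = sum(comb(n, k) for k in range(min(n_neg, n_pos) + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical (dn/ds const, dn/ds alt) pairs for four genes
genes = [(0.10, 0.35), (0.20, 0.25), (0.30, 0.15), (0.05, 0.60)]
diffs = [c - a for c, a in genes]
left, right = mirrored_bins(diffs)
print(left, right)
print(sign_test(sum(left), sum(right)))
```

Plotting `left` mirrored on top of `right` gives the "shadow" view of the slide: wherever the shadow is taller than the right-hand bar, more genes have the larger dn/ds in their alternative regions.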

Page 21: Alternative splicing:  A playground of evolution

The same effect is seen in N-terminal, internal and C-terminal parts

[Three histograms: number of genes (log scale, 1 to 1000) by difference in dn/ds, binned from –1 to 1 in steps of 0.1, for N-terminal (C – A_N), internal (C – A_I) and C-terminal (C – A_C) alternative regions]

Page 22: Alternative splicing:  A playground of evolution

Drosophilas: less selection in alternative regions?

More mutations in alt. regions

Similar level of mutations

More mutations in const. regions

In a majority of genes, both synonymous and non-synonymous mutation rates are higher in alternative regions than in constitutive regions

Page 23: Alternative splicing:  A playground of evolution

Different behavior of N-terminal, internal and C-terminal alternatives

N-terminal alternatives: most genes have a higher synonymous substitution rate in alternative regions; most genes have stronger stabilizing selection in alternative regions

Internal alternatives: intermediate situation

C-terminal alternatives: more non-synonymous substitutions and fewer synonymous substitutions => weaker stabilizing selection in alternative regions

Page 24: Alternative splicing:  A playground of evolution

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions

• Human and chimpanzee genome mismatches vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher’s exact test for significance

          Pn/Ps (SNPs)   Dn/Ds (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
• Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% of positions
• Similar results for all highly covered genes or all conserved exons
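The arithmetic behind the α estimate and the significance test can be sketched as follows. The α calculation uses the minor-isoform ratios reported on this slide (1.41 for SNPs, 1.89 for human-chimpanzee mismatches). The 2×2 table passed to Fisher's exact test contains hypothetical counts, because the slide gives only ratios. Function names are illustrative, and the test is written out in pure Python rather than taken from a statistics library.

```python
from math import comb


def mk_alpha(pn_over_ps, dn_over_ds):
    """Fraction of non-synonymous divergence attributed to positive selection."""
    return 1 - pn_over_ps / dn_over_ds


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Minor-isoform alternative regions, ratios taken from the table
print(round(mk_alpha(1.41, 1.89), 3))

# Hypothetical counts: rows = non-synonymous / synonymous,
# columns = fixed differences (D) / polymorphisms (P)
print(fisher_exact_two_sided(8, 2, 1, 5))
```

With the slide's ratios, α = 1 – 1.41/1.89 ≈ 0.25, which matches the "~25% of positions" figure.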

Page 25: Alternative splicing:  A playground of evolution

An attempt at integration

• AS is often genome-specific
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although unique isoforms may result from aberrant splicing
• AS regions show evidence for decreased negative selection
– excess non-synonymous codon substitutions
• AS regions show evidence for positive selection
– excess non-synonymous SNPs

• AS tends to shuffle domains and target functional sites in proteins

• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Page 26: Alternative splicing:  A playground of evolution

What next?

• Multiple genomes
– many Drosophila spp.
– ENCODE data for many mammals
• Estimate not only the rate of loss, but also the rate of gain (as opposed to aberrant splicing)
• Control for:
– functionality: translated / NMD-inducing
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative: N-terminal / internal / C-terminal
• Evolution of regulation of AS
• Splicing errors and mutations: retained introns, skipped exons, cryptic sites

Page 27: Alternative splicing:  A playground of evolution

Acknowledgements

• Discussions
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
– Shamil Sunyaev (Harvard University Medical School)
• Data
– King Jordan (NCBI)
• Support
– Howard Hughes Medical Institute
– INTAS
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Fund of Basic Research

Page 28: Alternative splicing:  A playground of evolution

Authors

• Andrei Mironov (Moscow State University)

• Ramil Nurtdinov (Moscow State University) – human/mouse/dog

• Dmitry Malko (GosNIIGenetika) – drosophila/mosquito

• Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks

• Vasily Ramensky (Institute of Molecular Biology) – SNPs

• Irena Artamonova (GSF/MIPS) – human/mouse, plots

• Alexei Neverov (GosNIIGenetika) – functionality of isoforms