Alternative splicing: A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics
Institute for Information Transmission Problems RAS,
Moscow, Russia
October 2006
% of alternatively spliced human and mouse genes by year of publication
[Plot legend: Human (genome / random sample); Human (individual chromosomes); Mouse (genome / random sample). Gene sets: all genes; only multiexon genes; genes with high EST coverage]
Plan
• Evolution of alternative exon-intron structure
– mammals: human, mouse, dog
– dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
• Evolutionary rate in constitutive and alternative regions
– human / mouse
– D. melanogaster / D. pseudoobscura
– human-chimpanzee / human SNPs
Elementary alternatives
Cassette exon
Alternative donor site
Alternative acceptor site
Retained intron
Alternative exon-intron structure in the human, mouse and dog genomes
• EDAS: a database of human alternative splicing (human genome + GenBank + EST data from RefSeq)
– consider cassette exons and alternative splicing sites
– functionality: potentially translated vs. NMD-inducing elementary alternatives
• Human-mouse-dog triples of orthologous genes
• We follow the fate of human alternative sites and exons in the mouse and dog genomes
• Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation:
– conservation of the corresponding region (homologous exon is actually present in the considered genome);
– conservation of splicing sites (GT and AG)
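The second conservation criterion (intact GT and AG dinucleotides) can be illustrated with a minimal Python sketch. It assumes the homologous exon has already been located by spliced alignment, which is not shown here; the function name, the coordinate convention, and the toy sequence are hypothetical and are not taken from EDAS.

```python
def splice_sites_conserved(genome: str, exon_start: int, exon_end: int) -> bool:
    """Check the canonical dinucleotides flanking an internal exon.

    Assumption: `genome` is the plus-strand sequence and the exon occupies
    genome[exon_start:exon_end] (0-based, end-exclusive).
    The acceptor AG is the last two intronic bases upstream of the exon,
    the donor GT is the first two intronic bases downstream of it.
    """
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False  # not enough flanking sequence to inspect
    acceptor = genome[exon_start - 2:exon_start].upper()
    donor = genome[exon_end:exon_end + 2].upper()
    return acceptor == "AG" and donor == "GT"


# Toy example: lower case = intron, upper case = exon.
toy = "ccccag" + "ATGGCC" + "gtaaaa"
print(splice_sites_conserved(toy, 6, 12))
```

A real pipeline would also have to handle minus-strand genes and first or last exons, which have only one splice site.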
Caveats
• we consider only the possibility of AS in mouse and dog: we do not require that the corresponding isoforms actually exist in known transcriptomes
• we do not consider situations where an alternative human exon (or site) is constitutive in mouse or dog
• of course, functionality assignments (translated / NMD-inducing) are not very reliable
[Figures: translated cassette exons; NMD-inducing cassette exons; plot label: constitutive]
Observations
• Predominantly included exons are highly conserved irrespective of function
• Predominantly skipped translated exons are more conserved than NMD-inducing ones
• Numerous lineage-specific losses
– more in mouse than in dog
• Still, ~40% of skipped (<1% inclusion) exons are conserved in at least one lineage
Alternative donor and acceptor sites: same trends
• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)
Alternative exon-intron structure in fruit flies and the malarial mosquito
• Same procedure (AS data from FlyBase)
– cassette exons, splicing sites
– also mutually exclusive exons, retained introns
• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes
• Technically more difficult:
– incomplete genomes
– the quality of alignment with the Anopheles genome is lower
– frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Conservation of coding segments
                                       constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura     97%                     75-80%
D. melanogaster – Anopheles gambiae    77%                     ~45%
Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
Colors: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons
[Bar chart, vertical axis 0–100%; categories: constitutive exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]
Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
Colors: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved
[Bar chart, vertical axis 0–100%; categories: constitutive exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]
CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles
[Diagram: exon-intron structure in Dme and Dps vs. Aga]
CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles
[Diagram: exon-intron structure in Dme and Dps vs. Aga]
Evolutionary rate in constitutive and alternative regions
• Human and mouse orthologous genes
• Estimation of the dn/ds ratio: a higher fraction of non-synonymous (amino acid changing) substitutions => weaker stabilizing (or stronger positive) selection
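As an illustration of what a dn/ds estimate involves, here is a minimal Python sketch. It assumes that the numbers of synonymous and non-synonymous sites and differences have already been counted for an aligned pair of sequences, and it applies the Jukes-Cantor correction for multiple hits. This is one common simple approach; the slides do not say which estimator was actually used, and the function names and input numbers are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Convert a proportion of differing sites into substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dn/ds from precomputed counts (the counting step is assumed, not shown)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: dn/ds is undefined")
    return dn / ds


# Hypothetical counts: 10 differences at 700 non-synonymous sites,
# 30 differences at 300 synonymous sites.
print(round(dn_ds(10, 700, 30, 300), 3))
```

A ratio well below 1, as in this toy input, corresponds to stabilizing selection; values approaching or exceeding 1 point to relaxed or positive selection.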
Concatenates of constitutive and alternative regions in all genes: different evolutionary rates
Columns (left to right): (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end

                      Constitutive   N-end alternative   Internal alternative   C-end alternative
dN/dS                 0.176          0.199               0.187                  0.301
Amino acid identity   0.886          0.874               0.878                  0.807
• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Lower amino acid identity in alternative regions
Individual genes: the ratio of non-synonymous to synonymous substitution rates, dn/ds, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)
[Scatter plot, logarithmic axes from 0.001 to 10: C (constitutive) on the horizontal axis, A (alternative) on the vertical axis]
Non-symmetrical histogram of dn/ds(const)–dn/ds(alt)
[Histogram: number of genes (logarithmic scale) in each 0.1-wide bin of C – A = dn/ds(const) – dn/ds(alt), from –1 to 1]
Black: shadow of the left half. In a larger fraction of genes dn/ds(const) < dn/ds(alt), especially for larger values
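One way to put a number on the asymmetry of such a histogram is a sign test on the per-gene differences. The Python sketch below is an illustration only: the slides do not say which test, if any, was applied to this histogram, and the function name and the toy values are hypothetical.

```python
from math import comb


def sign_test(pairs):
    """pairs: list of (dn/ds in constitutive regions, dn/ds in alternative regions).

    Returns (genes with const < alt, genes with const > alt, two-sided p-value).
    Ties are dropped, as is usual for a sign test.
    """
    left = sum(1 for c, a in pairs if c < a)
    right = sum(1 for c, a in pairs if c > a)
    n = left + right
    if n == 0:
        return left, right, 1.0
    # Probability of a split at least this uneven if both signs were equally likely.
    tail = sum(comb(n, k) for k in range(min(left, right) + 1))
    p_value = min(1.0, 2 * tail / 2 ** n)
    return left, right, p_value


# Toy values, not data from the slides.
toy = [(0.1, 0.3), (0.2, 0.5), (0.4, 0.2), (0.1, 0.2)]
print(sign_test(toy))
```

The sign test ignores the size of each difference, so it does not capture the observation that the excess is stronger for larger values.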
The same effect is seen in N-terminal, internal, and C-terminal parts
[Three histograms: number of genes (logarithmic scale) by dn/ds(const) – dn/ds(alt), from –1 to 1, for N-terminal (C – A N), internal (C – A I) and C-terminal (C – A C) alternative regions]
Drosophilas: less selection in alternative regions?
More mutations in alt. regions
Similar level of mutations
More mutations in const. regions
In a majority of genes, both synonymous and non-synonymous mutation rates are higher in alternative regions than in constitutive regions
Different behavior of N-terminal, internal and C-terminal alternatives
N-terminal alternatives: most genes have a higher synonymous substitution rate in alternative regions; most genes have stronger stabilizing selection in alternative regions
Internal alternatives: intermediate situation
C-terminal alternatives: more non-synonymous substitutions and fewer synonymous substitutions => weaker stabilizing selection in alternative regions
The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
• Human and chimpanzee genome mismatches vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher's exact test for significance

          Pn/Ps (SNPs)   Dn/Ds (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
• Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% of positions
• Similar results for all highly covered genes or all conserved exons
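The α estimate can be reproduced from the Pn/Ps and Dn/Ds ratios given above. The short Python sketch below does only this arithmetic. The function name is hypothetical, and the significance values are not recomputed, because Fisher's exact test needs the raw counts, which the slides do not give.

```python
def mk_alpha(pn_ps: float, dn_ds: float) -> float:
    """Fraction of substitutions attributed to positive selection:
    alpha = 1 - (Pn/Ps) / (Dn/Ds)."""
    return 1 - pn_ps / dn_ds


# (Pn/Ps, Dn/Ds) ratios as reported for the three region classes.
rows = {
    "Const.": (0.72, 0.62),
    "Major": (0.78, 0.65),
    "Minor": (1.41, 1.89),
}

for label, (pn_ps, dn_ds) in rows.items():
    print(f"{label}: alpha = {mk_alpha(pn_ps, dn_ds):+.2f}")
```

Only the minor-isoform row gives a positive α (about 0.25, matching the ~25% quoted above). The negative values for constitutive and major-isoform regions mean that polymorphism is relatively richer in non-synonymous changes than divergence is.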
An attempt at integration
• AS is often genome-specific
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although unique isoforms may result from aberrant splicing
• AS regions show evidence for decreased negative selection
– excess non-synonymous codon substitutions
• AS regions show evidence for positive selection
– excess non-synonymous SNPs
• AS tends to shuffle domains and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones
What next?
• Multiple genomes
– many Drosophila spp.
– ENCODE data for many mammals
• Estimate not only the rate of loss, but also the rate of gain (as opposed to aberrant splicing)
• Control for:
– functionality: translated / NMD-inducing
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative: N-terminal / internal / C-terminal
• Evolution of regulation of AS
• Splicing errors and mutations: retained introns, skipped exons, cryptic sites
Acknowledgements
• Discussions
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
– Shamil Sunyaev (Harvard University Medical School)
• Data
– King Jordan (NCBI)
• Support
– Howard Hughes Medical Institute
– INTAS
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Fund of Basic Research
Authors
• Andrei Mironov (Moscow State University)
• Ramil Nurtdinov (Moscow State University) – human/mouse/dog
• Dmitry Malko (GosNIIGenetika) – drosophila/mosquito
• Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks
• Vasily Ramensky (Institute of Molecular Biology) – SNPs
• Irena Artamonova (GSF/MIPS) – human/mouse, plots
• Alexei Neverov (GosNIIGenetika) – functionality of isoforms