Major role of positive selection in evolution of conservative segments of Drosophila proteins
Georgii A. Bazykin, Alexey S. Kondrashov
Electronic supplementary information
[Figure: α (y axis) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21), with one line per threshold: 8 (5%), 16 (10%), 24 (15%), 32 (20%), 41 (25%), 49 (30%), 57 (35%), 65 (40%), 73 (45%) and 81 (50%) individuals.]
Supporting Figure 1. The fraction of positively selected coding sites, α,
estimated from data on variation among 162 individuals of D. melanogaster
and on divergence from the D. yakuba–D. erecta common ancestor. The sites were
subdivided into 22 classes according to the conservatism of the protein segments
containing them, measured in the alignment of their orthologs from 7 more distant
Drosophila species. Values of α are shown for different cut-off frequency
thresholds in D. melanogaster. Each line corresponds to a threshold number of
individuals out of 162, with the corresponding threshold allele frequency in
parentheses; polymorphisms at frequencies below the threshold were excluded from
the analysis. The values for the 24 (15%) threshold correspond to those in
Figure 1C.
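The frequency cut-off can be illustrated with a minimal sketch. It assumes that each polymorphism is summarised as the number of the 162 individuals carrying the derived allele plus a flag for whether it is nonsynonymous, and that α is computed with the standard McDonald-Kreitman estimator α = 1 − (Pn·Ds)/(Ps·Dn), where Pn, Ps are counts of nonsynonymous and synonymous polymorphisms and Dn, Ds the corresponding substitutions. The function names and toy counts are hypothetical. Rounding 162 × f half up reproduces the threshold counts listed in the legend.

```python
from typing import Iterable, Tuple

N_INDIVIDUALS = 162  # D. melanogaster genomes in the polymorphism sample


def threshold_count(fraction: float, n: int = N_INDIVIDUALS) -> int:
    """Convert a frequency cut-off to a number of individuals, rounding half up."""
    return int(n * fraction + 0.5)


def mk_alpha(pn: int, ps: int, dn: int, ds: int) -> float:
    """Standard McDonald-Kreitman estimate of alpha (assumed estimator)."""
    if ps == 0 or dn == 0:
        raise ValueError("alpha is undefined when Ps or Dn is zero")
    return 1.0 - (pn * ds) / (ps * dn)


def alpha_with_cutoff(
    polymorphisms: Iterable[Tuple[int, bool]],  # (derived-allele count, is_nonsynonymous)
    dn: int,
    ds: int,
    min_count: int,
) -> float:
    """Drop polymorphisms carried by fewer than min_count individuals, then compute alpha."""
    pn = ps = 0
    for count, nonsynonymous in polymorphisms:
        if count < min_count:
            continue  # rare variants are excluded
        if nonsynonymous:
            pn += 1
        else:
            ps += 1
    return mk_alpha(pn, ps, dn, ds)


# Hypothetical toy data, not taken from the study.
toy = [(2, True), (5, True), (30, True), (100, False),
       (40, False), (3, False), (60, True), (25, False)]
cutoff = threshold_count(0.15)  # the 15% threshold
print(cutoff, alpha_with_cutoff(toy, dn=10, ds=10, min_count=cutoff))
```

Raising the threshold removes more low-frequency variants, which are enriched in slightly deleterious nonsynonymous mutations that would otherwise bias α downward.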
[Figure: pN/dN, pS/dS, pooled pS/dS, α, and α from pooled pS/dS (y axis) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21).]
Supporting Figure 2. Dependence of α on conservatism for different methods of
calculating the rates of synonymous substitution and polymorphism in
D. melanogaster. The sites were subdivided into 22 classes according to the
conservatism of the protein segments containing them, measured in the alignment
of their orthologs from 7 more distant Drosophila species. Green diamonds, ratio
of nonsynonymous polymorphism to divergence (pN/dN); red squares, ratio of
synonymous polymorphism to divergence (pS/dS); blue solid line, ratio of
synonymous polymorphism to divergence (pS/dS) obtained by pooling all sites;
red triangles, α obtained using bin-specific values of pS and dS (same as in
Figure 1C); blue triangles, α obtained using pooled values of pS and dS.
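The difference between the two α series can be sketched as follows, again assuming the standard estimator α = 1 − (pN/dN)/(pS/dS). Because polymorphism and divergence in each ratio are counted over the same sites, the ratios reduce to ratios of counts, and the pooled pS/dS is the sum of synonymous polymorphisms over all bins divided by the sum of synonymous substitutions. The class and function names and the toy counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BinCounts:
    """Counts for one conservatism class (hypothetical container)."""
    pn: int  # nonsynonymous polymorphisms
    ps: int  # synonymous polymorphisms
    dn: int  # nonsynonymous substitutions
    ds: int  # synonymous substitutions


def alpha_by_bin(bins: Dict[int, BinCounts], pooled_synonymous: bool) -> Dict[int, float]:
    """alpha = 1 - (pN/dN) / (pS/dS), with pS/dS taken per bin or pooled over all bins."""
    pooled_ratio = sum(b.ps for b in bins.values()) / sum(b.ds for b in bins.values())
    result = {}
    for key, b in bins.items():
        nonsyn_ratio = b.pn / b.dn
        syn_ratio = pooled_ratio if pooled_synonymous else b.ps / b.ds
        result[key] = 1.0 - nonsyn_ratio / syn_ratio
    return result


# Hypothetical counts for two conservatism classes.
bins = {0: BinCounts(pn=5, ps=20, dn=10, ds=20),
        1: BinCounts(pn=3, ps=10, dn=10, ds=20)}
print(alpha_by_bin(bins, pooled_synonymous=False))
print(alpha_by_bin(bins, pooled_synonymous=True))
```

Pooling stabilises the synonymous ratio in bins with few synonymous events, at the price of assuming that pS/dS does not depend on conservatism.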
[Figure: pN/dN, pS/dS, pooled pS/dS, α, and α from pooled pS/dS (y axis) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21).]
Supporting Figure 3. Dependence of α on conservatism for different methods of
calculating the rates of synonymous substitution and polymorphism in
D. simulans. The sites were subdivided into 22 classes according to the
conservatism of the protein segments containing them, measured in the alignment
of their orthologs from 7 more distant Drosophila species. Green diamonds, ratio
of nonsynonymous polymorphism to divergence (pN/dN); red squares, ratio of
synonymous polymorphism to divergence (pS/dS); blue solid line, ratio of
synonymous polymorphism to divergence (pS/dS) obtained by pooling all sites;
red triangles, α obtained using bin-specific values of pS and dS (same as in
Figure 1F); blue triangles, α obtained using pooled values of pS and dS.
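The conservatism classes used in these figures can be sketched as follows. The sketch assumes that the window of 21 amino acid positions is centred on the focal codon (consistent with the exclusion of the 10 terminal codon sites at each end of a gene in Supporting Tables 1 and 2), that a position counts as variable if the 7 outgroup species do not all carry the same amino acid, and that gaps need no special handling. Counts from 0 to 21 give the 22 classes. Function names are hypothetical.

```python
from typing import List, Optional, Sequence

WINDOW = 21
HALF = WINDOW // 2  # 10 codons on each side of the focal site (assumed centred window)


def variable_columns(alignment: Sequence[str]) -> List[bool]:
    """True for each alignment column whose amino acids are not all identical."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [len({seq[i] for seq in alignment}) > 1 for i in range(length)]


def conservatism_classes(alignment: Sequence[str]) -> List[Optional[int]]:
    """Number of variable columns (0..21) in the window centred on each site.

    Sites closer than HALF codons to either end have no full window and get None.
    """
    var = variable_columns(alignment)
    n = len(var)
    classes: List[Optional[int]] = []
    for i in range(n):
        if i < HALF or i >= n - HALF:
            classes.append(None)
        else:
            classes.append(sum(var[i - HALF:i + HALF + 1]))
    return classes


# Hypothetical alignment of 7 outgroup proteins, 25 residues long,
# with a single variable column at position 12.
outgroups = ["A" * 25] * 6 + ["A" * 12 + "G" + "A" * 12]
print(conservatism_classes(outgroups))
```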
[Figure: α and ωa (y axis) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21).]
Supporting Figure 4. Results of the McDonald-Kreitman test for the different
classes of conservatism in D. melanogaster. This figure is based on the same data
as Figure 1A-C, but the method of Eyre-Walker and Keightley (2009) was used to
estimate α and ωa.
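The two quantities in this figure are related: ωa is commonly defined as the rate of adaptive nonsynonymous substitution relative to the synonymous rate, ωa = α · dN/dS. The sketch below only shows this conversion for a given α; it does not implement the Eyre-Walker and Keightley (2009) procedure, which estimates α after modelling the distribution of fitness effects of new mutations. Using non-degenerate and 4-fold degenerate sites as denominators, omitting any multiple-hit correction, the function name and the toy numbers are all assumptions.

```python
def omega_a(alpha: float, dn_count: int, n_sites: int, ds_count: int, s_sites: int) -> float:
    """Rate of adaptive nonsynonymous substitution relative to the synonymous rate.

    dN and dS are raw per-site divergences (no multiple-hit correction): substitution
    counts divided by the numbers of nonsynonymous (assumed: non-degenerate) and
    synonymous (assumed: 4-fold degenerate) sites.
    """
    if ds_count == 0 or s_sites == 0 or n_sites == 0:
        raise ValueError("dN and dS must be well defined")
    dn = dn_count / n_sites
    ds = ds_count / s_sites
    return alpha * dn / ds


# Hypothetical numbers: alpha = 0.5, dN = 100/10,000, dS = 50/1,000
print(omega_a(0.5, dn_count=100, n_sites=10_000, ds_count=50, s_sites=1_000))
```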
Supporting Figure 5. Results of the McDonald-Kreitman test for the different
classes of conservatism in D. melanogaster. This figure is analogous to
Figure 1A–C, except that none of the data filtering steps listed in Supporting
Table 1 was applied.
Supporting Figure 6. Results of the McDonald-Kreitman test for the different
classes of conservatism in D. melanogaster. This figure is analogous to
Figure 1A–C, except that the D. yakuba–D. erecta codon match was not required.
[Figure: numbers of sites (y axis) of each class (non-degenerate, 4-fold degenerate, other) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21).]
Supporting Figure 7. Numbers of non-degenerate nonsynonymous, 4-fold
degenerate synonymous, and other nucleotide sites in each bin of conservatism in
D. melanogaster, within the 3,674,790 analysed codon sites.
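The three site categories can be assigned codon by codon. A minimal sketch under the standard genetic code: a position is non-degenerate if every possible point mutation there changes the amino acid (mutations to a stop codon count as changes), 4-fold degenerate if all three are synonymous, and "other" otherwise. How the original analysis treated stop codons is not stated here, so that rule is an assumption, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_classes(codon: str) -> List[str]:
    """Classify each of the three positions of a sense codon by degeneracy."""
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons are not classified")
    classes = []
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                synonymous += 1
        if synonymous == 0:
            classes.append("non-degenerate")
        elif synonymous == 3:
            classes.append("4-fold degenerate")
        else:
            classes.append("other")
    return classes


def count_sites(codons: List[str]) -> Dict[str, int]:
    """Tally nucleotide sites of each class over a list of codons."""
    counts = {"non-degenerate": 0, "4-fold degenerate": 0, "other": 0}
    for codon in codons:
        for label in site_classes(codon):
            counts[label] += 1
    return counts


print(count_sites(["ATG", "GGA", "TTT"]))
```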
[Figure: numbers of sites (y axis) of each class (non-degenerate, 4-fold degenerate, other) against the number of variable amino acid positions in a window of 21 (x axis, 0 to 21).]
Supporting Figure 8. Numbers of non-degenerate nonsynonymous, 4-fold
degenerate synonymous, and other nucleotide sites in each bin of conservatism in
D. simulans, within the 2,918,085 analysed codon sites.
[Figure: numbers of cases (y axis) against the number of variable amino acid positions in a window of 20 (x axis, 0 to 20).]
Supporting Figure 9. Numbers of codons with double nonsynonymous
substitutions between D. melanogaster and D. pseudoobscura for which the
pattern of substitutions could be inferred. For each class of conservatism, the gray
part of the bar corresponds to the pairs of substitutions that occurred in different
lineages, and the white part, to the pairs of substitutions that occurred in the same
lineage; Figure 2 of the main text (gray bars) shows the ratios of these two values.
Supporting Table 1. Numbers of codon sites, and of nucleotide sites of each
degeneracy class, remaining after each of the cumulatively applied data quality
filters in the analysis of variation in D. melanogaster. The numbers in the last
row correspond to the sites used in the McDonald-Kreitman (MK) test.
| Requirement on codon site (cumulative) | Codon sites | Non-degenerate nucleotide sites | 4-fold degenerate nucleotide sites | Other nucleotide sites |
|---|---|---|---|---|
| Codon present in reference D. melanogaster genome | 7,214,114 | 12,903,045 | 3,388,362 | 5,350,911 |
| Codon present in at least 81 non-reference D. melanogaster genomes, and the plurality consensus of 162 D. melanogaster genomes is a valid codon | 7,125,076 | 12,742,599 | 3,346,127 | 5,286,502 |
| Valid codon present in each of the 12 Drosophila species | 5,300,830 | 9,469,067 | 2,457,292 | 3,976,131 |
| Codon not masked by RepeatMasker in each of the 12 Drosophila species | 5,084,422 | 9,066,508 | 2,356,065 | 3,830,693 |
| Not among the 10 codon sites at the 5’ end of the gene, nor the 10 codon sites at the 3’ end | 4,937,169 | 8,813,103 | 2,295,591 | 3,702,813 |
| Codon matches between D. yakuba and D. erecta | 4,145,427 | 7,418,058 | 1,852,324 | 3,165,899 |
| Codon matches between D. simulans, D. sechellia and at least one of the non-reference D. melanogaster genomes | 3,674,790 | 6,559,121 | 1,578,073 | 2,887,176 |
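Because the filters are cumulative, the fraction of codon sites kept by each filter is the ratio of consecutive rows. The short calculation below uses only the codon-site counts from Supporting Table 1, with the filter descriptions abbreviated; the same calculation applies to Supporting Table 2. The function name is hypothetical.

```python
from typing import List, Tuple

# Codon-site counts after each cumulative filter, copied from Supporting Table 1.
TABLE_S1: List[Tuple[str, int]] = [
    ("present in reference D. melanogaster genome", 7_214_114),
    (">= 81 non-reference genomes; consensus is a valid codon", 7_125_076),
    ("valid codon present in all 12 species", 5_300_830),
    ("not masked by RepeatMasker", 5_084_422),
    ("not among 10 terminal codons at either end", 4_937_169),
    ("D. yakuba and D. erecta codons match", 4_145_427),
    ("D. simulans, D. sechellia and a non-reference D. melanogaster match", 3_674_790),
]


def retention(rows: List[Tuple[str, int]]) -> List[Tuple[str, float, float]]:
    """Fraction of codon sites kept by each filter, and cumulatively since the first row."""
    first = rows[0][1]
    previous = first
    out = []
    for label, count in rows:
        out.append((label, count / previous, count / first))
        previous = count
    return out


for label, step, total in retention(TABLE_S1):
    print(f"{label}: step {step:.1%}, cumulative {total:.1%}")
```

On these counts, the largest single losses come from requiring a valid codon in all 12 species (about 74% of sites retained) and from the D. yakuba–D. erecta match (about 84% retained); overall, about 51% of the original codon sites enter the MK test.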
Supporting Table 2. Numbers of codon sites, and of nucleotide sites of each
degeneracy class, remaining after each of the cumulatively applied data quality
filters in the analysis of variation in D. simulans. The numbers in the last row
correspond to the sites used in the McDonald-Kreitman (MK) test.
| Requirement on codon site (cumulative) | Codon sites | Non-degenerate nucleotide sites | 4-fold degenerate nucleotide sites | Other nucleotide sites |
|---|---|---|---|---|
| Codon present in reference D. melanogaster genome | 7,133,952 | 12,758,905 | 3,352,413 | 5,290,538 |
| Codon present in at least 3 non-reference D. simulans genomes, and the plurality consensus of 6 D. simulans genomes is a valid codon | 5,100,067 | 9,148,117 | 2,356,814 | 3,795,270 |
| Valid codon present and not masked by RepeatMasker in each of the 12 Drosophila species | 3,840,834 | 6,873,503 | 1,756,007 | 2,892,992 |
| Not among the 10 codon sites at the 5’ end of the gene, nor the 10 codon sites at the 3’ end | 3,722,688 | 6,669,647 | 1,708,006 | 2,790,411 |
| Codon matches between D. yakuba and D. erecta | 3,133,805 | 5,628,459 | 1,381,877 | 2,391,079 |
| Codon matches between D. melanogaster and at least one of the non-reference D. simulans genomes | 2,918,085 | 5,229,361 | 1,254,158 | 2,270,736 |