Major role of positive selection in evolution of conservative segments of Drosophila proteins

Georgii A. Bazykin, Alexey S. Kondrashov

Electronic supplementary information

Source: rspb.royalsocietypublishing.org/content/suppl/2012/06/06/rspb.2012...

[Figure: α (y-axis, 0 to 0.9) versus variable amino acid positions in window of 21 (x-axis, 0 to 21). One line per frequency threshold: 81 (50%), 73 (45%), 65 (40%), 57 (35%), 49 (30%), 41 (25%), 32 (20%), 24 (15%), 16 (10%), 8 (5%).]

Supporting Figure 1. The fraction of the positively selected coding sites α, estimated from the data on variation among 162 individuals of D. melanogaster and divergence from the D. yakuba–D. erecta common ancestor. The sites were subdivided into 22 classes of different conservatism of the protein segments that contain them, in the alignment of their orthologs in 7 more distant Drosophila species. The values of α for different cut-off frequency thresholds in D. melanogaster are presented. Each line corresponds to a particular threshold number of individuals out of 162, and the corresponding threshold allele frequency (in brackets), such that the polymorphisms at frequencies below this number were excluded from analysis. The values for the 24 (15%) frequency threshold correspond to those in Figure 1C.
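The pairing of threshold counts and percentages in the legend can be reproduced by rounding each percentage of 162 to the nearest whole individual. The sketch below is an illustration only: the function names and the example allele counts are hypothetical, and it assumes that a polymorphism is kept when its allele count is at or above the threshold (the caption says only that polymorphisms below the threshold were excluded).

```python
SAMPLE_SIZE = 162  # number of D. melanogaster individuals, from the caption


def threshold_count(percent, n=SAMPLE_SIZE):
    """Round `percent` of `n` to the nearest whole individual (halves round up).

    `percent` must be an integer so that the arithmetic stays exact.
    """
    return (percent * n + 50) // 100


def keep_polymorphisms(allele_counts, percent, n=SAMPLE_SIZE):
    """Keep allele counts at or above the threshold (assumed filtering rule)."""
    cutoff = threshold_count(percent, n)
    return [count for count in allele_counts if count >= cutoff]


# Thresholds for 5%, 10%, ..., 50% of 162 individuals.
thresholds = [threshold_count(p) for p in range(5, 55, 5)]

# Hypothetical allele counts out of 162, for illustration only.
example_counts = [1, 3, 12, 24, 30, 80, 150]
kept = keep_polymorphisms(example_counts, 15)
```

Integer arithmetic is used for the rounding because Python's built-in `round` rounds halves to the nearest even number, which would turn 40.5 (25% of 162) into 40 rather than the 41 shown in the legend.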

[Figure: pN/dN, pS/dS, pooled pS/dS, α, and α from pooled pS/dS (y-axis, 0 to 0.8) versus variable amino acid positions in window of 21 (x-axis, 0 to 21).]

Supporting Figure 2. Dependence of α on the conservatism for different methods of calculating the rates of synonymous substitutions and polymorphisms in D. melanogaster. The sites were subdivided into 22 classes of different conservatism of the protein segments that contain them, in the alignment of their orthologs in 7 more distant Drosophila species. Green diamonds, ratio of nonsynonymous polymorphism and divergence (pN/dN); red squares, ratio of synonymous polymorphism and divergence (pS/dS); blue solid line, ratio of synonymous polymorphism and divergence (pS/dS) obtained by pooling all sites; red triangles, α obtained using bin-specific values of pS and dS (same as in Figure 1C); blue triangles, α obtained using pooled values of pS and dS.
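The two α series compared in this figure differ only in which synonymous ratio enters the calculation. The sketch below assumes the standard McDonald-Kreitman estimator, α = 1 − (pN/dN)/(pS/dS), which this supplement does not spell out. The per-bin counts are hypothetical numbers chosen for illustration, and pooling is assumed to mean summing pS and dS over all bins before taking the ratio.

```python
def alpha(pn, dn, ps, ds):
    """McDonald-Kreitman style estimate: 1 - (pN/dN) / (pS/dS)."""
    return 1.0 - (pn / dn) / (ps / ds)


# Hypothetical counts for two conservatism bins (not data from the paper).
bins = [
    {"pn": 10, "dn": 50, "ps": 40, "ds": 100},
    {"pn": 30, "dn": 90, "ps": 60, "ds": 120},
]

# Pooled synonymous counts: summed over all bins (assumed pooling rule).
pooled_ps = sum(b["ps"] for b in bins)
pooled_ds = sum(b["ds"] for b in bins)

# Alpha with each bin's own pS/dS, and with the pooled pS/dS.
alpha_bin_specific = [alpha(b["pn"], b["dn"], b["ps"], b["ds"]) for b in bins]
alpha_pooled = [alpha(b["pn"], b["dn"], pooled_ps, pooled_ds) for b in bins]
```

With a single pooled pS/dS, differences in α among bins come entirely from pN/dN; with bin-specific values, they also reflect any variation of pS/dS among bins.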

[Figure: pN/dN, pS/dS, pooled pS/dS, α, and α from pooled pS/dS (y-axis, 0 to 0.9) versus variable amino acid positions in window of 21 (x-axis, 0 to 21).]

Supporting Figure 3. Dependence of α on the conservatism for different methods of calculating the rates of synonymous substitutions and polymorphisms in D. simulans. The sites were subdivided into 22 classes of different conservatism of the protein segments that contain them, in the alignment of their orthologs in 7 more distant Drosophila species. Green diamonds, ratio of nonsynonymous polymorphism and divergence (pN/dN); red squares, ratio of synonymous polymorphism and divergence (pS/dS); blue solid line, ratio of synonymous polymorphism and divergence (pS/dS) obtained by pooling all sites; red triangles, α obtained using bin-specific values of pS and dS (same as in Figure 1F); blue triangles, α obtained using pooled values of pS and dS.

[Figure: α and ωa (y-axis, 0 to 0.8) versus variable amino acid positions in window of 21 (x-axis, 0 to 21).]

Supporting Figure 4. Results of the McDonald-Kreitman test for the different classes of conservatism in D. melanogaster. This figure is based on the same data as Figure 1A-C, but the method of Eyre-Walker and Keightley (2009) was used to estimate α and ωa.


Supporting Figure 5. Results of the McDonald-Kreitman test for the different classes of conservatism in D. melanogaster. This figure is analogous to Figure 1A-C, except that none of the data filtering steps listed in Table S1 were applied.


Supporting Figure 6. Results of the McDonald-Kreitman test for the different classes of conservatism in D. melanogaster. This figure is analogous to Figure 1A-C, except that the D. yakuba–D. erecta codon match was not required here.

[Figure: number of sites (y-axis, 0 to 700,000) versus variable amino acid positions in window of 21 (x-axis, 0 to 21). Series: non-degenerate, 4-fold degenerate, other.]

Supporting Figure 7. Numbers of non-degenerate nonsynonymous, 4-fold degenerate synonymous, and other nucleotide sites in each bin of conservatism in D. melanogaster, within the 3,674,790 analysed codon sites.
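The bins of conservatism used throughout these figures count variable amino acid positions in a window of 21, giving 22 possible values (0 to 21). As a rough sketch of how such a bin could be assigned, the code below treats an alignment column as variable when the aligned species do not all carry the same residue, and centres the window on the focal site (10 positions on each side). Both choices, the function names, and the toy three-sequence alignment are assumptions made for illustration; the supplement does not give these details.

```python
def variable_columns(alignment):
    """Return one flag per column: True if the residues differ among species."""
    return [len(set(column)) > 1 for column in zip(*alignment)]


def conservatism_bin(alignment, site, window=21):
    """Count variable columns in a window centred on `site` (0-based index)."""
    half = window // 2
    length = len(alignment[0])
    if site < half or site + half >= length:
        raise ValueError("window does not fit within the alignment")
    flags = variable_columns(alignment)
    return sum(flags[site - half: site + half + 1])


# Toy alignment of 25 columns with variable columns at indices 2, 12 and 20.
reference = "M" + "A" * 24
variant = list(reference)
for index in (2, 12, 20):
    variant[index] = "G"
alignment = [reference, "".join(variant), reference]

bin_at_12 = conservatism_bin(alignment, 12)  # window covers columns 2..22
bin_at_14 = conservatism_bin(alignment, 14)  # window covers columns 4..24
```

Sites closer than 10 positions to either end of the alignment raise an error here because a full centred window does not fit around them.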

[Figure: number of sites (y-axis, 0 to 600,000) versus variable amino acid positions in window of 21 (x-axis, 0 to 21). Series: non-degenerate, 4-fold degenerate, other.]

Supporting Figure 8. Numbers of non-degenerate nonsynonymous, 4-fold degenerate synonymous, and other nucleotide sites in each bin of conservatism in D. simulans, within the 2,918,085 analysed codon sites.

[Figure: number of cases (y-axis, 0 to 2,500) versus variable amino acid positions in window of 20 (x-axis, 0 to 20).]

Supporting Figure 9. Numbers of codons with double nonsynonymous substitutions between D. melanogaster and D. pseudoobscura for which the pattern of substitutions could be inferred. For each class of conservatism, the gray part of the bar corresponds to the pairs of substitutions that occurred in different lineages, and the white part, to the pairs of substitutions that occurred in the same lineage; Figure 2 of the main text (gray bars) shows the ratios of these two values.


Supporting Table 1. Numbers of codon sites, and of nucleotide sites of each degeneracy, after application of each of the data quality filters in the analysis of variation in D. melanogaster. The number in the last line corresponds to the sites used in the MK test.

| Requirement for codon site (cumulative) | Codon sites | Non-degenerate nucleotide sites | 4-fold degenerate nucleotide sites | Other nucleotide sites |
|---|---|---|---|---|
| Codon present in reference D. melanogaster genome | 7,214,114 | 12,903,045 | 3,388,362 | 5,350,911 |
| Codon present in at least 81 non-reference D. melanogaster genomes, and the plurality consensus of 162 D. melanogaster genomes is a valid codon | 7,125,076 | 12,742,599 | 3,346,127 | 5,286,502 |
| Valid codon present in each of the 12 Drosophila species | 5,300,830 | 9,469,067 | 2,457,292 | 3,976,131 |
| Codon not masked by RepeatMasker in each of the 12 Drosophila species | 5,084,422 | 9,066,508 | 2,356,065 | 3,830,693 |
| Not among the 10 codon sites on the 5’ end of the gene, nor the 10 codon sites on the 3’ end of the gene | 4,937,169 | 8,813,103 | 2,295,591 | 3,702,813 |
| Codon matches between D. yakuba and D. erecta | 4,145,427 | 7,418,058 | 1,852,324 | 3,165,899 |
| Codon matches between D. simulans, D. sechellia and at least one of the non-reference D. melanogaster | 3,674,790 | 6,559,121 | 1,578,073 | 2,887,176 |
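The three nucleotide-site columns partition the three positions of each codon by degeneracy. A minimal sketch of such a classification under the standard genetic code follows. It labels a position non-degenerate when every nucleotide change alters the encoded amino acid, 4-fold degenerate when no change does, and "other" otherwise. How the authors treated stop codons or codons that differ among genomes is not stated here, so the function names and rules below are an illustration rather than their procedure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon):
    """Label each of the three codon positions by its degeneracy."""
    labels = []
    for pos in range(3):
        # Number of bases at this position (including the original one)
        # that leave the encoded amino acid unchanged.
        same = sum(
            CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == CODON_TABLE[codon]
            for base in BASES
        )
        if same == 1:
            labels.append("non-degenerate")
        elif same == 4:
            labels.append("4-fold degenerate")
        else:
            labels.append("other")
    return labels


def count_sites(codons):
    """Tally nucleotide sites of each degeneracy class over a list of codons."""
    counts = {"non-degenerate": 0, "4-fold degenerate": 0, "other": 0}
    for codon in codons:
        for label in site_degeneracy(codon):
            counts[label] += 1
    return counts


example = count_sites(["ATG", "GGA", "ATT"])
```

Each codon contributes exactly three nucleotide sites, so the three classes sum to three times the number of codons. The last row of the table is consistent with this: 6,559,121 + 1,578,073 + 2,887,176 = 11,024,370 = 3 × 3,674,790.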


Supporting Table 2. Numbers of codon sites, and of nucleotide sites of each degeneracy, after application of each of the data quality filters in the analysis of variation in D. simulans. The number in the last line corresponds to the sites used in the MK test.

| Requirement for codon site (cumulative) | Codon sites | Non-degenerate nucleotide sites | 4-fold degenerate nucleotide sites | Other nucleotide sites |
|---|---|---|---|---|
| Codon present in reference D. melanogaster genome | 7,133,952 | 12,758,905 | 3,352,413 | 5,290,538 |
| Codon present in at least 3 non-reference D. simulans genomes, and the plurality consensus of 6 D. simulans genomes is a valid codon | 5,100,067 | 9,148,117 | 2,356,814 | 3,795,270 |
| Valid codon present and not masked by RepeatMasker in each of the 12 Drosophila species | 3,840,834 | 6,873,503 | 1,756,007 | 2,892,992 |
| Not among the 10 codon sites on the 5’ end of the gene, nor the 10 codon sites on the 3’ end of the gene | 3,722,688 | 6,669,647 | 1,708,006 | 2,790,411 |
| Codon matches between D. yakuba and D. erecta | 3,133,805 | 5,628,459 | 1,381,877 | 2,391,079 |
| Codon matches between D. melanogaster and at least one of the non-reference D. simulans | 2,918,085 | 5,229,361 | 1,254,158 | 2,270,736 |
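The second filter in both tables combines a presence requirement with a plurality consensus. The sketch below shows one way such a check could look. It is a simplified illustration: a single list of genomes is used for both conditions (the tables distinguish non-reference genomes from the full set), a "valid codon" is assumed to mean three unambiguous bases, missing codons are represented by `None`, and ties in the plurality vote go to the codon seen first. None of these details are specified in the supplement, and the function names are hypothetical.

```python
from collections import Counter


def is_valid_codon(codon):
    """Assumed definition: exactly three unambiguous bases."""
    return codon is not None and len(codon) == 3 and set(codon) <= set("ACGT")


def passes_consensus_filter(codons, min_present):
    """Return (passes, consensus) for the codons observed at one site.

    `codons` holds one entry per genome, with None where the codon is missing.
    """
    present = [codon for codon in codons if codon is not None]
    if len(present) < min_present:
        return False, None
    # Plurality consensus: the most frequent codon among those present.
    consensus, _ = Counter(present).most_common(1)[0]
    return is_valid_codon(consensus), consensus


# Hypothetical sites in 6 genomes, requiring presence in at least 3.
site_ok = passes_consensus_filter(["GCT", "GCT", "GCC", None, None, "GCT"], 3)
site_sparse = passes_consensus_filter(["GCT", None, None, None, None, "GCN"], 3)
site_ambiguous = passes_consensus_filter(["GCN", "GCN", "GCN", "GCT", None, None], 3)
```

The same function covers the D. melanogaster case by passing 162 entries with `min_present=81`, again under the simplifications listed above.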