
Supplemental Methods · 2019. 10. 11. · 47 900 Supplemental Tables and Figures Supplementary Table S1: Population sample information (JH = Jackson Hole Lycaeides; N♂ = male sample


Supplemental Methods

Herein we define a hierarchical Bayesian model to estimate genotypes $g$, allele frequencies $p$, and a genetic diversity parameter $\theta$ from low-coverage DNA sequence data with possible sequence errors. We describe the model for bi-allelic SNPs, but the model is easily modified for multi-allelic loci. Let $x_{ij}$ denote the number of sequences of the arbitrarily defined reference allele for locus (SNP) $i$ and individual $j$, and let $\epsilon_i$ denote the probability of a sequence error for locus $i$. Let $g_{ij} \in \{0, 1, 2\}$ denote the genotype for locus $i$ and individual $j$; $g_{ij} = 1$ is the heterozygous genotype, and $g_{ij} = 0$ and $g_{ij} = 2$ are the homozygous genotypes. We assume that the conditional probability of the data for locus $i$ and individual $j$ given $g_{ij}$ is binomial,

$$
P(x_{ij} \mid g_{ij}) = \frac{n_{ij}!}{x_{ij}!\,(n_{ij} - x_{ij})!}
\begin{cases}
(1 - \epsilon_i)^{x_{ij}}\, \epsilon_i^{\,n_{ij} - x_{ij}} & \text{if } g_{ij} = 0 \\
0.5^{\,n_{ij}} & \text{if } g_{ij} = 1 \\
\epsilon_i^{\,x_{ij}}\, (1 - \epsilon_i)^{n_{ij} - x_{ij}} & \text{if } g_{ij} = 2
\end{cases}
\tag{A1}
$$

where $n_{ij}$ is the number of sequences (i.e., sequence coverage) for locus $i$ and individual $j$. The full likelihood of the data is $\prod_i \prod_j P(x_{ij} \mid g_{ij})\, P(g_{ij} \mid p_i)$, where $p_i$ is the population frequency of the reference allele for locus $i$ and $P(g_{ij} \mid p_i) \sim \mathrm{binomial}(p_i, n = 2)$. By taking the product across loci and individuals we assume Hardy-Weinberg and linkage equilibrium within a population.
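As an illustration of equation A1, the following minimal Python sketch evaluates the genotype likelihood for a single locus and individual. It is not the authors' C++ software. The function name `genotype_likelihood` and its argument names are hypothetical, the genotype coding follows A1 literally, and the leading factor is taken to be the standard binomial coefficient $n_{ij}!/(x_{ij}!\,(n_{ij} - x_{ij})!)$.

```python
from math import comb


def genotype_likelihood(x, n, g, eps):
    """P(x | g) as in equation A1 (hypothetical helper, not the authors' code).

    x   -- number of reference-allele sequences (0 <= x <= n)
    n   -- sequence coverage for this locus and individual
    g   -- genotype code in {0, 1, 2}, read exactly as A1 writes it
    eps -- per-locus sequence error probability
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    coef = comb(n, x)  # binomial coefficient n! / (x! (n - x)!)
    if g == 0:
        return coef * (1.0 - eps) ** x * eps ** (n - x)
    if g == 1:
        return coef * 0.5 ** n
    if g == 2:
        return coef * eps ** x * (1.0 - eps) ** (n - x)
    raise ValueError("g must be 0, 1, or 2")


if __name__ == "__main__":
    # Three reference reads out of four, with an assumed error rate of 0.01.
    for g in (0, 1, 2):
        print(g, genotype_likelihood(3, 4, g, 0.01))
```

With three reference reads out of four and an assumed error rate of 0.01, the heterozygous genotype has the largest likelihood (0.25), which shows how little coverage is needed before a mixed read count favors a heterozygote under this model.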

We place a hierarchical prior on $p$ that is conditional on a genetic diversity parameter $\theta$, specifically, $P(p_i \mid \theta) \sim \mathrm{Beta}(\theta, \theta)$. When $\theta$ is large (e.g., greater than one), many loci are expected to have intermediate allele frequencies. Conversely, as $\theta$ approaches zero, most loci are expected to have a single common allele and one or more rare alleles. Under certain conditions $\theta = 4N_e\mu$ (Wright, 1931). We place an uninformative hyperprior on $\theta$; specifically, we assume $\theta \sim U(a, b)$ where $a \approx 0$ and $b$ is large (we use $b = 10000$). We developed software that uses an MCMC algorithm to sample from $P(g, p, \theta \mid x) \propto P(x \mid g)\, P(g \mid p)\, P(p \mid \theta)\, P(\theta)$. The software is written in C++, uses the GNU Scientific Library (Galassi et al., 2009), and is available through DRYAD (doi pending).
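To make the role of θ in the Beta(θ, θ) prior concrete, the sketch below evaluates the prior density at a rare and an intermediate allele frequency for a few values of θ. It is an illustration only and not part of the authors' software. The function name `beta_symmetric_pdf` and the example values of θ and p are assumptions chosen for the demonstration.

```python
from math import exp, lgamma, log


def beta_symmetric_pdf(p, theta):
    """Density of Beta(theta, theta) at allele frequency p (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    # Work on the log scale so that large theta does not overflow the gamma function.
    log_norm = lgamma(2.0 * theta) - 2.0 * lgamma(theta)
    log_kernel = (theta - 1.0) * (log(p) + log(1.0 - p))
    return exp(log_norm + log_kernel)


if __name__ == "__main__":
    for theta in (0.1, 1.0, 5.0):
        rare = beta_symmetric_pdf(0.05, theta)
        mid = beta_symmetric_pdf(0.5, theta)
        print(f"theta={theta}: density at p=0.05 is {rare:.3f}, at p=0.5 is {mid:.3f}")
```

For θ below one the density is higher near the boundaries than at p = 0.5, for θ = 1 it is flat, and for θ above one it peaks at intermediate frequencies, matching the verbal description of the prior.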

Supplemental Tables and Figures

Supplementary Table S1: Population sample information (JH = Jackson Hole Lycaeides; N♂ = male sample size; N♀ = female sample size).

Locality                  Taxon       ID   N♂  N♀  Lat. (°N)  Long. (°W)  Elevation (m)
King’s Hill, MT           L. idas     KHL  8   0   46.8407    110.6990    2239
Garnet Peak, MT           L. idas     GNP  10  10  45.4323    111.2245    1910
Bunsen Peak, WY           L. idas     BNP  10  10  44.9337    110.7212    2260
Trout Lake, WY            L. idas     TRL  15  3   44.9019    110.1291    2124
Hayden Valley, WY         L. idas     HNV  13  25  44.6823    110.4945    2344
Mt. Randolf, WY           JH          MRF  23  12  43.8547    110.3918    2221
Upper Slide Lake, WY      JH          USL  17  12  43.5829    110.3328    2246
Teton Science School, WY  JH          TSS  17  17  43.6974    110.6102    2180
Blacktail Butte, WY       JH          BTB  17  21  43.6382    110.6820    2220
Bull Creek, WY            JH          BCR  20  17  43.3007    110.5530    2195
Victor, ID                L. melissa  VIC  10  15  43.6590    111.1114    1850
Lander, WY                L. melissa  LAN  12  12  42.6533    108.3551    1787
Sinclair, WY              L. melissa  SIN  12  13  41.8517    107.0917    1961

Supplementary Table S2: Proportion of phenotypic variance explained by population.

Trait       Prop. Variance
F           0.897
H           0.492
U           0.665
W           0.100
E           0.028
F/W         0.706
F/H         0.729
[F+H]/E     0.495
H/U         0.293
Num. Ast.   0.123
Num. Med.   0.000
Prop. Ast.  0.141

Supplementary Table S3: The number of genetic regions with posterior inclusion probabilities greater than or equal to the 99.9th empirical quantile for each pair of traits. The expected number of shared genetic regions if posterior inclusion probabilities for pairs of traits are independent is less than one (approximately 1/20). Genitalic measurements F, H, U, W, E, F/W, F/H, [F+H]/E, and H/U are depicted in Figure 2, and oviposition traits are the number or proportion of eggs laid on Medicago or Astragalus. These results are for the naive analysis that includes all Lycaeides populations (LN analysis).

Trait       F   H   U   W   E   F/W  F/H  [F+H]/E  H/U  Num. Ast.  Num. Med.  Prop. Ast.
F           51  3   12  2   0   7    11   3        1    0          0          1
H           3   52  6   2   0   5    1    4        1    0          0          0
U           12  6   52  2   0   7    6    4        6    1          0          0
W           2   2   2   52  2   19   2    2        2    0          0          0
E           0   0   0   2   53  0    0    18       0    0          0          1
F/W         7   5   7   19  0   52   4    3        2    0          0          0
F/H         11  1   6   2   0   4    52   3        3    0          0          0
[F+H]/E     3   4   4   2   18  3    3    52       1    0          0          0
H/U         1   1   6   2   0   2    3    1        54   0          0          0
Num. Ast.   0   0   1   0   0   0    0    0        0    53         1          1
Num. Med.   0   0   0   0   0   0    0    0        0    1          52         1
Prop. Ast.  1   0   0   0   1   0    0    0        0    1          1          52
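The "approximately 1/20" expectation quoted in the caption of Table S3 can be recovered with a short calculation. This is a sketch under an assumption that is not stated in this section: the total number of genetic regions, written $R$ here, is inferred from the table's diagonal, where roughly 52 regions per trait fall at or above the 99.9th quantile (the top 0.1%), so that $R \times 0.001 \approx 52$.

```latex
% Assumption: R x 0.001 is about 52, inferred from the diagonal of Table S3.
% Under independence, a region falls in the top 0.1% for both traits
% with probability 0.001 x 0.001.
\mathrm{E}[\text{shared regions}]
  = R \times 0.001 \times 0.001
  \approx 52 \times 0.001
  = 0.052
  \approx \frac{1}{20}
```

Off-diagonal counts well above this value, such as 19 shared regions for W and F/W, therefore exceed what independence between the two traits would predict.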

Supplementary Table S4: The number of genetic regions with posterior inclusion probabilities greater than or equal to the 99.9th empirical quantile for each pair of traits. The expected number of shared genetic regions if posterior inclusion probabilities for pairs of traits are independent is less than one (approximately 1/20). Genitalic measurements F, H, U, W, E, F/W, F/H, [F+H]/E, and H/U are depicted in Figure 2, and oviposition traits are the number or proportion of eggs laid on Medicago or Astragalus. These results are for the naive analysis that includes only admixed Lycaeides populations (AN analysis).

Trait       F   H   U   W   E   F/W  F/H  [F+H]/E  H/U  Num. Ast.  Num. Med.  Prop. Ast.
F           57  3   5   2   2   2    3    3        3    0          3          1
H           3   53  2   0   0   0    4    1        1    0          0          0
U           5   2   54  3   4   2    1    3        5    0          0          3
W           2   0   3   52  0   18   0    0        1    0          1          1
E           2   0   4   0   54  1    2    17       1    1          1          0
F/W         2   0   2   18  1   52   1    1        1    0          0          0
F/H         3   4   1   0   2   1    53   3        3    0          2          1
[F+H]/E     3   1   3   0   17  1    3    55       3    1          1          0
H/U         3   1   5   1   1   1    3    3        55   0          0          2
Num. Ast.   0   0   0   0   1   0    0    1        0    53         1          1
Num. Med.   3   0   0   1   1   0    2    1        0    1          54         4
Prop. Ast.  1   0   3   1   0   0    1    0        2    1          4          52

Supplementary Figure S1: Histograms summarize the variation for each morphological trait (diagonal) and scatter-plots depict the covariance between pairs of characters (off-diagonal; light gray = L. idas, gray = Jackson Hole Lycaeides, black = L. melissa). We denote individuals from each conspecific population with a different symbol. We report Pearson’s product-moment correlation in the lower-triangle plots.

Supplementary Figure S2: Histograms summarize the variation for each oviposition preference trait (diagonal) and scatter-plots depict the covariance between pairs of characters (off-diagonal; light gray = L. idas, gray = Jackson Hole Lycaeides, black = L. melissa). We denote individuals from each conspecific population with a different symbol. We report Pearson’s product-moment correlation in the lower-triangle plots.

[Figure S3: twelve histogram panels, one per population (SIN, LAN, VIC, BTB, TSS, USL, MRF, HNV, TRL, BNP, GNP, KHL). x-axis: allele frequency (0.0 to 1.0); y-axis: number of loci (0 to 40000).]

Supplementary Figure S3: Histograms depict the reference allele frequency distribution for all loci and each population. We define population abbreviations in Table S1.

[Figure S4: nine histogram panels (A to I). x-axis: effect size (0.0 to 0.5); y-axis: number of loci.]

Supplementary Figure S4: Histograms depict estimated effect sizes for SNPs with posterior inclusion probabilities greater than 0.01. (A) F, all Lycaeides, naive (LN) analysis; (B) F, admixed populations, naive (AN) analysis; (C) F, all Lycaeides, population-mean adjusted (LR) analysis; (D) H, all Lycaeides, naive (LN) analysis; (E) H, admixed populations, naive (AN) analysis; (F) H, all Lycaeides, population-mean adjusted (LR) analysis; (G) proportion of eggs on Astragalus, all Lycaeides, naive (LN) analysis; (H) proportion of eggs on Astragalus, admixed populations, naive (AN) analysis; (I) proportion of eggs on Astragalus, all Lycaeides, population-mean adjusted (LR) analysis.

Supplementary Figure S5: Plots depict genetic region posterior inclusion probabilities for forearm length (A-B), humerulus length (C), and the proportion of eggs laid on Astragalus (D). We use different symbols to designate different analyses: all Lycaeides, naive analysis (LN analysis; small, closed circle); admixed populations, naive analysis (AN analysis; +); all Lycaeides, population-mean adjusted analysis (LR analysis; ×). The order of genetic regions is arbitrary, but consistent among plots. The scale of the y-axis differs among plots. We present posterior inclusion probabilities for forearm length in two panels, because the scale differs considerably among the different analyses.