Mixed Stock Analysis of Sockeye Salmon Allows for Detection of Small Contributions with Adequate Power: Bayesian Approach versus Maximum Likelihood

Anton Antonovich¹, William Templin¹, Christopher Habicht¹, Joel Reynolds¹,², and James Seeb¹

¹ Alaska Department of Fish and Game, 333 Raspberry Road, Anchorage, Alaska 99518, USA
² U.S. Fish and Wildlife Service, 1011 E. Tudor Road, Anchorage, Alaska, 99503, USA

Abstract

Genetic data have proven extremely useful in identifying stock groups in complex mixtures. In this study we used eight microsatellite loci as genetic markers for 51 Bristol Bay and 12 Russian collections of sockeye salmon. We investigated the Bayesian approach to Mixed Stock Analysis (MSA) and compared its detection power to that of the Conditional Maximum Likelihood (CML) method. For this purpose, a simulation study was designed to directly compare the performance of the Bayesian and CML approaches. The sensitivity of both methods to small contributions in a mixture was evaluated for a wide range of known mixtures. Detection was defined as a non-zero lower limit of the 90% credibility/confidence interval of the contribution estimate. We found that the Bayesian method is capable of detecting small contributions of selected stock groups with adequate statistical power. We also found that the CML method produces similar results, with higher power at low contribution levels. However, this increase in power was attributed to the limitations of the bootstrap method used to obtain confidence intervals for CML estimates, rather than to better performance than the Bayesian approach. At moderate and high contribution levels (≥ 10%) both methods have maximum power of 1.0. Our results also indicate that posterior means serve as reliable proxies for the Bayesian stock contribution estimates. They are generally less biased than the CML estimates and have smaller variability.

Baseline Development and Management Interests

Genetic data for eight microsatellite loci from 63 baseline populations of sockeye salmon (Oncorhynchus nerka) were used in this study; 51 of the populations were collected in Bristol Bay drainages (see Figure 1) and 12 in Russian rivers on the Kamchatka Peninsula. These populations were further grouped into 12 reporting regions according to their genetic similarities.

From a management perspective, the following two major questions stimulated this study:

1) How sensitive are the modern methods of MSA to detecting small contributions of Bristol Bay (vs. Russian) salmon in composite mixtures?

2) On a smaller scale, is it possible to detect small contributions of salmon originating from Lake Clark alone, and from Iliamna and Lake Clark combined (hereafter called the Kvichak region), in mixtures typical for the Naknek harvest area?

Thus, we concentrated our attention on the following four groups: Bristol Bay and Russian stocks to answer the first question, and Lake Clark and Kvichak stocks for the second (Figure 2).

Figure 1. Map of the region with main Bristol Bay drainages.

[Figure 1 map labels: Bristol Bay, Kvichak River, Naknek River, Lake Clark, Iliamna Lake, Kukaklek Lake, Nonvianuk Lake, Naknek Lake, Lake Coville, Newhalen River, Lake Brooks, Egegik River, Ugashik River, Nushagak River, Shelikof Strait, Kodiak Island, and the Naknek special harvest area. Inset labels: Alaska, Russia, Bering Sea, Bristol Bay, Kamchatka.]

Simulation Study

• For each region, we generated 30 random mixtures with a pre-assigned contribution of fish from that region. Seven contribution levels were considered: 1%, 2%, 3%, 4%, 5%, 10%, and 20%. The mixtures were then analyzed by both the Bayesian and Conditional Maximum Likelihood methods.

• Contribution estimates for a region of interest, along with their 90% credibility/confidence intervals (CI), were obtained for all mixtures by both methods. In the Bayesian approach, posterior means were used as proxies for contribution estimates. To obtain confidence intervals for the CML estimates we used the symmetric percentile bootstrap method.

• The statistical power to detect a group of populations was defined as the proportion of times (out of 30 mixtures) that the group was successfully detected. Detection of a group is equivalent to a hypothesis test for a non-zero contribution: $H_0: \theta = 0$ vs. $H_a: \theta > 0$. A decision to reject the null hypothesis was made if the lower limit of the 90% CI for $\theta$ was not zero when rounded to two significant digits. With the sample size of 200 fish in the mixture, this would be equivalent to having detected just a single
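The power definition above can be illustrated with a minimal Python sketch. It is not taken from SPAM or BAYES. The function names and the example lower limits are hypothetical, and the rounding rule is read here as rounding to two decimal places, which is an assumption about the poster's wording.

```python
def is_detected(lower_limit, decimals=2):
    """Return True if the lower CI limit is non-zero after rounding.

    Assumption: "rounded to two significant digits" is interpreted
    as rounding to two decimal places.
    """
    return round(lower_limit, decimals) > 0


def detection_power(lower_limits, decimals=2):
    """Proportion of mixtures in which the group was detected."""
    if not lower_limits:
        raise ValueError("need at least one mixture")
    hits = sum(is_detected(x, decimals) for x in lower_limits)
    return hits / len(lower_limits)


# Hypothetical lower limits of 90% intervals from six mixtures
# (the study used 30 mixtures per scenario).
example_lower_limits = [0.0, 0.0, 0.012, 0.03, 0.004, 0.021]
print(detection_power(example_lower_limits))
```

In the study this proportion would be computed separately for each region, contribution level, and method.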

Results

♦ Bayesian estimates are generally less biased and have slightly smaller variability than their CML counterparts (Figure 3). Relative bias tends to decrease for both methods as the contribution level increases (Figure 4).

♦ The Bayesian method is capable of detecting small contributions of selected stock groups with adequate statistical power (Figure 5). Its detection power decreases progressively as the contribution level declines.

♦ At lower contribution levels, the CML method demonstrated higher power than the Bayesian approach. At higher levels (≥ 10%) both methods had maximum power of 1.

♦ The increase in power shown by the CML method at low contributions was attributed to biased bootstrap CIs. Since stock detection was based on the lower limit of the bootstrap CI, biased-high confidence intervals led to artificially high detection power. This phenomenon was especially pronounced for the group of Kvichak stocks, where power remained unusually high for all contribution levels (Figure 5).

Conclusions

Both the Bayesian and CML approaches give similar results, with Bayesian estimates being slightly less biased and less variable.

The symmetric bootstrap method (used in CML) can lead to biased-high confidence intervals, thus artificially inflating the detection power.

Software

• For CML simulations we used SPAM version 3.7 (Alaska Department of Fish and Game, 2003).

• For Bayesian simulations we used BAYES (Auke Bay Laboratory, Alaska Fisheries Science Center).

Figure 4. Relative bias of stock contribution estimates for each region. Bias was calculated as amount of departure of the average across 30 contribution estimates from the known true contribution assigned for that scenario, i.e.

$$\text{Bias} = \frac{\hat{\theta} - \theta_{true}}{\theta_{true}} \times 100\%.$$

[Figure 4, "Estimation of Bias": panels for Bristol Bay, Russia, Lake Clark, and Kvichak. x-axis: True contribution (0.01 to 0.20); y-axis: Bias (%). Legend: Bayesian Estimate, CML Estimate.]
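As a worked illustration of the relative bias in the Figure 4 caption, the following Python sketch averages a set of contribution estimates and expresses the departure from the true contribution as a percentage. The function name and the three example estimates are hypothetical. The study averaged 30 estimates per scenario.

```python
from statistics import mean


def relative_bias(estimates, true_contribution):
    """Relative bias (%) of the average estimate against the true value."""
    if true_contribution <= 0:
        raise ValueError("true contribution must be positive")
    return (mean(estimates) - true_contribution) / true_contribution * 100.0


# Hypothetical estimates for a scenario with a true contribution of 1%.
# Their average is 0.011, so the relative bias is about +10%.
print(relative_bias([0.012, 0.008, 0.013], 0.01))
```

Because the bias is divided by the true contribution, the same absolute error produces a much larger relative bias at the 1% level than at the 20% level.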

Figure 5. Detection power of the Bayesian and CML methods for each region. Power was defined as proportion of times a group was successfully detected in mixtures with known composition. The dotted line is the ideal power based on the lower limit of 90% bootstrap CI for a perfectly identifiable group and a mixture of size 200.

[Figure 5, "Detection Power": panels for Bristol Bay, Russia, Lake Clark, and Kvichak. x-axis: True contribution (0.01 to 0.20); y-axis: Power (0.0 to 1.0). Legend: Bayesian approach, Maximum Likelihood, Ideal Power by Bootstrap.]
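One way to approximate an "ideal power" curve like the dotted line in Figure 5 is sketched below in Python. This is an illustration under stated assumptions, not the authors' procedure or SPAM's implementation. It assumes that fish from a perfectly identifiable group are counted without error, and it uses a plain percentile bootstrap whose lower limit is the approximate 5th percentile. The detection rule is read as a non-zero lower limit after rounding to two decimal places. All function names and parameters are hypothetical.

```python
import random


def bootstrap_lower_limit(k, n, n_boot, rng, lower_tail=0.05):
    """Approximate lower percentile of bootstrap proportions.

    k of the n fish in the mixture belong to the group; each bootstrap
    replicate resamples n fish with replacement from that mixture.
    """
    p_hat = k / n
    props = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return props[int(lower_tail * n_boot)]


def ideal_power(theta, n=200, n_mixtures=30, n_boot=200, seed=1):
    """Fraction of simulated mixtures whose lower limit rounds above zero."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_mixtures):
        # Number of fish in this mixture that truly come from the group.
        k = sum(rng.random() < theta for _ in range(n))
        lower = bootstrap_lower_limit(k, n, n_boot, rng)
        # Assumed detection rule: non-zero after rounding to 2 decimals.
        detected += round(lower, 2) > 0
    return detected / n_mixtures


if __name__ == "__main__":
    for theta in (0.01, 0.05, 0.20):
        print(theta, ideal_power(theta))
```

With 30 mixtures per level the simulated curve is noisy. Raising `n_mixtures` smooths it at the cost of run time.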

Figure 3. Average contribution estimates for each region with approximate normal 90% confidence intervals. Averaging was done across 30 mixtures generated for each scenario (i.e. region × contribution level) with pre-assigned true contribution of fish from the region of interest.

[Figure 3, "Simulation Results": panels for Lake Clark, Kvichak, Bristol Bay, and Russia. x-axis: True contribution (0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20); y-axis: Estimated contribution (0.01 to 0.20). Legend: Bayesian Estimate, CML Estimate.]

Figure 2. Mixture estimates for each region from 100% simulations with corresponding 90% bootstrap CIs. All estimates are greater than 0.9, confirming the genetic similarity within groups.

[Figure 2, "Simulation results with 100% contribution from a region": rows for Russia, Bristol Bay, Iliamna Lake, and Lake Clark. x-axis: Estimated contribution (0.80 to 1.00).]

Panel annotations:

For these two groups simulations were carried out using truncated baseline consisting of 32 populations from Naknek and Kvichak drainages only.

For these two groups simulations were carried out using the full baseline consisting of 63 populations.