7/31/2019 Tutorial of STRUCTURE Software
Tutorial of the STRUCTURE software
Dr. Sung-Chur Sim
Tomato Genetics and Breeding Program, The Ohio State Univ., OARDC
STRUCTURE software
A model-based clustering method (Pritchard et al. 2000)
Free software
(http://pritch.bsd.uchicago.edu/software/structure2_1.html)
Bayesian approach (MCMC: Markov Chain Monte Carlo)
Detects the underlying genetic populations among a set of individuals genotyped at multiple markers
Computes the proportion of the genome of an individual originating from each inferred population (a quantitative clustering method)
Input data
A matrix in which individuals are in rows and loci are in columns
For an n-ploid species, n consecutive rows hold the data for each individual
Integers should be used to code genotypes
Missing data should be indicated by a number that does not occur elsewhere in the data (e.g. -1)
The data file should be a text file (.txt), not an Excel file (.xls), for running STRUCTURE
[Figure: example input file, annotated to show the column of user-defined populations (market class), the missing data, and the 2 consecutive rows holding the alleles of each individual]
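The layout rules above (n rows per n-ploid individual, integer alleles, one reserved missing-data code) can be sketched as a small writer. This is a minimal sketch, not part of STRUCTURE: the function name `structure_rows`, the dictionary inputs, and the column order (individual name, population code, then one column per locus) are assumptions, and the column choices must match what is declared when the file is imported.

```python
def structure_rows(genotypes, populations, missing=-1):
    """Build STRUCTURE-style input lines (hypothetical helper).

    genotypes: dict mapping individual name -> list of loci; each locus is a tuple
    of integer alleles whose length is the ploidy n (None marks a missing allele).
    populations: dict mapping individual name -> integer population code
    (e.g. a market class). Each individual becomes n consecutive rows.
    """
    # The missing-data code must not occur anywhere else in the data
    used = {a for loci in genotypes.values() for locus in loci for a in locus if a is not None}
    if missing in used:
        raise ValueError(f"missing-data code {missing} also occurs as an allele")

    lines = []
    for name, loci in genotypes.items():
        ploidy = len(loci[0])
        for copy in range(ploidy):  # one row per allele copy
            alleles = [missing if locus[copy] is None else locus[copy] for locus in loci]
            fields = [name, str(populations[name])] + [str(a) for a in alleles]
            lines.append("\t".join(fields))
    return lines
```

For example, `structure_rows({"ind1": [(1, 2), (3, None)]}, {"ind1": 1})` gives two tab-separated rows for the diploid `ind1`, with `-1` in place of the missing allele. The lines can then be saved as plain text, e.g. with `pathlib.Path("markers.txt").write_text("\n".join(rows) + "\n")`, rather than as an .xls workbook.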
Running STRUCTURE from a graphical interface, the Front End
The Front End organizes data analysis into projects
Importing input data into a project
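The values entered in the Front End when a data file is imported (such as the number of individuals, ploidy, number of loci, and missing-data value) also appear as keys in the `mainparams` file read by the command-line version of STRUCTURE. A minimal sketch that renders such settings as `#define NAME value` lines; the file names and the NUMINDS/NUMLOCI counts are hypothetical placeholders, and the key list should be checked against the mainparams template shipped with your STRUCTURE version.

```python
def mainparams_text(settings):
    """Render settings as '#define NAME value' lines, the style of a mainparams file."""
    return "\n".join(f"#define {name} {value}" for name, value in settings.items()) + "\n"


# Hypothetical values for a diploid data set laid out as in the input-data slides
settings = {
    "INFILE": "markers.txt",   # hypothetical input file name
    "OUTFILE": "results",      # hypothetical output name
    "NUMINDS": 100,            # hypothetical number of individuals
    "NUMLOCI": 50,             # hypothetical number of loci
    "PLOIDY": 2,               # diploid: 2 consecutive rows per individual
    "MISSING": -1,             # code used for missing data
    "ONEROWPERIND": 0,         # 0 = alleles of an individual on separate rows
    "LABEL": 1,                # first column holds individual names
    "POPDATA": 1,              # second column holds user-defined populations
    "POPFLAG": 0,
    "LOCDATA": 0,
    "PHENOTYPE": 0,
    "EXTRACOLS": 0,
    "MARKERNAMES": 0,
    "MAXPOPS": 2,              # K for this run
    "BURNIN": 10000,           # length of burn-in period
    "NUMREPS": 50000,          # MCMC reps after burn-in
}

text = mainparams_text(settings)
```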
Configuring a parameter set
Length of Burnin Period: how long to run the simulation before collecting data, to minimize the effect of the starting configuration
Number of MCMC Reps after Burnin: how long to run the simulation after the burn-in to get accurate parameter estimates
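STRUCTURE's own sampler is far more elaborate, but a toy example shows why the burn-in draws are discarded. A minimal sketch, assuming a one-dimensional random-walk Metropolis sampler (not STRUCTURE's algorithm) whose target is a normal distribution with mean 5, deliberately started far away at -100; the function names are hypothetical. Averaging only the post-burn-in draws recovers the mean, while averaging every draw is dragged toward the starting point.

```python
import math
import random


def metropolis(log_density, start, n_burnin, n_reps, step=1.0, seed=1):
    """Random-walk Metropolis; returns (post-burn-in draws, all draws)."""
    rng = random.Random(seed)
    x = start
    kept, everything = [], []
    for it in range(n_burnin + n_reps):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        everything.append(x)
        if it >= n_burnin:  # only draws after the burn-in feed the estimates
            kept.append(x)
    return kept, everything


def log_target(x):
    """Log density (up to a constant) of a normal distribution with mean 5, sd 1."""
    return -0.5 * (x - 5.0) ** 2


kept, everything = metropolis(log_target, start=-100.0, n_burnin=1000, n_reps=5000)
post_burnin_mean = sum(kept) / len(kept)
naive_mean = sum(everything) / len(everything)
```

The same trade-off drives the two settings above: a longer burn-in protects against a poor starting configuration, and more reps after burn-in reduce the Monte Carlo error of the estimates.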
Running STRUCTURE: a single run
Running STRUCTURE: a batch run
Ln P(D): estimated log probability of the data for each K
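When many runs are made, the Ln P(D) values have to be collected per K before L(K) can be averaged. A minimal sketch, assuming each run's results file contains a line of the form `Estimated Ln Prob of Data   = <value>`; check this against your own output files before relying on the pattern. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format of the results-file line that reports Ln P(D)
LNPD_LINE = re.compile(r"Estimated Ln Prob of Data\s*=\s*(-?\d+(?:\.\d+)?)")


def read_ln_p_d(text):
    """Return the Ln P(D) value found in one run's results text."""
    match = LNPD_LINE.search(text)
    if match is None:
        raise ValueError("no 'Estimated Ln Prob of Data' line found")
    return float(match.group(1))


def ln_p_d_by_k(results):
    """results: iterable of (K, results-file text) pairs, one per run."""
    by_k = defaultdict(list)
    for k, text in results:
        by_k[k].append(read_ln_p_d(text))
    return dict(by_k)
```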
Inference of the true K (number of populations)
The log likelihood for each K: Ln P(D) = L(K)
Two approaches to determine the best K
1. Use of L(K): as K approaches the true value, L(K) plateaus (or continues to increase slightly) and shows high variance between runs (Rosenberg et al. 2001).
Nonparametric test (Wilcoxon test)
2. Use of an ad hoc quantity (ΔK), calculated from the second-order rate of change of the likelihood, L''(K) (Evanno et al. 2005). ΔK shows a clear peak at the true value of K.
$\Delta K = m\left(\left|L''(K)\right|\right) / s\left[L(K)\right]$, where m is the mean over runs and s is the standard deviation of L(K) over runs
Evanno et al. 2005. Molecular Ecology 14: 2611-2620
SAS code for the nonparametric method
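The SAS code shown on this slide did not survive extraction. As a stand-in (not the author's code), here is a minimal pure Python sketch of a two-sided Wilcoxon rank-sum test with a normal approximation, which could compare the Ln P(D) values of the replicate runs at K and at K + 1; a non-significant difference would be consistent with L(K) having reached its plateau. The function names are hypothetical, and the variance omits the tie correction, so p-values may differ slightly from a statistics package.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (W, z, p) for sample x."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                    # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12   # no tie correction
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return w, z, p
```

With 20 runs per K, `rank_sum_test(lnpd[k], lnpd[k + 1])` would be applied to each pair of neighbouring K values.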
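Inference of the best K using the ΔK method
The best K = 8
L(K) = the average of 20 values of Ln P(D)
$L'(K) = L(K) - L(K-1)$
$\left|L''(K)\right| = \left|L'(K+1) - L'(K)\right|$
$\Delta K = m\left(\left|L''(K)\right|\right) / s\left[L(K)\right]$, where s[L(K)] is the standard deviation of the 20 Ln P(D) values at K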
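The ΔK calculation can be sketched in a few lines. This is a minimal sketch, assuming `lnpd_by_k` (a hypothetical name) maps each K to the list of Ln P(D) values from its replicate runs. Like the slide, it works on the averaged L(K), so ΔK is only defined where both K - 1 and K + 1 were run; a K whose runs all return the same Ln P(D) would make s[L(K)] zero and raise ZeroDivisionError.

```python
import statistics


def delta_k(lnpd_by_k):
    """Return {K: ΔK} for every K that has both neighbours K - 1 and K + 1."""
    ks = sorted(lnpd_by_k)
    mean_l = {k: statistics.mean(lnpd_by_k[k]) for k in ks}   # L(K)
    sd_l = {k: statistics.stdev(lnpd_by_k[k]) for k in ks}    # s[L(K)]
    result = {}
    for k in ks:
        if k - 1 in mean_l and k + 1 in mean_l:
            first = mean_l[k] - mean_l[k - 1]          # L'(K)
            first_next = mean_l[k + 1] - mean_l[k]     # L'(K + 1)
            second = abs(first_next - first)           # |L''(K)|
            result[k] = second / sd_l[k]
    return result
```

The best K is then the key with the largest value, e.g. `max(d, key=d.get)`.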
Q-matrix
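Each row of the Q-matrix gives, for one individual, the estimated proportion of its genome originating from each of the K inferred populations. A minimal sketch, assuming the matrix has already been read into a list of rows: it checks that each row sums to 1 and assigns an individual to its largest cluster only when that proportion reaches a threshold. The 0.7 cutoff and the name `assign_clusters` are hypothetical choices, not part of STRUCTURE.

```python
def assign_clusters(q_matrix, threshold=0.7, tol=1e-3):
    """Return one cluster index per individual, or None if no cluster reaches threshold."""
    assignments = []
    for row in q_matrix:
        if abs(sum(row) - 1.0) > tol:  # membership proportions should sum to 1
            raise ValueError(f"row does not sum to 1: {row}")
        best = max(range(len(row)), key=lambda j: row[j])
        # Individuals without a dominant cluster are treated as admixed (None)
        assignments.append(best if row[best] >= threshold else None)
    return assignments
```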
An example of steps to identify the best K
1. Format the marker data
2. Run STRUCTURE with 10K burn-in and 50K MCMC reps, 20 times at each of K = 1 to 10
3. Infer the true K (5 to 7)
4. Run STRUCTURE with 500K burn-in and 750K MCMC reps, 20 times at each of K = 3 to 8
5. Identify the best K based on L(K) and ΔK
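The two-stage scheme above runs 20 replicates at every K: 200 runs in the first stage and 120 in the second. A minimal sketch that enumerates those runs as argument lists for the command-line version of STRUCTURE, assuming its `-m`, `-e`, `-K`, `-i` and `-o` options (mainparams file, extraparams file, K, input file, output file); verify them against the manual of your version. The mainparams file names, which would hold each stage's burn-in and MCMC lengths, are hypothetical. The Front End's batch run covers the same ground through the GUI.

```python
def batch_plan(k_values, replicates, mainparams, extraparams, infile, prefix):
    """Return one STRUCTURE command-line argument list per (K, replicate) run."""
    runs = []
    for k in k_values:
        for rep in range(1, replicates + 1):
            runs.append([
                "structure",
                "-m", mainparams,                 # sets burn-in and MCMC reps
                "-e", extraparams,
                "-K", str(k),                     # number of populations
                "-i", infile,
                "-o", f"{prefix}_K{k}_rep{rep}",  # one output name per run
            ])
    return runs


# Stage 1: 10K burn-in / 50K reps (hypothetical mainparams_short), K = 1 to 10
stage1 = batch_plan(range(1, 11), 20, "mainparams_short", "extraparams", "markers.txt", "stage1")
# Stage 2: 500K burn-in / 750K reps (hypothetical mainparams_long), K = 3 to 8
stage2 = batch_plan(range(3, 9), 20, "mainparams_long", "extraparams", "markers.txt", "stage2")
```

Each argument list could then be launched with `subprocess.run(args, check=True)`.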
We may not always be able to know the TRUE value of K, but we should aim for the smallest value of K that captures the major structure in the data. (Pritchard et al. 2000)
Enjoy running STRUCTURE