12
Copyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386 Bayesian Model Selection for Genome-Wide Epistatic Quantitative Trait Loci Analysis Nengjun Yi,* ,†,1 Brian S. Yandell, Gary A. Churchill, § David B. Allison,* ,† Eugene J. Eisen** and Daniel Pomp †† *Department of Biostatistics, Section on Statistical Genetics, Clinical Nutrition Research Center, University of Alabama, Birmingham, Alabama 35294, Departments of Statistics and Horticulture, University of Wisconsin, Madison, Wisconsin 53706, § The Jackson Laboratory, Bar Harbor, Maine 04609, **Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695 and †† Department of Animal Science, University of Nebraska, Lincoln, Nebraska 68583 Manuscript received December 29, 2004 Accepted for publication April 4, 2005 ABSTRACT The problem of identifying complex epistatic quantitative trait loci (QTL) across the entire genome continues to be a formidable challenge for geneticists. The complexity of genome-wide epistatic analysis results mainly from the number of QTL being unknown and the number of possible epistatic effects being huge. In this article, we use a composite model space approach to develop a Bayesian model selection framework for identifying epistatic QTL for complex traits in experimental crosses from two inbred lines. By placing a liberal constraint on the upper bound of the number of detectable QTL we restrict attention to models of fixed dimension, greatly simplifying calculations. Indicators specify which main and epistatic effects of putative QTL are included. We detail how to use prior knowledge to bound the number of detectable QTL and to specify prior distributions for indicators of genetic effects. We develop a computa- tionally efficient Markov chain Monte Carlo (MCMC) algorithm using the Gibbs sampler and Metropolis- Hastings algorithm to explore the posterior distribution. We illustrate the proposed method by detecting new epistatic QTL for obesity in a backcross of CAST/Ei mice onto M16i. M ANY complex human diseases and traits of bio- tive corrections for multiple testing. Non-Bayesian model logical and/or economic importance are deter- selection methods combine simultaneous search with a mined by multiple genetic and environmental influ- sequential procedure such as forward or stepwise selec- ences (Lynch and Walsh 1998). Mounting evidence tion and apply criteria such as P -values or modified Baye- suggests that interactions among genes (epistasis) play sian information criterion (BIC) to identify well-fitting an important role in the genetic control and evolu- multiple-QTL models (Kao et al. 1999; Carlborg et al. tion of complex traits (Cheverud 2000; Carlborg and 2000; Reifsnyder et al. 2000; Bogdan et al. 2004). These Haley 2004). Mapping quantitative trait loci (QTL) is methods, although appealing in their simplicity and pop- a process of inferring the number of QTL, their geno- ularity, have several drawbacks, including: (1) the uncer- mic positions, and genetic effects given observed pheno- tainty about the model itself is ignored in the final in- type and marker genotype data. From a statistical per- ference, (2) they involve a complex sequential testing spective, two key problems in QTL mapping are model strategy that includes a dynamically changing null hy- search and selection (e.g., Broman and Speed 2002; pothesis, and (3) the selection procedure is heavily in- Sillanpa ¨a ¨ and Corander 2002; Yi 2004). Traditional fluenced by the quantity of data (Raftery et al. 1997; QTL mapping methods utilize a statistical model, which George 2000; Gelman et al. 2004; Kadane and Lazar estimates the effects of only one QTL whose putative 2004). position is scanned across the genome (e.g., Lander and Bayesian model selection methods provide a power- Botstein 1989; Jansen and Stam 1994; Zeng 1994). ful and conceptually simple approach to mapping multi- Extensions of this approach can allow for main and epi- ple QTL (Satagopan et al. 1996; Hoeschele 2001; Sen static effects at two or perhaps a few QTL at a time and and Churchill 2001). The Bayesian approach pro- employ a multidimensional scan to detect QTL. How- ceeds by setting up a likelihood function for the pheno- ever, such an approach neglects potential confound- type and assigning prior distributions to all unknowns ing effects from additional QTL and requires prohibi- in the problem. These induce a posterior distribution on the unknown quantities that contains all of the avail- able information for inference of the genetic architec- 1 Corresponding author: Department of Biostatistics, University of Ala- ture of the trait. Bayesian mapping methods can treat bama, Ryals Public Health Bldg., 1665 University Blvd., Birmingham, AL 35294-0022. E-mail: [email protected] the unknown number of QTL as a random variable, Genetics 170: 1333–1344 ( July 2005)

Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

Copyright © 2005 by the Genetics Society of AmericaDOI: 10.1534/genetics.104.040386

Bayesian Model Selection for Genome-Wide Epistatic QuantitativeTrait Loci Analysis

Nengjun Yi,*,†,1 Brian S. Yandell,‡ Gary A. Churchill,§ David B. Allison,*,†

Eugene J. Eisen** and Daniel Pomp††

*Department of Biostatistics, Section on Statistical Genetics, †Clinical Nutrition Research Center, University of Alabama, Birmingham,Alabama 35294, ‡Departments of Statistics and Horticulture, University of Wisconsin, Madison, Wisconsin 53706,

§The Jackson Laboratory, Bar Harbor, Maine 04609, **Department of Animal Science, North CarolinaState University, Raleigh, North Carolina 27695 and ††Department of Animal Science,

University of Nebraska, Lincoln, Nebraska 68583

Manuscript received December 29, 2004Accepted for publication April 4, 2005

ABSTRACTThe problem of identifying complex epistatic quantitative trait loci (QTL) across the entire genome

continues to be a formidable challenge for geneticists. The complexity of genome-wide epistatic analysisresults mainly from the number of QTL being unknown and the number of possible epistatic effects beinghuge. In this article, we use a composite model space approach to develop a Bayesian model selectionframework for identifying epistatic QTL for complex traits in experimental crosses from two inbred lines.By placing a liberal constraint on the upper bound of the number of detectable QTL we restrict attentionto models of fixed dimension, greatly simplifying calculations. Indicators specify which main and epistaticeffects of putative QTL are included. We detail how to use prior knowledge to bound the number ofdetectable QTL and to specify prior distributions for indicators of genetic effects. We develop a computa-tionally efficient Markov chain Monte Carlo (MCMC) algorithm using the Gibbs sampler and Metropolis-Hastings algorithm to explore the posterior distribution. We illustrate the proposed method by detectingnew epistatic QTL for obesity in a backcross of CAST/Ei mice onto M16i.

MANY complex human diseases and traits of bio- tive corrections for multiple testing. Non-Bayesian modellogical and/or economic importance are deter- selection methods combine simultaneous search with a

mined by multiple genetic and environmental influ- sequential procedure such as forward or stepwise selec-ences (Lynch and Walsh 1998). Mounting evidence tion and apply criteria such as P -values or modified Baye-suggests that interactions among genes (epistasis) play sian information criterion (BIC) to identify well-fittingan important role in the genetic control and evolu- multiple-QTL models (Kao et al. 1999; Carlborg et al.tion of complex traits (Cheverud 2000; Carlborg and 2000; Reifsnyder et al. 2000; Bogdan et al. 2004). TheseHaley 2004). Mapping quantitative trait loci (QTL) is methods, although appealing in their simplicity and pop-a process of inferring the number of QTL, their geno- ularity, have several drawbacks, including: (1) the uncer-mic positions, and genetic effects given observed pheno- tainty about the model itself is ignored in the final in-type and marker genotype data. From a statistical per- ference, (2) they involve a complex sequential testingspective, two key problems in QTL mapping are model strategy that includes a dynamically changing null hy-search and selection (e.g., Broman and Speed 2002; pothesis, and (3) the selection procedure is heavily in-Sillanpaa and Corander 2002; Yi 2004). Traditional fluenced by the quantity of data (Raftery et al. 1997;QTL mapping methods utilize a statistical model, which George 2000; Gelman et al. 2004; Kadane and Lazarestimates the effects of only one QTL whose putative 2004).position is scanned across the genome (e.g., Lander and Bayesian model selection methods provide a power-Botstein 1989; Jansen and Stam 1994; Zeng 1994). ful and conceptually simple approach to mapping multi-Extensions of this approach can allow for main and epi- ple QTL (Satagopan et al. 1996; Hoeschele 2001; Senstatic effects at two or perhaps a few QTL at a time and and Churchill 2001). The Bayesian approach pro-employ a multidimensional scan to detect QTL. How- ceeds by setting up a likelihood function for the pheno-ever, such an approach neglects potential confound- type and assigning prior distributions to all unknownsing effects from additional QTL and requires prohibi- in the problem. These induce a posterior distribution

on the unknown quantities that contains all of the avail-able information for inference of the genetic architec-

1Corresponding author: Department of Biostatistics, University of Ala- ture of the trait. Bayesian mapping methods can treatbama, Ryals Public Health Bldg., 1665 University Blvd., Birmingham,AL 35294-0022. E-mail: [email protected] the unknown number of QTL as a random variable,

Genetics 170: 1333–1344 ( July 2005)

Page 2: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1334 N. Yi et al.

A BAYESIAN MODEL SELECTION FRAMEWORKwhich has several advantages but results in the complica-FOR QTL MAPPINGtion of varying the dimension of the model space. The

reversible jump Markov chain Monte Carlo (MCMC) We consider experimental crosses derived from twoalgorithm, introduced by Green (1995), offers a power- inbred lines. In QTL studies, the observed data consistful and general approach to exploring posterior distri- of phenotypic trait values, y, and marker genotypes, m,butions in this setting. However, the ability to “move” for individuals in a mapping population. We assume thatbetween models of different dimension requires a care- markers are organized into a linkage map and restrictful construction of proposal distributions. Despite the attention to models with, at most, pairwise interactions.challenges of implementation of reversible jump algo- We partition the entire genome into H loci, � � {�1 ,rithms, effective approaches for mapping multiple non- . . . , �H }, and assume that the possible QTL occur atinteracting QTL have been developed (Satagopan and these fixed positions. This introduces only a minor biasYandell 1996; Heath 1997; Thomas et al. 1997; Uimari in estimating the position of QTL when H is large. Whenand Hoeschele 1997; Sillanpaa and Arjas 1998; Ste- the markers are densely and regularly spaced, we set �phens and Fisch 1998; Yi and Xu 2000; Gaffney 2001). to the marker positions; otherwise, � includes not onlyBayesian model selection methods employing the re- the marker positions but also points between markers.versible jump MCMC algorithm have been proposed to In general, the genotypes, g , at loci � are unobservablemap epistatic QTL in inbred line crosses and outbred pop- except at completely informative markers, but theirulations (Yi and Xu 2002; Yi et al. 2003, 2004a,b; Narita probability distribution, p(g |� , m), can be inferred fromand Sasaki 2004). However, the complexity of the reversi- the observed marker data using the multipoint methodble jump steps increases computational demand and may (Jiang and Zeng 1997). This probability distribution isprohibit improvements of the algorithms. used as the prior distribution of QTL genotypes in our

Recently, Yi (2004) proposed a unified Bayesian Bayesian framework.model selection framework to identify multiple nonepi- The problem of inferring the number and locationsstatic QTL for complex traits in experimental designs, of multiple QTL is equivalent to the problem of select-based upon a composite space representation of the ing a subset of � that fully explains the phenotypic varia-problem. The composite space approach, which is a tion. Although a complex trait may be influenced bymodification of the product space concept developed multitudes of loci, our emphasis is on a set of at mostby Carlin and Chib (1995), provides an interesting L QTL with detectable effects. Typically L will be muchviewpoint on a wide variety of model selection prob- smaller than H. Let � � {�1 , . . . , �L } (�{�1 , . . . , �H })lems (Godsill 2001). The key feature of the composite be the current positions of L putative QTL. Each locusmodel space is that the dimension remains fixed, may affect the trait through its marginal (main) effectsallowing for MCMC simulation to be performed on a and/or interactions with other loci (epistasis). The phe-space of fixed dimension, thus avoiding the complexi- notype distribution is assumed to follow a linear model,ties of reversible jump. In Yi (2004), the varying dimen-

y � � � X� � e, (1)sional space is augmented to a fixed dimensional space(the composite model space) by placing an upper bound where � is the overall mean, � denotes the vector ofon the number of detectable QTL. In the composite all possible main effects and pairwise interactions of Lmodel space, latent binary variables indicate whether potential QTL, X is the design matrix, and e is the vec-each putative QTL has a nonzero effect. The result- tor of independent normal errors with mean zero anding hierarchical model can vastly simplify the MCMC variance � 2. The number of genetic effects depends onsearch strategy. the experimental design, and the design matrix X is

In this work we extend the composite model space determined from those genotypes g at the current lociapproach to include epistatic effects. We develop a frame- � by using a particular genetic model (see appendix awork of Bayesian model selection for mapping epistatic for details of the Cockerham genetic model used here).QTL in experimental crosses from two inbred lines. We There is prior uncertainty about which genetic effectsshow how to incorporate prior knowledge to select an should be included in the model. As in Bayesian vari-upper bound on the number of detectable QTL and able selection for linear regression (e.g., George andprior distributions for indicator variables of genetic ef- McCulloch 1997; Kuo and Mallick 1998; Chipmanfects and other parameters. A computationally efficient et al. 2001), we introduce a binary variable � for eachMCMC algorithm using a Gibbs sampler or Metropolis- effect, indicating that the corresponding effect is in-Hastings (M-H) algorithm is developed to explore the cluded (� � 1) or excluded (� � 0) from a model.posterior distribution on the parameters. The proposed Letting � � diag(�), the model becomesalgorithm is easy to implement and allows more com-

y � � � X�� � e . (2)plete and rapid exploration of the model space. We firstdescribe the implementation of this algorithm and then This linear model defines the likelihood, p(y |� , X , �),illustrate the method by analyzing a mouse backcross with � � (� , �, � 2), and the full posterior can be writ-

ten aspopulation.

Page 3: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1335Bayesian Analysis of Genome-Wide Epistasis

p(� , � , g , � |y, m) � p(y |� , X , �) p(� , � , g , � |m). wm and w e , it may be better to first determine the priorexpected numbers of main-effect QTL, lm, and all QTL,(3)l 0 l m (i.e., main-effect and epistatic QTL), and then

Specifications of priors p(� , � , g , � |m) and posterior solve for wm and w e from the expressions of the prior ex-calculation are given in subsequent sections. pected numbers. It is reasonable to require that wm we ,

The vector � determines the number of QTL (see which requires some adjustment below when l m � 0.appendix b). Hereafter, we denote the included po- As shown in appendix b, the prior expected numbersitions of QTL by ��. The vector (� , ��) comprises a of main-effect QTL can be expressed asmodel index that identifies the genetic architecture of

l m � L[1 (1 wm)K], (5)the trait. A natural model selection strategy is to choosethe most probable model (� , ��) on the basis of its

and the prior expected number of all QTL asmarginal posterior, p(� , �� |y, m) (George and Foster2000). For genome-wide epistatic analysis, however, no l 0 � L[1 (1 wm)K(1 w e)K 2(L1)], (6)single model may stand out, and thus we average over

where K is the number of possible main effects for eachpossible models when assessing characteristics of ge-QTL and K 2 is the number of possible epistatic effectsnetic architecture, with the various models weighted byfor any two QTL.their posterior probability (Raftery et al. 1997; Ball

The prior expected number of main-effect QTL, l m,2001; Sillanpaa and Corander 2002).could be set to the number of QTL detected by tra-ditional nonepistatic mapping methods, e.g., intervalmapping or composite interval mapping (Lander andPRIOR DISTRIBUTIONSBotstein 1989; Zeng 1994). The prior expected num-

The above Bayesian model selection framework pro- ber of all QTL, l 0 , should be chosen to be at least l m.vides a conceptually simple and general method to iden- The number of QTL detected by traditional epistatictify complex epistatic QTL across the entire genome. mapping methods, e.g., two-dimensional genome scan,However, its practical implementation entails two chal- could provide a rough guide for choosing l 0 . Fromlenges: prior specification and posterior calculation. In Equations 5 and 6, we obtainthis section, we first propose a method to choose anupper bound for the number of QTL and then describe

wm � 1 �1 l m

L �1/K

(7)the prior specifications for the model index and otherunknowns.

andChoice of the upper bound L : We suggest first speci-fying the prior expected number of QTL, l 0 , on the

w e � 1 �1 (l 0/L)

(1 wm)K �1/K 2(L1)

. (8)basis of initial investigations with traditional methods,and then determining a reasonably large upper bound,

We note above that if no main-effect QTL is detectedL . We assign the prior probability distribution for theby traditional nonepistatic mapping methods and l m �number of QTL, l , to be a Poisson distribution with0, then wm � 0. In this case, we suggest making allmean l 0 . The value of L can be selected to be largeweights equal, wm � w e �

� w , and using (6) to obtainenough that the probability Pr(l � L) is very small. Onthe basis of a normal approximation to the Poissondistribution, we could take L as l 0 � 3√l 0 . w � 1 �1

l 0

L �1/(K�K 2(L1))

. (9)Prior on � : For the indicator vector � , we use an

independence prior of the form Prior on �: When there is no prior information con-cerning QTL locations, these could be assumed to bep(�) � �w�jj (1 wj)1�j , (4)independent and uniformly distributed over the H pos-sible loci. Thus, given l 0 the prior probability that anywhere wj � p(�j � 1) is the prior inclusion probabilitylocus is included becomes l 0/H . In practice, it may befor the j th effect. We assume that wj equals the predeter-reasonable to assume that any intervals of a given lengthmined hyperparameter wm or w e , depending on the j th(e.g., 10 cM) contain at most one QTL. Although this as-effect being main effect or epistatic effect, respectively.sumption is not necessary, it can substantially reduce theUnder this prior, the importance of any effect is inde-model space and thus accelerate the search procedure.pendent of the importance of any other effect and the

Prior on �: We propose the following hierarchicalprior inclusion probability of main effect is differentmixture prior for each genetic effect,from that of epistatic effect.

The hyperparameters wm and w e control the expected�j |(�j , � 2, x• j) � N(0, �j c� 2(xT

• j x• j)1), (10)numbers of main and epistatic effects included in themodel, respectively; small wm and w e would concentrate where x• j � (x 1j , . . . , xn j)T is the vector of the coefficients

of �j , and c is a positive scale factor. Many suggestionsthe priors on parsimonious models with few main ef-fects and epistatic effects. Instead of directly specifying have been proposed for choice of c for variable selec-

Page 4: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1336 N. Yi et al.

tion problems of linear regression (e.g., Chipman et al. p(��, g � , �� |�, y) � p(y |�, X � , ��)p(��, g � , �� |�),2001; Fernandez et al. 2001). In this study, we take c �

(14)n , which is a popular choice and yields the BIC if theprior inclusion probability for each effect equals 0.5 p(�� , g� , �� |�, y) � p(�� , g� , �� |�), (15)(e.g., George and Foster 2000; Chipman et al. 2001).

andIn this prior setup, a point mass prior at 0 is used forthe genetic effect �j when �j � 0, effectively removing p(� |�, g , �, y) � p(y |�, X � , ��)p(�)p(�� , g � , ��|�)�j from the model. If �j � 1, the prior variances reflect

p(�� , g� , �� |�). (16)the precision of each �j and are invariant to scaleschanges in the phenotype and the coefficients. The

It can be seen that the unused parameters do not affectvalue (xT• j x • j)1 varies for different types of genetic ef-

the conditional posterior of (�� , g � , ��) and thus dofects. For a large backcross population with no segrega-not need to be updated conditional on �. Since thetion distortion, for example, (xT

• j x • j)1/n � 1⁄4 for mar-unused parameters do not contribute to the likelihood,ginal effects and [1 (1 2r)2]/16 for epistatic effects,the posterior of (�� , g� , ��) is identical to its prior.with r the recombination fraction between two QTL,From (16), the conditional posterior of � depends onunder Cockerham’s model (Zeng et al. 2000).(�� , g� , ��) and thus the update of � requires genera-Priors on � and 2: The prior for the overall meantion of the corresponding unused parameters in the� is N(�0 , � 2

0). We could empirically setcurrent model. These properties lead us to developMCMC algorithms as described below. We first briefly�0 � y �

1n�

n

i�1

yi and � 20 � s 2

y �1

n 1�n

i�1

(yi y)2.describe the algorithms for updating �� , g� , and �� andthen develop a novel Gibbs sampler and Metropolis-

We take the noninformative prior for the residual vari- Hastings algorithm to update the indicator variables forance, p(� 2) � 1/� 2 (Gelman et al. 2004). Although this main and epistatic effects, respectively.prior is improper, it yields a proper posterior distri-

Conditional on �, X � , and �� , the parameters � , � 2,bution for the unknowns and so can be used formally

and �� can be sampled directly from their posterior(Chipman et al. 2001).

distributions, which have standard form (Gelman et al.2004). Conditional on �, �� , and �� , the posterior distri-bution of each element of g � is multinomial and thusMARKOV CHAIN MONTE CARLO ALGORITHMcan be sampled directly as well (Yi and Xu 2002). We

To develop our MCMC algorithm, we first partition adapt the algorithm of Yi et al. (2003) to our model tothe vector of unknowns (�, g, �) into (� � , g � , �� ) and update locations ��: (1) � is restricted to the discrete(��, g� , �� ), representing the unknowns included space � � {�1 , . . . , � H }, and (2) any intervals of someor excluded from the model, respectively, where � � and length � include at most one QTL. To update �q , there-g � (�� and g�) are the positions and the genotypes fore, we propose a new location �*q for the q th QTLof QTL included (excluded), respectively, � � (��) rep- uniformly from 2d most flanking loci of �q , where d isresent the genetic effects included (excluded), � � (�,

a predetermined integer (e.g., d � 2), and then generate� , � 2), �� � (� �, � , � 2), and �� � �� . Similarly, X � genotypes at the new location for all individuals. The(X�) represent the model coefficients included (ex-

proposals for the new location and the genotypes arecluded), which are determined by g and �.

then jointly accepted or rejected using the Metropolis-We suppress the dependence on the observed markerHastings algorithm.data below. For a particular � the likelihood function

At each iteration of the MCMC simulation, we updatedepends only upon the parameters (X � , ��) used by thatall elements of � in some fixed or random order. Formodel, i.e.,the indicator variable of a main effect, we need to con-

p(y |�, X , �) � p(y |�, X � , ��). (11) sider two different cases: a QTL is currently (1) in or(2) out of the model. For (1), the QTL position and

The prior distribution of (�, � , g, �) can be partitioned asgenotypes were generated at the preceding iteration.For (2), we sample a new QTL position from its priorp(�, �, g , �) � p(�)p(�� , g� , �� |�)p(�� , g� , �� |�).distribution and generate its genotypes for all individu-(12)als. An epistatic effect involves two QTL, hence three

The full posterior distribution for (�, �, g , �) can now different cases: (1) both QTL are in, (2) only one QTLbe expressed as is in, and (3) both QTL are out of the model. Again,

the new QTL position(s) and genotypes are sampled asp(�, �, g , � |y) � p(y |�, X � , ��)p(�)p(�� , g� , �� |�)needed.

p(�� , g� , �� |�). (13)We update �j , the indicator variable for an effect,

using its conditional posterior distribution of �j , whichFrom (13), we can derive the conditional posterior dis-tributions is Bernoulli,

Page 5: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1337Bayesian Analysis of Genome-Wide Epistasis

p(�j � 1|��j, X , ��j

, y) � 1 p(�j � 0|��j, X , ��j

, y) p(�h |y) �1N �

N

t�1�L

q�1

1(�(t )q � �h , �(t )

q � 1), h � 0, 1, . . . , H ,

(19)�wR

(1 w) � wR,

(17)where �q is the binary indicator that QTL q is included

where or excluded from the model. Thus, we can obtain thecumulative distribution function per chromosome, de-

R �p(y |�j � 1, ��j

, X , ��j)

p(y |�j � 0, ��j, X , ��j

)� ��2

�j� �2�n

i�1x 2i j

�2�j

�0.5

fined as Fc(x |y) � � x�h�0p(�h |y) for any position x on chro-

mosome c . It is worth noting that the cumulative distribu-tion function defined here can be �1 if the corresponding

exp �12

(�ni�1xij(yi � xi·� � xij �j )�2)2

�2�j

� �2�ni�1x 2

i j�, chromosome contains more than one QTL. Both p(�h |y)

and Fc(x |y) can be graphically displayed and show evi-xi • is the vector of the coefficients of � for the i th individ- dence of QTL activity across the whole genome. Com-ual, w � pr(�j � 1) is the prior probability that �j appears monly used summaries include the posterior probabil-in the model, � 2

�jis the prior variance of �j (see Equation ity that a chromosomal region contains QTL, the most

likely position of QTL (the mode of QTL positions),10), ��jmeans all the elements of � except for �j , and

and the region of highest posterior density (HPD) (e.g.,��jrepresents all the elements of � except for �j . We

Gelman et al. 2004). To take the prior specifications,can sample �j directly from (17) or update �j with proba-bility min(1, r), where r � ((w/1 w)R)12�j. p(�h), into consideration, we can use the Bayes factor

to show evidence for inclusion of �h against exclusionThe effect �j was integrated from (17). We can generate�j as follows. If �j is sampled to be zero, �j � 0. Otherwise, of �h (Kass and Raftery 1995),�j is generated from its conditional posterior

BF(�h) �p(�h |y)

1 p(�h |y)·

1 p(�h)

p(�h). (20)p(�j |�j � 1, ��j

, X , ��j, y) � N(�j , � 2

j ), (18)

where In a similar fashion, we can compute the Bayes factorcomparing a chromosomal region containing QTL to

�j � (� 2 �2�j

� �n

i�1

x 2i j)1�

n

i�1

xij(yi � xi •� � xij�j) that excluding QTL.We can estimate the main effects at any locus or chro-

mosomal intervals �,and

�2j � �2

�j� �2 �

n

i�1

x 2i j . �k(�) �

1N �

N

t�1�L

q�1

1(�(t )q � �, � (t )

q � 1)�(t )qk , k � 1, 2, . . . , K .

(21)

The heritabilities explained by the main effects can alsoPOSTERIOR ANALYSIS be estimated. In epistatic analysis, we need to estimate

two types of additional parameters, the posterior inclu-The MCMC algorithm described above starts from ini-sion probability and the size of epistatic effects, bothtial values and updates each group of unknowns in turn.involving pairs of loci. These two types of unknowns canInitial iterations are discarded as “burn-in.” To reducebe estimated with natural extensions of (19) and (21),serial correlation, we thin the subsequent samples byrespectively.keeping every k th simulation draw and discarding the

rest, where k is an integer. The MCMC sampler se-quence {(�(t ), �(t )

� , g(t )� , � (t )

� ); t � 1, . . . , N } is a randomEXAMPLE

draw from the joint posterior distribution p(�, �� , g � ,�� |y), and thus the embedded subsequence {(�(t ), �(t )

� ); We illustrate the application of our Bayesian modelselection approach by an analysis of a mouse cross pro-t � 1, . . . , N } is a random sample from its marginal

posterior distribution p(�, �� |y), which is used to infer duced from two highly divergent strains: M16i, consist-ing of large and moderately obese mice, and CAST/Ei,the genetic architecture of the complex trait. For ge-

nome-wide epistatic analysis, no single model may stand a wild strain of small mice with lean bodies (Leamy et al.2002). CAST/Ei males were mated to M16i females, andout, and we may average over all possible models to as-

sess genetic architecture. Bayesian model averaging pro- F1 males were backcrossed to M16i females, resulting in54 litters and 421 mice (213 males, 208 females) reach-vides more robust inferences about quantities of interest

than any single model since it incorporates model un- ing 12 weeks of age. All mice were genotyped for 92microsatellite markers located on 19 autosomal chromo-certainty (Raftery et al. 1997; Ball 2001; Sillanpaa

and Corander 2002). somes. The marker linkage map covered 1214 cM withaverage spacing of 13 cM. In this study, we analyze FAT,The most important characteristic may be the poste-

rior inclusion probability of each possible locus �h , esti- the sum of right gonadal and hindlimb subcutaneousfat pads. Prior to QTL analysis, the phenotypic data weremated as

Page 6: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1338 N. Yi et al.

Figure 1.—Profiles of LOD scores from maxi-mum-likelihood interval mapping. On the x -axis,large tick marks represent chromosomes andsmall tick marks represent markers.

linearly adjusted by sex and dam and standardized to ated. For all analyses, the MCMC started with no QTLand ran for 4 105 cycles after discarding the first 2000mean 0 and variance 1, although this transformation

may result in the possibility of destroying true biological burn-ins. The chain was thinned by one in k � 20,yielding 2 104 samples for posterior Bayesian analysis.interaction (Jansen 2003). We used the Cockerham

genetic model (appendix a), in which the coefficients An initial interval map scan revealed three significantQTL (LOD � 3.2) on chromosomes 2, 13, and 15 (Fig-of main effects are defined as 0.5 and 0.5 for the two

genotypes, CM and MM, where C and M represent the ure 1), explaining 20.7, 4.9, and 5.1% of the phenotypicvariance, respectively.CAST/Ei and M16i alleles, respectively.

We partitioned each chromosome with a 1-cM grid, Under the nonepistatic analysis, epistatic effects arealways excluded from the model and thus putative QTLresulting in 1214 possible loci across the genome. A

nonepistatic and an epistatic QTL model were evalu- are chosen only on the basis of their main effects. As

Figure 2.—Bayesian nonepistatic anal-ysis: profiles of posterior inclusion proba-bility and cumulative probability function.Black line, l m � 1; red line, l m � 3; blueline, l m � 6. On the x - axis, large tick marksrepresent chromosomes and small tickmarks represent markers.

Page 7: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1339Bayesian Analysis of Genome-Wide Epistasis

Figure 3.—Bayesian nonepistatic analysis:profiles of Bayes factor. Black line, l m � 1; redline, l m � 3; blue line, l m � 6. On the x - axis,large tick marks represent chromosomes andsmall tick marks represent markers.

described earlier, we took the number of significant Therefore, the prior probabilities of inclusion for eachmain effect were w m � 1 [1 (l m/L)]1/K � 1⁄3 , 1⁄4 , andQTL detected in the interval mapping as the prior num-

ber of main-effect QTL (l m). To check prior sensitivity, 3⁄7 , respectively. Figure 2, top, displays the posterior prob-ability of inclusion for each locus across the genome.we reran the algorithm for l m � 1, 6. The upper bound

of the number of QTL was calculated as L � l m � 3√l m, Note the similarity to Figure 1, with clear evidence ofQTL and flat profiles on other chromosomes. The peaksor L � 9, 4, and 14 for l m � 3, 1, and 6, respectively.

Figure 4.—Bayesian epistatic analysis:profiles of posterior inclusion probabil-ity and cumulative probability function.Black line, l 0 � 4; red line, l 0 � 6; blueline, l 0 � 8. On the x -axis, large tick marksrepresent chromosomes and small tickmarks represent markers.

Page 8: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1340 N. Yi et al.

Figure 5.—Bayesian epistatic analysis: pro-files of Bayes factor. Black line, l 0 � 4; red line,l 0 � 6; blue line, l 0 � 8. On the x - axis, largetick marks represent chromosomes and smalltick marks represent markers.

on chromosomes 2, 13, and 15 overlap those identified tended to provide smaller posteriors, especially for infre-quently arising loci. However, the identification of fre-by interval mapping. The graphs of the cumulative dis-

tribution function, displayed in Figure 2, bottom, show quent arising loci remained the same. The profiles ofthe Bayes factor are depicted in Figure 5. The threethat the posterior inclusion probability of each chromo-

some is close to 1 for chromosomes 2, 13, and 15. The choices of l m provided similar profiles of the Bayes fac-tor, especially for infrequently arising loci.results show that, at least in this data set, detection of

large-effect QTL is not sensitive to the choice of l m. As shown in Figures 4 and 5, the epistatic analysesdetected the same regions on chromosomes 2, 13, andHowever, larger l m tend to pick up more small-effect

QTL as expected. The profiles of the Bayes factor are 15 as the nonepistatic analyses. In addition to those onchromosomes 2, 13, and 15, our epistatic analyses founddepicted in Figure 3. For the three choices of l m, the

regions on chromosomes 2, 13, and 15 show strong strong evidence of QTL on chromosomes 1, 18, and19 with high cumulative probabilities (close to 1) andevidence for being selected, and other regions show a

very low Bayes factor. suggestive evidence of QTL on chromosomes 7 and 14.In the nonepistatic analyses, these chromosomes wereThe epistatic analysis took l m � 3, the number of

QTL detected in the nonepistatic analyses, as the prior found to have weak main effects and hence were de-tected in the epistatic model mainly due to epistaticexpected number of main-effect QTL. Three values,

l 0 � 4, 6, and 8, were chosen as the prior expected interactions.The profiles of the location-wise main effects and thenumber of all QTL under the epistatic model. The up-

per bound of the number of QTL, L , was thus L � 10, variances explained by the main effects are depicted inFigure 6. For the three prior specifications, the posterior14, and 17, respectively. From Equations 7 and 8, the

prior inclusion probabilities were 0.30, 0.21, and 0.18 inferences were essentially identical. Therefore, we re-ported only the summary statistics for l 0 � 6 (see Tablesfor main effects and 0.017, 0.025, and 0.027 for epistatic

effects, for the three values of (l 0 , L), respectively. The 1 and 2). For the HPD regions on chromosomes 2, 13,and 15, the posterior inclusion probabilities are closeprofiles of the posterior inclusion probability for each

locus across the genome and the cumulative posterior to 1, and the corresponding Bayes factors are high. Theestimated main effects were 0.856, 0.371, and 0.342probability for each chromosome are depicted in Figure

4, top and bottom, respectively. It can be seen that the and explained 18.4, 3.5, and 3.1% of the phenotypicvariance, respectively. For the HPD regions on chromo-three different prior specifications of (l 0 , L) provided

fairly similar profiles of the posteriors, indicating that somes 1, 18, and 19, the posterior inclusion probabilitieswere �82, 88, and 70%, and the corresponding Bayesthe posterior inference may be not very sensitive toward

the small or mediate change of l 0 . As expected, the factors were �28, 47, and 12, respectively. In these HPDregions, the average main effects were weak and ex-choice of a smaller prior expected number of QTL

Page 9: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1341Bayesian Analysis of Genome-Wide Epistasis

Figure 6.—Bayesian epistatic analysis:profiles of main effect and heritabilityexplained by main effect. Black line, l 0 �4; red line, l 0 � 6; blue line, l 0 � 8.On the x - axis, large tick marks representchromosomes and small tick marks rep-resent markers.

plained low proportions of the phenotypic variance. from relatively short runs. The Bayesian framework pro-vides a robust inference of genetic architecture thatHowever, our epistatic analyses detected strong epistatic

interactions associated with the HPD regions on chro- incorporates model uncertainty by averaging over allpossible models (Raftery et al. 1997; Ball 2001; Sil-mosomes 1, 18, and 19. As shown in Table 2, the strong-

est epistasis is the interaction between chromosomes 1 lanpaa and Corander 2002).and 18. This epistatic effect was estimated to be 0.936 One of the most challenging statistical problems pre-and explained 5.6% of the phenotypic variance. The pos- sented by QTL mapping is that the number of QTL is un-terior inclusion probability of this epistasis was 81.9%. known. Most previous Bayesian mapping methods treatThe region of chromosome 19 was found to interact QTL models as models of varying dimension and em-with chromosomes 15 and 7. The interaction between ploy the reversible jump MCMC algorithm to explorethe regions of chromosomes 19 and 15 was 0.604 and the posterior. Although such a framework is very generalexplained 2.5% of the phenotypic variance. The epi- and powerful (Green 1995), it is difficult to implementstatic analyses also revealed interactions among chromo- efficient search strategies. The key idea of the proposedsomes 2, 13, and 15. For example, the interaction be- Bayesian approach is to turn varying dimensional spacetween the HPD regions on chromosomes 2 and 13 was of multiple-QTL models into fixed dimensional modelincluded in the model with probability of �60% and space by using a fixed but large set of known loci, �,explained �2.5% of the phenotypic variance. and putting a constraint on the upper bound of the

number of detectable QTL. In this setting, posteriorsimulation then can be achieved with a relatively simple

DISCUSSION Gibbs sampler or M-H algorithm (Godsill 2001; Yi2004). The algorithm proposed herein is easier to imple-The Bayesian model selection approach provides ament than the reversible jump method and it reducescomprehensive solution to mapping multiple epistaticthe computational time of model search, an essentialQTL across the entire genome using the posterior distri-feature for the practical analysis of complex geneticbution as a selection criterion. MCMC algorithms basedarchitectures.on the composite model space representation mix rap-

A prerequisite of the proposed method is a reasonableidly, thus ensuring that high-probability models are vis-ited frequently and quickly, resulting in good inference choice of the upper bound of the number of detectable

Page 10: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1342 N. Yi et al.

TABLE 1

Summary statistics for epistatic analysis: high posterior density (HPD) regions of QTL locations,posterior inclusion probabilities of main effects, Bayes factors, estimated main effects,

and heritabilities explained by main effects in the HPD regions

Chromosome

2 13 15 1 18 19 7 14

HPD region (cM) [72, 85] [20, 42] [1, 29] [26, 54] [43, 71] [15, 45] [50, 75] [12, 41]Posterior probability (%) 98.3 97.2 93.5 81.9 88.4 70.6 36.7 30.1Bayes factor 821.4 291.3 92.2 28.1 47.3 12.2 4.1 2.7Main effect 0.856 0.371 0.342 0.037 0.103 0.167 0.137 0.147Heritability 0.184 0.035 0.031 0.002 0.015 0.020 0.019 0.009

QTL. A minimal requirement is that the predetermined to reasonably reduce the model space, such as our pro-posed composite model space approach, can improveupper bound is greater than the true number of QTL

with high probability. As an extreme case, we could take the performance of the MCMC algorithms and enhanceour ability to detect complex epistatic QTL. We parti-the total number of loci (H ) as the upper bound. Since

the number of detectable QTL is usually much less than tion the entire genome into intervals by a number ofpoints and restrict putative QTL to these fixed points,H , such a choice is unlikely to be optimal. The sugges-

tion made here utilizes the expected number of QTL reducing loci to a discrete space. Additional speedup isachieved by computing the conditional probability ofand the prior probability distribution of the number

of QTL to determine the upper bound. The expected the genotypes given the marker data on this fixed (butdense) grid of possible locations before the MCMC pro-number of QTL could be roughly estimated using stan-

dard genome scans. In practice, one could experiment cedure starts.Several other strategies of reducing the model spacewith several values of the expected number of QTL and

investigate their impact on the posterior inference. In could be incorporated into the proposed approach toimprove the procedure. We could adopt a two-stagehigh-dimensional problems, specifying the prior distri-

butions on both the model space and parameters is search method, first searching for main-effect QTL andsecond searching for epistatic effects of these and addi-perhaps the most difficult aspect of Bayesian model

selection. We propose a novel method for elicitation of tional epistatic QTL given the already detected main-effect QTL. The positions and main effects of the QTLprior distribution on the indicator variables. Instead of

directly specifying the prior inclusion probabilities wm detected in the first stage should be updated in thesecond stage since inclusion of epistatic effects may yieldand w e , the expected numbers of main-effect QTL and

all QTL can first be given incorporating previous results more accurate estimation of the positions and the ef-fects. Alternatively, we could selectively ignore some ge-and then are used to determine wm and w e . Here we

have fixed wm and w e but we could relax this by treating netic effects. Even with a moderate number of detect-able QTL, the epistatic models must accommodate manywm and w e as unknown model parameters and assigning

priors (Kohn et al. 2001). potential genetic effects. In a backcross population, forexample, there are a total of L(L � 1)/2 (� 210, if L �A major difficulty of genome-wide epistatic analysis

is created by the huge size of the model space. Strategies 20, say) possible effects, but many may be negligible.

TABLE 2

Summary statistics for epistatic analysis: posterior inclusion probabilities of epistatic effects,estimated epistatic effects, and heritability explained by each epistatic effect

Posterior Epistaticprobability (%) effect Heritability

Chr 1 [26, 54] Chr 18 [43, 71] 81.9 0.936 0.056Chr 2 [72, 85] Chr 13 [20, 42] 59.5 0.575 0.025Chr 15 [1, 29] Chr 19 [15, 45] 43.2 0.606 0.024Chr 2 [72, 85] Chr 14 [12, 41] 18.4 0.567 0.022Chr 7 [50, 75] Chr 19 [15, 45] 17.2 0.552 0.021Chr 13 [20, 42] Chr 15 [1, 29] 13.6 0.501 0.018

Chr, chromosome.

Page 11: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1343Bayesian Analysis of Genome-Wide Epistasis

M. Wade, B. Brodie and J. Wolf. Oxford University Press, NewTo see this, categorize putative QTL into three types:York.

(1) QTL with main effects (main-effect QTL), (2) QTL Chipman, H., E. I. Edwards and R. E. McCulloch, 2001 The practi-cal implementation of Bayesian model selection, pp. 65–116 inwith weak main effects but epistatic effects with otherModel Selection, edited by P. Lahiri. Institute of Mathematicalmain-effect QTL, and (3) QTL with weak main effectsStatistics, Beachwood, OH.

but epistatic effects among themselves. Letting the num- Fernandez, C., E. Ley and M. F. J. Steel, 2001 Benchmark priorsfor Bayesian model averaging. J. Econom. 100: 381–427.bers of these three types of QTL be L 1 , L 2 , and L 3 (L �

Gaffney, P. J., 2001 An efficient reversible jump Markov chainL 1 � L 2 � L 3), respectively, and ignoring the mainMonte Carlo approach to detect multiple loci and their effects

effects of (2) and (3) QTL, the number of possible ef- in inbred crosses. Ph.D. Dissertation, Department of Statistics,University of Wisconsin, Madison, WI.fects reduces to L 1(L 1 � 1)/2 � L 1L 2 � L 3(L 3 1)/2

Gelman, A., J. Carlin, H. Stern and D. Rubin, 2004 Bayesian Data(� 115, if L 1 � 10, L 2 � 5, and L 3 � 5). These threeAnalysis. Chapman & Hall, London.

types of QTL can be detected either simultaneously or George, E. I., 2000 The variable selection problem. J. Am. Stat.Assoc. 95: 1304–1308.conditionally with a three-stage approach.

George, E. I., and D. P. Foster, 2000 Calibration and empiricalA number of extensions of the basic model are possi-Bayes variable selection. Biometrika 87: 731–747.

ble within this framework. The simplicity of the MCMC George, E. I., and R. E. McCulloch, 1997 Approaches for Bayesianvariable selection. Stat. Sin. 7: 339–373.search enhances the overall flexibility of this approach

Godsill, S. J., 2001 On the relationship between MCMC modeland enables one to consider analysis in more complexuncertainty methods. J. Comput. Graph. Stat. 10: 230–248.

settings. Extensions to binary or ordinal traits, inclusion Green, P. J., 1995 Reversible jump Markov chain Monte Carlo com-putation and Bayesian model determination. Biometrika 82:of fixed- or random-effect covariates, and gene-by-envi-711–732.ronment interactions are feasible. In principle, the com-

Heath, S. C., 1997 Markov chain Monte Carlo segregation andposite space method can be directly applied to identify linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:

748–760.higher-order interactions. However, the dramatic in-Hoeschele, I., 2001 Mapping quantitative trait loci in outbred pedi-crease in the size of model space is likely to limit the grees, pp. 599–644 in Handbook of Statistical Genetics, edited by

performance of the MCMC algorithm. We regard the D. J. Balding, M. Bishop and C. Cannings. Wiley, New York.Jansen, R. C., 2003 Studying complex biological systems using multi-methods proposed here as a step toward achieving more

factorial perturbation. Nat. Rev. Genet. 4: 145–151.efficient and comprehensive analysis of complex genetic Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traitsarchitectures. There are many opportunities to extend into multiple loci via interval mapping. Genetics 136: 1447–1455.

Jiang, C., and Z-B. Zeng, 1997 Mapping quantitative trait loci withand improve upon this general approach.dominant and missing markers in various crosses from two inbred

N.Y. and D.B.A. were supported by National Institutes of Health (NIH) lines. Genetica 101: 47–58.Kadane, J. B., and N. A. Lazar, 2004 Methods and criteria for modelgrants NIH RO1GM069430, NIH RO1ES09912, NIH RO1 DK056366,

selection. J. Am. Stat. Assoc. 99: 279–290.NIH P30DK056336, and an obesity-related pilot/feasibility studiesKao, C. H., and Z-B. Zeng, 2002 Modeling epistasis of quantitativegrant at the University of Alabama (Birmingham) (528176). G.A.C.

trait loci using Cockerham’s model. Genetics 160: 1243–1261.was supported by NIH GM070683. B.S.Y. was supported by NIH/Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999 Multiple intervalNational Institute of Diabetes and Digestive and Kidney Diseases mapping for quantitative trait loci. Genetics 152: 1203–1216.

(NIDDK) 5803701, NIH/NIDDK 66369-01, American Diabetes Associ- Kass, R. E., and A. E. Raftery, 1995 Bayes factors. J. Am. Stat. Assoc.ation 7-03-IG-01, and U.S. Department of Agriculture Cooperative 90: 773–795.State Research, Education and Extension Service grants to the Univer- Kohn, R., M. Smith and D. Chen, 2001 Nonparametric regression

using linear combinations of basis functions. Stat. Comput. 11:sity of Wisconsin (Madison) (C.J. and B.S.Y.). This research is a con-313–322.tribution of the University of Nebraska Agricultural Research Division

Kuo, L., and B. Mallick, 1998 Variable selection for regression(Lincoln, NE; journal series no. 14858) and the North Carolina Ag-models. Sankhya Ser. B 60: 65–81.ricultural Research Service and was supported in part by funds pro-

Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factorsvided through the Hatch Act. underlying quantitative traits using RFLP linkage maps. Genetics121: 185–199.

Leamy, L. J., D. Pomp, E. J. Eisen and J. M. Cheverud, 2002 Pleiot-ropy of quantitative trait loci for organ weights and limb bonelengths in mice. Physiol. Genomics 10: 21–29.LITERATURE CITED

Lynch, M., and B. Walsh, 1998 Genetics and Analysis of QuantitativeBall, R. D., 2001 Bayesian methods for quantitative trait loci map- Traits. Sinauer Associates, Sunderland, MA.

ping based on model selection: approximate analysis using the Narita, A., and Y. Sasaki, 2004 Detection of multiple QTL withBayesian information criterion. Genetics 159: 1351–1364. epistatic effects under a mixed inheritance model in an outbred

Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004 Modifying the population. Genet. Sel. Evol. 36: 415–433.Schwarz Bayesian information criterion to locate multiple inter- Raftery, A. E., D. Madigan and J. A. Hoeting, 1997 Bayesianacting quantitative trait loci. Genetics 167: 989–999. model averaging for linear regression models. J. Am. Stat. Assoc.

Broman, K. W., and T. P. Speed, 2002 A model selection approach 92: 179–191.for identification of quantitative trait loci in experimental crosses. Reifsnyder, P. R., G. Churchill and E. H. Leiter, 2000 MaternalJ. R. Stat. Soc. B 64: 641–656. environment and genotype interact to establish diabesity in mice.

Carlborg, O., and C. Haley, 2004 Epistasis: Too often neglected Genome Res. 10: 1568–1578.in complex trait studies? Nat. Rev. Genet. 5: 618–625. Satagopan, J. M., and B. S. Yandell, 1996 Estimating the number of

Carlborg, O., L. Andersson and B. Kinghorn, 2000 The use of quantitative trait loci via Bayesian model determination. Speciala genetic algorithm for simultaneous mapping of multiple inter- Contributed Paper Session on Genetic Analysis of Quantitativeacting quantitative trait loci. Genetics 155: 2003–2010. Traits and Complex Disease. Biometric Section, Joint Statistical

Carlin, B. P., and S. Chib, 1995 Bayesian model choice via Markov Meeting, Chicago.chain Monte Carlo. J. Am. Stat. Assoc. 88: 881–889. Satagopan, J. M., B. S. Yandell, M. A. Newton and T. C. Osborn,

Cheverud, J. M., 2000 Detecting epistasis among quantitative trait 1996 A Bayesian approach to detect quantitative trait loci usingMarkov chain Monte Carlo. Genetics 144: 805–816.loci, pp. 58–81 in Epistasis and the Evolutionary Process, edited by

Page 12: Bayesian Model Selection for Genome-Wide …pages.cs.wisc.edu/~yandell/doc/2005/67.Genetics.pdfCopyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.040386

1344 N. Yi et al.

Sen, S., and G. Churchill, 2001 A statistical framework for quantita- xiq1 � ziq 1,tive trait mapping. Genetics 159: 371–387.

Sillanpaa, M. J., and E. Arjas, 1998 Bayesian mapping of multiple xiq2 � (1 � xiq1)(1 xiq1) 0.5,quantitative trait loci from incomplete inbred line cross data.Genetics 148: 1373–1388.

Sillanpaa, M. J., and J. Corander, 2002 Model choice in genemapping: what and why. Trends Genet. 18: 301–307.

xiqq�k �

⎧⎪⎨⎪⎩

xiq1xiq�1, k � 1xiq1xiq�2 , k � 2xiq2xiq�1 , k � 3xiq2xiq�2 , k � 4.

Stephens, D. A., and R. D. Fisch, 1998 Bayesian analysis of quantita-tive trait locus data using reversible jump Markov chain MonteCarlo. Biometrics 54: 1334–1347.

Thomas, D. C., S. Richardson, J. Gauderman and J. Pitkaniemi,1997 A Bayesian approach to multipoint mapping in nuclear For the Cockerham model, �q1 and �q 2 correspond tofamilies. Genet. Epidemiol. 14: 903–908. additive and dominance effects of QTL q , respectively;Uimari, P., and I. Hoeschele, 1997 Mapping linked quantitative

and �qq�1 , �qq�2 , �qq�3 , and �qq�4 are the epistatic effects be-trait loci using Bayesian method analysis and Markov chain MonteCarlo algorithms. Genetics 146: 735–743. tween loci q and q�, called additive-by-additive, additive-

Yi, N., 2004 A unified Markov chain Monte Carlo framework for by-dominance, dominance-by-additive, and dominance-mapping multiple quantitative trait loci. Genetics 167: 967–975.by-dominance effects, respectively. The Cockerham modelYi, N., and S. Xu, 2000 Bayesian mapping of quantitative trait loci

for complex binary traits. Genetics 155: 1391–1403. keeps the same interpretation of main effects with orYi, N., and S. Xu, 2002 Mapping quantitative trait loci with epistatic without epistatic effects. However, main effects shouldeffects. Genet. Res. 79: 185–198.

always be interpreted with caution in the presence ofYi, N., D. B. Allison and S. Xu, 2003 Bayesian model choice andsearch strategies for mapping multiple epistatic quantitative trait epistatic interactions.loci. Genetics 165: 867–883.

Yi, N., A. Diament, S. Chiu, J. Fisler and C. Warden, 2004a Charact-erization of epistasis influencing complex spontaneous obesityin the BSB model. Genetics 167: 399–409. APPENDIX B: THE PRIOR EXPECTED NUMBER OF

Yi, N., A. Diament, S. Chiu, J. Fisler and C. Warden, 2004b Epistatic QTL INCLUDED IN THE MODELinteraction between chromosomes 7 and 3 influences hepaticlipase activity in BSB mice. J. Lipid Res. 45: 2063–2070. We define �q as the binary variable to indicate inclu-

Zeng, Z-B., 1994 Precision mapping of quantitative trait loci. Genet- sion (�q � 1) or exclusion (�q � 0) of QTL q . QTL q isics 136: 1457–1468.included into the model when and only when at leastZeng, Z-B., C. Kao and C. J. Basten, 2000 Estimating the genetic

architecture of quantitative traits. Genet. Res. 74: 279–289. one of the genetic effects associated with QTL q is in-cluded. Therefore, we haveCommunicating editor: J. B. Walsh

�q � 1 �K

k�1

(1 �qk)�K2

k�1� �

L

q��q

(1 �qq�k) �L

q��q

(1 �q�qk)�,APPENDIX A: THE MODIFIED COCKERHAMEPISTATIC MODEL FOR BACKCROSS AND

where K is the number of possible main effects for eachINTERCROSS POPULATIONSlocus, K 2 is the number of possible epistatic effects for

For a mapping population with K � 1 genotypes perany two loci, and �qk and �qq�k are the indicators of main

locus, there are K marginal effect degrees of freedomand epistatic effects, respectively. The actual number of

(d.f.) for each locus and K 2 interaction-effect d.f. forQTL then equals �L

q�1�q . The prior expected numberany two loci. The design matrix X for model (1) has KLof all QTL is the expectation of the actual number ofmain-effect coefficients, xiqk , and K 2L(L 1)/2 epistaticQTL and thus can be derived aseffect coefficients, xiqq �k , obtained from the genotypes at

the corresponding loci by using a particular epistatic l 0 � �L

q�1

pr(�q � 1)model. The main and epistatic effects are denoted by�qk and �qq �k , respectively.

� L �L

q�1��

K

k�1

pr(�qk � 0) �K2

k�1� �

L

q��q

pr(�qq�k � 0) �L

q��q

pr(�q�qk � 0)�For a backcross population, there are two segregatinggenotypes denoted by bqbq , Bqbq at locus q . For the com-

� L[1 (1 w m)K(1 w e)K2(L1)] .monly used Cockerham epistatic model (Kao and Zeng2002), the coefficients are defined as If we consider only main effects, then QTL q is included

into the model when at least one of the main effects ofxiq1 � ziq 0.5 and xiqq �1 � xiq1xiq �1 ,QTL q is included. The binary indicator variable of QTL

where ziq denotes the number of alleles Bq . For an in- q then becomes �q � 1 �Kk�1(1 �qk). Therefore, the

tercross derived from two inbred lines, there are three prior expected number of main-effect QTL issegregating genotypes denoted by bqbq , Bqbq , and BqBq atlocus q . For the commonly used Cockerham epistatic l m � L �

L

q�1��

K

k�1

pr(�qk � 0)� � L[1 (1 wm)K].model, the coefficients are defined as