
Genetic Epidemiology 14:993-998 (1997)

Joint Segregation and Linkage Analysis of a Quantitative Trait Compared to Separate Analyses

W. James Gauderman, Cheryl L. Faucett, John L. Morrison, Catherine L. Carpenter

Department of Preventive Medicine (W.J.G., J.L.M., C.L.C.), University of Southern California, Los Angeles, California; and Department of Biostatistics (C.L.F.), University of California, Los Angeles, Los Angeles, California

Our goal was to determine the degree to which joint segregation and linkage analysis leads to increased efficiency for estimating the recombination fraction and to greater power for detecting linkage, compared to separate analyses. We concentrated on the quantitative phenotype Q2 and analyzed linkage with a tightly linked marker, a loosely linked marker, and eight unlinked markers, the latter chosen to evaluate false positive rates. We considered both nuclear-family and extended-pedigree data, using the 200 replicates of each provided to GAW participants. We found joint analysis to be consistently more efficient, with relative efficiencies for the tightly linked marker of 1.16 and 1.06 in extended pedigrees and nuclear families, respectively. These relative efficiencies translated into modest but consistent gains in power to detect linkage. Both methods appear to produce unbiased parameter estimates and similar false positive rates.

© 1997 Wiley-Liss, Inc.

Key words: efficiency, lod scores, power, recombination fraction

INTRODUCTION

Segregation and linkage analyses are used to evaluate the role of genes in the expression of quantitative traits. Segregation analysis is used to determine the mode of inheritance and to estimate trait model parameters (e.g., allele frequency, genotype-specific means). Linkage analysis is used to estimate the recombination fraction between the trait locus and one or more marker loci. Traditionally, the trait model parameters in a linkage analysis are fixed to specific values, obtained either from a previous study on another data set or from a prior segregation analysis on the current data set. Clerget-Darpoux et al. [1986] showed that misspecification of the trait model parameters in a linkage analysis leads to biased estimates of the recombination fraction and alters the lod score distribution, in some cases reducing the power to detect linkage. Misspecification is likely if parameter values are taken from previously published estimates based on a different data set. Nonparametric linkage methods have been developed to avoid the problem of a misspecified trait model [Haseman and Elston, 1972; Weeks and Lange, 1988]. However, these can be substantially less powerful than the lod score approach for detecting linkage [Kruglyak et al., 1996], and they do not provide an estimate of the recombination fraction.

Address reprint requests to Dr. W. James Gauderman, Department of Preventive Medicine, University of Southern California, 1540 Alcazar St., Suite 220, Los Angeles, CA 90033.

Estimating both the trait and linkage model parameters from the same data set should be an improvement over using previously published trait model estimates. However, the analyst still has two options: segregation analysis followed by linkage analysis (separate analyses), or joint segregation and linkage analysis. Blangero [1995], in his review of analyses of a single simulated data set from GAW9, indicated that joint analysis was a powerful strategy for localizing a gene, even when the underlying penetrance model was misspecified. Asymptotically, joint analysis will produce more efficient estimates than separate analyses. Using simulated data on both nuclear families and extended pedigrees, we investigate how much joint analysis improves estimates of the recombination fraction and power to detect linkage relative to separate analyses.

METHODS

We addressed this question using both the nuclear-family and extended-pedigree simulated data provided to GAW 10 participants. The nuclear-family data included 200 replicates of 239 nuclear families with 1,197 individuals per replicate. The extended-pedigree data included 200 replicates of 23 families with 1,497 individuals per replicate. For both the nuclear families and extended pedigrees, complete data were provided on living individuals (1,000 per replicate), with marker data provided on both living and dead subjects. Markers on the dead subjects were excluded from the analyses of the nuclear families. Because of computational demands, however, they were included in the analyses of the extended pedigrees. For comparative purposes, we also performed one set of analyses on the extended pedigrees excluding markers on dead subjects.

We analyzed the quantitative trait Q2, which was simulated with effects due to a measured environmental factor (EF), a diallelic major gene (MG), and a polygenic component. Based on our knowledge of the simulation model for Q2, we assumed a single underlying major locus (MG), with alleles “A” and “a” in Hardy-Weinberg equilibrium and allele frequency $q_A$. Letting the subscript $i$ index subjects, we used a linear model for Q2 of the following form:

$$\mathrm{Q2}_i = \alpha + \beta_1 \mathrm{EF}_i + \beta_2 G_i + \varepsilon_i,$$

where $\varepsilon_i$ is the random error, assumed to be normally distributed with mean zero and variance $\sigma^2$, and $G_i$ is a covariate determined by the genotype at MG and the mode of inheritance. We assumed additive inheritance in all analyses, as per the simulation model, so that $G_i$ = 1.0, 0.5, or 0.0 for MG = AA, Aa, or aa, respectively. Obtaining maximum likelihood estimates in a mixed model is possible, but it would have been too computationally time-consuming for 200 data replicates. We therefore did not include a polygenic effect. This may bias estimates of some model parameters (e.g., the major gene penetrance), but we expect comparisons between the separate and joint analysis methods to remain valid.

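To make the trait model concrete, the following sketch evaluates the additive-coded mean model for Q2 and draws genotypes under Hardy-Weinberg equilibrium. It is an illustration only. It treats subjects as unrelated, so there is no segregation within families. Like the analyses above, it omits the polygenic component. The parameter values in the usage line are hypothetical and are not the GAW 10 simulation values.

```python
import random

# Additive coding of the major-gene genotype used in the analyses:
# G = 1.0 for AA, 0.5 for Aa, 0.0 for aa.
ADDITIVE_G = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def hwe_genotype_freqs(q_a):
    """Hardy-Weinberg genotype frequencies, given the frequency q_a of allele A."""
    p = 1.0 - q_a
    return {"AA": q_a * q_a, "Aa": 2.0 * q_a * p, "aa": p * p}


def q2_mean(alpha, beta1, beta2, ef, genotype):
    """Expected Q2 for one subject: alpha + beta1*EF + beta2*G (no polygenic term)."""
    return alpha + beta1 * ef + beta2 * ADDITIVE_G[genotype]


def simulate_q2(alpha, beta1, beta2, sigma, q_a, ef, rng):
    """Draw a genotype under HWE, then a Q2 value with N(0, sigma^2) error.

    Toy illustration for a single unrelated subject; family structure is ignored.
    """
    freqs = hwe_genotype_freqs(q_a)
    genotype = rng.choices(list(freqs), weights=list(freqs.values()))[0]
    q2 = q2_mean(alpha, beta1, beta2, ef, genotype) + rng.gauss(0.0, sigma)
    return genotype, q2


# Hypothetical parameter values, chosen only to exercise the functions.
rng = random.Random(2024)
print(simulate_q2(alpha=1.0, beta1=0.5, beta2=2.0, sigma=1.0, q_a=0.2, ef=2.0, rng=rng))
```

In this sketch the genotype effect enters only through `ADDITIVE_G`. A different mode of inheritance would change that mapping and nothing else.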
We concentrated on two linked markers on chromosome 8, each with five alleles and a heterozygosity of 0.74: D8G26 (0.3 centimorgans from MG) and D8G17 (19.3 centimorgans from MG). These were chosen to reflect tight and loose linkage, respectively. We assumed linkage equilibrium between MG and each of these marker loci. We used the Haldane mapping function to obtain true recombination fractions of $\theta = 0.003$ between MG and D8G26 and $\theta = 0.160$ between MG and D8G17. For each data replicate we performed two types of analysis:

1. Separate: Segregation analysis was used first to estimate the Q2 model parameters ($\alpha$, $\beta_1$, $\beta_2$, $\sigma^2$, and $q_A$). Linkage analysis (one for each marker) was then used to estimate the recombination fraction, fixing the Q2 model parameters at their maximum likelihood estimates from the segregation analysis.

2. Joint: A joint segregation and linkage analysis (one for each marker) was used to simultaneously estimate both the Q2 model parameters and the recombination fraction.

In both cases, the marker allele frequencies were fixed to the values provided with the data. All analyses were performed using the Genetic Analysis Package [GAP, 1997].

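The true recombination fractions quoted for the two linked markers come from Haldane's mapping function, $\theta = \tfrac{1}{2}(1 - e^{-2d})$, with $d$ in Morgans. A minimal sketch follows, with the inverse included for convenience. The function names are illustrative and not part of any package.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction from map distance via Haldane's mapping function.

    theta = (1 - exp(-2d)) / 2, with d in Morgans (100 cM = 1 M).
    """
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance_cm(theta):
    """Inverse mapping, valid for 0 <= theta < 0.5: d = -0.5 * ln(1 - 2*theta) Morgans."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


# D8G26 at 0.3 cM and D8G17 at 19.3 cM from MG
print(round(haldane_theta(0.3), 3), round(haldane_theta(19.3), 3))  # 0.003 0.16
```

Rounded to three decimals, this reproduces the values $\theta = 0.003$ and $\theta = 0.160$ used as the truth for D8G26 and D8G17.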
To compare the two analytic strategies, we computed the mean, median, and mean squared error of the maximum likelihood estimates of the recombination fraction for each marker across the 200 data replicates. We also computed the mean maximum lod score across replicates and the percentage of replicates with a maximum lod score greater than 0.83 (p = 0.05 for a likelihood ratio test of the null hypothesis of no linkage) and greater than 3.0 (p = 0.0001). To evaluate false positive rates for each method, we analyzed linkage to eight unlinked markers, each with five alleles and a heterozygosity of approximately 0.74. We computed the proportion of the 1,600 analyses (200 replicates, each with eight markers) yielding a lod score above 0.83, i.e., a type I error at the nominal 0.05 level.

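The comparison statistics described above are simple functions of the per-replicate results. The sketch below assumes the per-replicate estimates of $\theta$ and the maximum lod scores have already been exported into plain Python lists. That input format is hypothetical; GAP's own output is not reproduced here. The binomial standard error treats each analysis as an independent trial. This is only an approximation for the 1,600 unlinked-marker analyses, because each replicate contributes eight markers.

```python
import math
import statistics


def summarize_replicates(theta_hats, max_lods, true_theta, thresholds=(0.83, 3.0)):
    """Summarize recombination-fraction estimates and lod scores across replicates.

    theta_hats: per-replicate maximum likelihood estimates of theta.
    max_lods:   per-replicate maximum lod scores, in the same order.
    Returns mean, median, and MSE of theta-hat (MSE taken around the true theta),
    plus the proportion of replicates whose maximum lod exceeds each threshold.
    """
    n = len(theta_hats)
    if n == 0 or len(max_lods) != n:
        raise ValueError("need equal-length, non-empty inputs")
    mse = sum((t - true_theta) ** 2 for t in theta_hats) / n
    power = {c: sum(lod > c for lod in max_lods) / n for c in thresholds}
    return {
        "mean": statistics.mean(theta_hats),
        "median": statistics.median(theta_hats),
        "mse": mse,
        "power": power,
    }


def relative_efficiency(mse_separate, mse_joint):
    """Efficiency of joint relative to separate analysis: MSE_s / MSE_j."""
    return mse_separate / mse_joint


def proportion_se(p_hat, n):
    """Binomial standard error of an empirical proportion from n analyses."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# Toy, made-up inputs (not GAW data), only to show the calls.
toy = summarize_replicates([0.0, 0.1, 0.2, 0.5], [1.0, 0.5, 3.5, 0.0], true_theta=0.1)
print(toy["median"], toy["power"])
print(round(relative_efficiency(0.036, 0.034), 2))  # 1.06 (Table I, D8G26 MSEs)
print(round(proportion_se(0.05, 1600), 4))
```

The MSE is computed around the true $\theta$ rather than around the replicate mean, so it combines bias and variance. That is what the relative efficiency ratio compares.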
RESULTS

Table I shows the summary of results using the nuclear-family data. For the tightly linked marker D8G26 (true $\theta = 0.003$), the distribution of $\hat{\theta}$ across replicates was skewed in both the separate and joint analyses. The median estimate was 0.113 from the joint analysis and 0.129 from the separate analyses. The mean squared error was lower for the joint analysis (0.034) than for the separate analyses (0.036). The estimated relative efficiency ($\mathrm{MSE}_s/\mathrm{MSE}_j$, where $s$ and $j$ denote separate and joint) was therefore 1.06. The estimated power for detecting linkage at the 5% significance level was 37.5% based on the joint analysis, compared to 36.5% based on the separate analyses. A lod score of 3.0 was exceeded in 2.0% of the joint analyses and in none of the separate analyses. For the loosely linked marker D8G17 (true $\theta = 0.16$), the medians of the estimates across replicates were 0.207 from the joint analysis and 0.218 from the separate analyses. The estimated mean squared errors and powers from the two analyses were virtually identical. Based on the analyses of the unlinked markers, the estimated false positive rates were 0.036 for joint analysis and 0.034 for separate analyses.

TABLE I. Summary Statistics from Separate and Joint Segregation and Linkage Analyses Using 200 Replicates of Nuclear-family Data

                                                        Analysis
Marker (true $\theta$)       Summary statistic              Separate    Joint
D8G26 ($\theta = 0.003$)     Mean $\hat{\theta}$            0.145       0.137
                             Median $\hat{\theta}$          0.129       0.113
                             MSE($\hat{\theta}$)            0.036       0.034 (ratio = 1.06)
                             Mean lod($\hat{\theta}$)       0.775       0.811
                             Power$^a$, $\alpha$ = 0.05     36.5%       37.5%
                             Power$^a$, $\alpha$ = 0.0001   0.0%        2.0%
D8G17 ($\theta = 0.160$)     Mean $\hat{\theta}$            0.248       0.243
                             Median $\hat{\theta}$          0.218       0.207
                             MSE($\hat{\theta}$)            0.035       0.035 (ratio = 1.00)
                             Mean lod($\hat{\theta}$)       0.380       0.399
                             Power$^a$, $\alpha$ = 0.05     17.5%       18.0%
                             Power$^a$, $\alpha$ = 0.0001   0.0%        0.0%
Unlinked$^b$                 Power ($\alpha$ = 0.05)        3.4%        3.6%

$^a$ Proportion of replicates with lod score above 0.83 ($\alpha$ = 0.05) or 3.0 ($\alpha$ = 0.0001).
$^b$ Summary from analyses of eight unlinked markers.

Table II shows the summary of results using the extended-pedigree data sets. For the tightly linked marker D8G26, the efficiency of joint analysis relative to separate analyses was 1.16. The estimated power for detecting linkage was 78.5% at the 0.05 level and 19.5% at the 0.0001 level for the joint analysis, compared to 74.0% and 0.0% for the separate analyses. For the loosely linked marker D8G17, the relative efficiency was 1.06, with small gains in power for detecting linkage using the joint analysis. The estimated false positive rates from analyses of the eight unlinked markers were 0.029 for joint analysis and 0.025 for separate analyses.

We re-analyzed linkage to the tightly linked marker D8G26 in the 200 extended-pedigree replicates, excluding the marker genotypes on the dead subjects. The results (data not shown) were similar to those in Table II with respect to the comparison between separate and joint analyses. The estimated mean squared errors were 0.026 and 0.023 for separate and joint analyses, respectively, giving a relative efficiency of 1.13. The estimated powers at the 0.05 level were 63.5% for separate analyses and 65.0% for joint analysis, while at the 0.0001 level they were 0.0% and 17.0%, respectively. Compared to the results for D8G26 in nuclear families shown in Table I, these values show the increase in efficiency and power that comes from using extended pedigrees rather than nuclear families.


TABLE II. Summary Statistics from Separate and Joint Segregation and Linkage Analyses Using 200 Replicates of Extended-pedigree Data

                                                        Analysis
Marker (true $\theta$)       Summary statistic              Separate    Joint
D8G26 ($\theta = 0.003$)     Mean $\hat{\theta}$            0.126       0.112
                             Median $\hat{\theta}$          0.116       0.102
                             MSE($\hat{\theta}$)            0.023       0.020 (ratio = 1.16)
                             Mean lod($\hat{\theta}$)       1.803       1.928
                             Power$^a$, $\alpha$ = 0.05     74.0%       78.5%
                             Power$^a$, $\alpha$ = 0.0001   0.0%        19.5%
D8G17 ($\theta = 0.160$)     Mean $\hat{\theta}$            0.260       0.250
                             Median $\hat{\theta}$          0.240       0.232
                             MSE($\hat{\theta}$)            0.027       0.025 (ratio = 1.06)
                             Mean lod($\hat{\theta}$)       0.646       0.692
                             Power$^a$, $\alpha$ = 0.05     28.0%       30.0%
                             Power$^a$, $\alpha$ = 0.0001   0.0%        2.5%
Unlinked$^b$                 Power ($\alpha$ = 0.05)        2.5%        2.9%

$^a$ Proportion of replicates with lod score above 0.83 ($\alpha$ = 0.05) or 3.0 ($\alpha$ = 0.0001).
$^b$ Summary from analyses of eight unlinked markers.

DISCUSSION

Our findings confirm that joint segregation and linkage analysis produces less biased and more efficient estimates of the recombination fraction than separate segregation analysis followed by linkage analysis. It also increases the power to detect linkage. The degree of improvement depends on how tightly the trait and marker loci are linked and on whether the data set consists of nuclear families or extended pedigrees. In extended pedigrees with complete marker data, joint analysis was 16% more efficient than separate analyses for estimating the recombination fraction between the trait locus and a tightly linked marker, and 6% more efficient for a loosely linked marker. In nuclear families with some missing marker data, joint analysis was 6% more efficient for a tightly linked marker and no more efficient for a loosely linked marker. These relative efficiencies translated into modest but consistent gains in power to detect linkage using joint analysis. Both joint and separate analyses showed false positive (type I error) rates lower than the nominal 0.05 level and, on average, overestimated the recombination fraction. These findings are consistent with those of Clerget-Darpoux et al. [1986] for the case of a misspecified penetrance model. In our case, the misspecification arose from excluding a polygenic effect from the penetrance model. In general, then, our estimates of power based on these data are probably conservative. Even so, it is interesting to note the virtual lack of power for detecting a tightly linked marker when significance was based on a lod score exceeding 3.0, even for extended pedigrees with complete marker data. This indicates that when investigating linkage to a single marker, a less restrictive lod score threshold may be more appropriate.

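The two lod-score cutoffs can be related to p-values through the likelihood ratio statistic $2\ln(10)\cdot\mathrm{lod}$. The sketch below assumes a single-parameter test of $\theta$. It reports two p-values: the plain $\chi^2_1$ tail probability, and the one-sided value from a 50:50 mixture of a point mass at zero and $\chi^2_1$. The mixture is a common reference because the null value $\theta = 0.5$ lies on the boundary of the parameter space. Only the standard library (`math.erfc`) is used, and the function names are illustrative.

```python
import math


def lod_to_chisq(lod):
    """Likelihood ratio statistic corresponding to a lod score: 2 * ln(10) * lod."""
    return 2.0 * math.log(10.0) * lod


def chisq1_sf(x):
    """Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def lod_p_values(lod):
    """Return (p under a plain chi-square_1 reference, p under a 50:50 mixture).

    The mixture halves the tail probability, reflecting the one-sided
    boundary test of theta < 0.5 against theta = 0.5.
    """
    p_chisq = chisq1_sf(lod_to_chisq(lod))
    return p_chisq, 0.5 * p_chisq


for cutoff in (0.83, 3.0):
    print(cutoff, lod_p_values(cutoff))
```

Under these assumptions, lod 0.83 gives p ≈ 0.05 on the plain $\chi^2_1$ reference. Lod 3.0 gives p ≈ 0.0001 on the mixture reference, or about 0.0002 on the plain one. The two conventions differ by a factor of two in p.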

Although our results were based on analysis of a quantitative trait phenotype (Q2), joint segregation and linkage analysis should also be advantageous for a qualitative trait (e.g., breast cancer or multiple sclerosis). If pedigrees are sampled at random from the population, as is often the case for a continuous phenotype, there is no need for an ascertainment correction to the likelihood. For disease outcomes, however, pedigrees are often chosen based on the status of one or more of their members (e.g., bilateral breast cancer before age 50). An ascertainment correction is then required to make the resulting trait model parameter estimates generalizable to the population of interest. If primary interest lies in the recombination fraction, investigators may concentrate on pedigrees that are heavily loaded with diseased individuals, making proper ascertainment correction difficult. In this case, estimates of the trait model parameters from previous analyses (probably of another data set) will be necessary. The potential biases introduced by this strategy must be weighed against the increased information for linkage that comes with heavily loaded families.

Joint analysis is computationally more demanding than separate analyses, especially for extended pedigrees with missing marker data. For example, in the analyses of these simulated nuclear-family and extended-pedigree data sets, joint analysis required approximately six times as much computation time as separate analyses. The current availability of fast computers should reduce the importance of computation time in choosing a type of analysis. This allows the analyst to give more weight to statistical efficiency when deciding on an analytic plan. However, if several markers will be analyzed, it may be more feasible to use separate analyses on all markers, followed by joint analysis in promising regions. If computing the likelihood is infeasible because of model or pedigree complexity (e.g., analysis of inbred pedigrees), Monte Carlo techniques for joint segregation and linkage analysis can be used [Thomas and Cortessis, 1992; Guo and Thompson, 1992].

ACKNOWLEDGMENTS

This work was supported by a grant from the National Cancer Institute of the US Public Health Service (CA52862) and by a fellowship for Dr. Carpenter from the Norris Comprehensive Cancer Center.

REFERENCES

Blangero J (1995): Genetic analysis of a common oligogenic trait with quantitative correlates: summary of GAW9 results. Genet Epidemiol 12:689-706.

Clerget-Darpoux F, Bonaiti-Pellie C, Hochez J (1986): Effects of misspecifying genetic parameters in lod score analysis. Biometrics 42:393-399.

GAP (1997): The Genetic Analysis Package. Epicenter Software, Inc., Pasadena, CA, USA.

Guo SW, Thompson EA (1992): A Monte-Carlo method for combined segregation and linkage analysis. Am J Hum Genet 51:1111-1126.

Haseman JK, Elston RC (1972): The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2:3-19.

Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996): Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.

Thomas DC, Cortessis VC (1992): A Gibbs sampling approach to linkage analysis. Hum Hered 42:63-76.

Weeks DE, Lange K (1988): The affected-pedigree member method of linkage analysis. Am J Hum Genet 42:315-326.