Ann. Hum. Genet. (2001), 65, 293–312
Printed in Great Britain
Sample size calculations for classical association and TDT-type methods
using family data
D. A. TRÉGOUËT, C. PALLAUD, C. SASS, S. VISVIKIS AND L. TIRET
INSERM Unité 525, 91 bd de l’Hôpital, 75634 Paris Cedex, France
(Received 20.9.00. Accepted 2.2.01)
Transmission Disequilibrium Test (TDT)-based methods have been advocated by several authors
for testing that a marker-phenotype association is actually due to linkage and not to uncontrolled
stratification. As a pre-requisite of TDT-type methods is the presence of an association between
marker and phenotype, one may wish to first investigate the association using a classical association
study, and then to check by a TDT approach whether this association is actually due to linkage. We
propose an estimating equation (EE) procedure, to compute analytically the minimum sample size
of sibship data required to detect the association between a marker and a quantitative phenotype,
and that required to confirm it by two TDT methods. We show that, when the marker allele
frequency is low or high, the number of informative sibs needed in TDT-type methods can be lower
than the number required in an association analysis, and even more so when the familial clustering
is strong. However, in all cases, the number of sibs that need to be sampled to get the appropriate
number of informative sibs for analysis is always larger for TDT methods than for an association
study. In a phenotype-first strategy, this number may be critical when investigating costly
phenotypes.
The candidate gene approach is widely used
for identifying genes involved in complex human
diseases (Collins et al. 1997; Lander, 1996; Risch
& Merikangas, 1996). It relies either on the
characterization of functional polymorphisms
which affect some biological phenotype(s) pre-
disposing to disease, or on linkage disequilibrium
existing between observed markers and un-
identified functional polymorphisms.
Generally, the role of candidate genes is
investigated by means of classical association
studies, i.e. by testing by conventional statistical
methods the association, in a sample of unrelated
individuals, between the marker genotype and
the phenotype of interest, which can be the
disease itself or some intermediate quantitative
Correspondence: D. A. Trégouët, INSERM U525, 91 bd de l’Hôpital, 75634 Paris Cedex, France. Tel: 33-1-40-77-96-93; Fax: 33-1-40-77-97-28.
E-mail: tregouet@idf.inserm.fr
trait. The major advantage of association studies
lies in their simplicity and their flexibility.
However, it is more and more common to be
interested in testing association in family data,
for example in large-scale samples of sibships
originally collected for linkage analysis (Lind-
painter et al. 1996) or nuclear families collected
for complex segregation-linkage analysis (Villard
et al. 1996). This can be performed by use of the
estimating equations (EE) technique, which
allows one to control for the familial dependency
between individuals (Liang & Zeger, 1986;
Trégouët et al. 1997; Trégouët & Tiret, 2000).
The main advantages of the EE method are the
absence of distributional assumptions, its robustness
to the misspecification of familial correlations,
and its flexibility and ease of use. The
EE technique is now implemented in standard
statistical software programs.
However, even when performed using family
data, association studies are open to the risk of
spurious association, due either to inappropriate
choice of controls or to uncontrolled population
stratification. To circumvent this problem, a
class of statistical tests, referred to as ‘family-based’
tests, has been proposed, among which the
transmission disequilibrium test (TDT) is the
most popular (Spielman et al. 1993). These tests
are aimed, in the presence of association between a
candidate gene marker and a phenotype, at
determining whether this association is actually
due to linkage and not to other uncontrolled
phenomena. Initially developed for a binary
phenotype, the TDT has been extended in several
ways (Cleeves et al. 1997; Knapp, 1999; Martin et
al. 1997; Spielman & Ewens, 1998). In particular,
its application to a quantitative phenotype has
received much attention (Abecasis et al. 2000;
Allison, 1997; Allison et al. 1999; Cardon,
2000; George et al. 1999; Monks & Kaplan, 2000;
Rabinowitz, 1997; Rabinowitz & Laird,
2000; Yang et al. 2000; Zhu & Elston, 2000;
Zhu & Elston, 2001).
Because of its theoretical advantages, the TDT
is often advocated as the approach of choice for
the genetic dissection of complex traits (Lander
& Schork, 1994; Risch & Merikangas, 1996).
However, when planning the design of a study,
several elements have to be taken into con-
sideration, in particular the sample size and cost
efficiency of the different approaches. This is all
the more critical since genetic effects underlying
complex phenotypes are expected to be quite
modest and will require large studies to be
detected. Moreover, a crucial requirement of
future genetic studies will be the availability of
new intermediate biological phenotypes, which
are often highly costly. Since a pre-requisite of
the TDT-type methods is the presence of an
association between marker and phenotype, one
may wish to first investigate the association
using a classical association study, and then to
check by a TDT approach whether this as-
sociation is really due to linkage.
The aim of the present study was to provide
some elements for determining the sample size
required for confirming, by a quantitative TDT
approach, an association detected by a classical
association study carried out in a random sample
of sibships. As mentioned above, several quan-
titative TDT-type models are available, but in
this study we focused our attention on two of
them, the model developed by George et al.
(1999) for pedigree data, and the TDT-Q5 model
of Allison (1997) that was initially developed for
families with one child, but extended here to
larger sibships. These models were chosen be-
cause they are based on simple linear regression
models, which allowed us to implement them
into an EE framework with the classical as-
sociation study.
While sample size calculation procedures for
independent subjects are well known, the cor-
responding situation for related individuals is
less developed. Recently, Rochon (1998) has
proposed a general EE procedure for sample size
calculations in correlated data. Since the classical
association approach and the two TDT-type
methods can be incorporated into a common EE
framework, we applied this procedure for cal-
culating sample sizes to the three approaches.
After a description of the method, we compared
the sample sizes required by each method under
different genetic models. An illustration was
finally given with a study of the relationship
between the Ser447Stop polymorphism of the
lipoprotein lipase (LPL) gene and plasma tri-
glyceride (TG) levels in family data.
Notations
Consider a sample of $K$ sibships. Let $y_k = (y_{k1}, \ldots, y_{kn_k})^t$ ($t$ denotes ‘transposition’) be the
quantitative phenotype vector of the $k$th sibship
consisting of $n_k$ sibs, with expected mean vector
$\mu_k$. Assume that $y_{ki}$ ($i = 1$ to $n_k$) can be expressed
as $y_{ki} = \eta c_{ki} + \varepsilon_{ki}$, where $\eta$ is a set of regression
parameters associated with a covariate vector $c_{ki}$
and $\varepsilon_{ki}$ is a residual random effect. $\varepsilon_k = (\varepsilon_{k1}, \ldots, \varepsilon_{kn_k})^t$
has a multivariate distribution, not necessarily
known, with mean vector 0 and variance-covariance
matrix $\Omega_k$ incorporating the residual
variances and the residual sib–sib correlations.
We will first describe the statistical models
used in the different approaches for modeling the
means and characterizing the association be-
tween the marker genotype and the phenotype.
Then we will show how the EE can be used for
estimating the regression parameter vector η,
while taking into account the familial depen-
dency between sibs.
Modeling of the means
Classical association analysis
The classical regression model used for testing
association between a quantitative phenotype
and a marker genotype is
$\mu_{ki} = E(y_{ki} \mid x_{ki}) = \alpha_{ASSOC} + \beta_{ASSOC}\, x_{ki}$,
where $x_{ki}$ denotes the covariate
characterizing the genotype of the $i$th sib
of the $k$th sibship. The genotype is measured at a
di-allelic marker locus a/A with allele frequency
$p_a = 1 - p_A$. In the most general form, the genotype
of a sib is a set of two indicator variables
corresponding to the genotypes Aa and AA, the
genotype aa being taken as the reference.
However, in our applications, we will only
consider particular models (additive, dominant
and recessive) such that this set of variables
reduces to only one variable. Under an additive
model, $x_{ki} = 0$, 1 or 2, whereas for a dominant or
a recessive model $x_{ki} = 0$ or 1. Note that all sibs
can be used and are then informative for a
classical association analysis, provided that the
residual sib–sib correlation is taken into account.
TDT-type analysis
In the following models, we need to assume
that the parental genotypic information is avail-
able or can be unambiguously deduced from the
offspring’s genotypes.
The TDT-Q5 regression model of Allison
(1997): In this model, the offspring’s phenotype
is regressed on the parental mating type and his
(her) own marker genotype. Only offspring for
which at least one parent is heterozygous at the
marker locus are informative and included in the
analysis. The expected mean is modeled as
$E(y_{ki} \mid z_{k1}, z_{k2}, x_{ki}) = \alpha_{TDT-A} + \pi_1 z_{k1} + \pi_2 z_{k2} + \beta_{TDT-A}\, x_{ki}$,
where $(z_{k1}, z_{k2})$ is a set of two dummy variables
characterizing the parental mating type of the
$k$th sibship. $z_{k1}$ is equal to 1 if the parental
mating type is Aa × Aa, 0 otherwise; $z_{k2}$ is equal
to 1 if the parental mating type is Aa × AA (or
AA × Aa) and 0 otherwise, the mating type
aa × Aa (or Aa × aa) being considered as the
reference. $x_{ki}$ is defined as above. The initial test
statistic proposed by Allison (1997) was an F
statistic, only valid for families with one child.
However, the model can be easily extended to
several sibs by adequately taking into account
the sib–sib correlation as described in the next
section. The statistic used for testing the marker
effect will then be a Wald test statistic. This
extension of the TDT-Q5 of Allison (1997) will be
referred to as TDT-A.
The TDT-type regression model of George et
al. (1999): This model, denoted here as TDT-G,
consists of regressing the offspring’s phenotype
on the parental transmission of the trait-associated
allele A at the marker locus. In this
model, the expected mean is expressed as
$E(y_{ki} \mid g_{ki}) = \alpha_{TDT-G} + \beta_{TDT-G}\, g_{ki}$,
where $g_{ki}$ is equal to 1
if the $i$th sib of the $k$th sibship received one allele
A from a heterozygous parent and 0 otherwise.
As in the model proposed by Allison (1997), only
offspring for whom at least one parent is
heterozygous at the marker locus are informative
and included in the model. However, in this
model, heterozygous sibs from two heterozygous
parents are not included in the analysis: in
that case the A allele has been transmitted from
one parent but not from the other,
so these sibs are not informative. Therefore,
in this model, the same sibship can include
both informative and non-informative sibs. This
is in contrast to the two previous regression
models, where any sibship used in the analysis
includes sibs that are all informative. George et
al. (1999) proposed to use a maximum likelihood
(ML) procedure for estimating and testing the
regression parameters, while taking into account
the sib–sib dependency. We propose to use an
EE procedure, which has been shown to be as
efficient as the ML procedure (Trégouët et al.
1997), and has the advantage of making sample
size calculations easier.
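The covariate definitions of the three models can be sketched in a few lines of Python. This is only an illustration: the function name `code_sib`, the string encoding of genotypes and the returned dictionary are assumptions made for this sketch, and the dominant and recessive codings (Aa and AA coded 1, or only AA coded 1) are one reading of the text that takes aa as the reference genotype.

```python
def code_sib(parent1, parent2, child, model="additive"):
    """Code one sib for the association, TDT-A and TDT-G regressions.

    Genotypes are the strings 'aa', 'Aa' or 'AA' (hypothetical encoding).
    Returns x (genotype covariate), z1 and z2 (TDT-A mating-type dummies),
    g (TDT-G transmission indicator, None if the sib is not informative)
    and one informativeness flag per TDT-type model.
    """
    n_A = {"aa": 0, "Aa": 1, "AA": 2}            # number of A alleles
    kid = n_A[child]
    parents = sorted((n_A[parent1], n_A[parent2]))

    # Genotype covariate x_ki, with aa as the reference genotype.
    if model == "additive":
        x = kid
    elif model == "dominant":                     # assumption: Aa and AA coded 1
        x = int(kid >= 1)
    elif model == "recessive":                    # assumption: only AA coded 1
        x = int(kid == 2)
    else:
        raise ValueError("unknown genetic model: " + model)

    n_het = parents.count(1)                      # number of heterozygous parents
    z1 = int(parents == [1, 1])                   # mating type Aa x Aa
    z2 = int(parents == [1, 2])                   # mating type Aa x AA
    informative_a = n_het >= 1

    # TDT-G drops heterozygous sibs of two heterozygous parents.
    informative_g = informative_a and not (n_het == 2 and kid == 1)
    g = None
    if informative_g:
        if n_het == 2:
            g = int(kid == 2)                     # aa -> 0, AA -> 1
        else:
            hom = parents[0] if parents[0] != 1 else parents[1]
            g = kid - hom // 2                    # allele passed on by the Aa parent
        if g not in (0, 1):
            raise ValueError("child genotype incompatible with the parents")
    return {"x": x, "z1": z1, "z2": z2, "g": g,
            "informative_tdt_a": informative_a,
            "informative_tdt_g": informative_g}
```

For example, `code_sib("Aa", "Aa", "Aa")` is informative for TDT-A but not for TDT-G, which is the situation of group 4 in Table 1.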
Estimation of the regression parameters
In order to take into account the familial
dependency between the sibs under the three
statistical models, we propose to estimate the
regression parameters by means of the EE
technique. Following Liang & Zeger (1986), the
regression parameter estimates satisfy the following
equations

$$\sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1}\, (y_k - \mu_k) = 0, \qquad (1)$$

where Var( yk) is a ‘working’ variance-covariance
matrix (see below).
The solution $\hat\eta$ of (1) is such that $K^{1/2}(\hat\eta - \eta)$ is
asymptotically normally distributed with mean
0 and covariance matrix consistently estimated
by

$$K\, W^{-1} \left( \sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1}\, (y_k - \mu_k)(y_k - \mu_k)^t\, \mathrm{Var}(y_k)^{-1}\, \frac{\partial \mu_k}{\partial \eta} \right) W^{-1} \qquad (2)$$

where

$$W = \sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1}\, \frac{\partial \mu_k}{\partial \eta},$$

the quantity (2) being evaluated at $\hat\eta$.
It can be shown (Liang & Zeger, 1986) that the
EE estimates provided by (1) and (2) are
asymptotically unbiased even if the matrix
$\mathrm{Var}(y_k)$ used is not the true variance-covariance
matrix of $\varepsilon_k$, i.e. even if the familial dependency
between sibs is not correctly specified. In the
problem considered here, familial correlations
are treated as nuisance parameters that are of
little interest. The EE estimates are said to be
robust to any misspecification of the familial
dependency. The robustness of the EE
technique also lies in the absence of distributional
assumptions for the $\varepsilon_k$’s. These robustness properties
hold as long as the regression model for $\mu_k$ is
correct. Liang & Zeger (1986) provided some
examples of incomplete specifications for
$\mathrm{Var}(y_k)$. In some circumstances, for example in small
samples ($K < 50$) or when it can be considered
that the chosen specification of $\mathrm{Var}(y_k)$ is
correct, it is preferable to use $W^{-1}$ instead of (2)
for estimating the covariance matrix of $\hat\eta$ (Prentice
& Zhao, 1991; Rotnitzky & Jewell, 1990).
The former, $W^{-1}$, is called the ‘model-based’
variance estimate whereas the latter, formula
(2), is referred to as the ‘robust’ variance
estimate.
In our applications, we assumed for the
specification of $\mathrm{Var}(y_k)$ that sibs were equicorrelated,
with residual sib–sib correlation $\rho$ and
with constant residual variance $\sigma^2$.
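As an illustration of equations (1) and (2), the sketch below solves the estimating equations for a linear mean with an intercept and a single covariate, using the equicorrelated working matrix just described. It is not the authors' implementation: the function name `solve_ee`, the data layout and the use of the closed-form inverse of an equicorrelated matrix are choices made for this example, and the returned matrices are covariances of $\hat\eta$ itself (that is, $W^{-1}$ and the sandwich $W^{-1} M W^{-1}$, without the factor $K$ that appears in (2) for $K^{1/2}(\hat\eta - \eta)$).

```python
def solve_ee(sibships, rho, sigma2=1.0):
    """Solve the estimating equations for mu_ki = alpha + beta * x_ki.

    sibships: list of (x, y) pairs, each a list with one entry per sib.
    rho, sigma2: working residual sib-sib correlation and residual variance.
    Returns (eta, model_based, robust) where eta = [alpha, beta] and the
    other two values are 2x2 covariance matrices of eta.
    """
    def vinv(v):
        # Var(y_k)^{-1} v for an n x n equicorrelated working matrix.
        n, s = len(v), sum(v)
        c = rho / (1.0 + (n - 1) * rho)
        return [(vi - c * s) / (sigma2 * (1.0 - rho)) for vi in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    # W = sum_k D_k^t Var(y_k)^{-1} D_k and U = sum_k D_k^t Var(y_k)^{-1} y_k,
    # where the rows of D_k are (1, x_ki).
    w = [[0.0, 0.0], [0.0, 0.0]]
    u = [0.0, 0.0]
    for x, y in sibships:
        ones = [1.0] * len(x)
        v1, vx = vinv(ones), vinv(x)
        w[0][0] += dot(ones, v1)
        w[0][1] += dot(ones, vx)
        w[1][1] += dot(x, vx)
        u[0] += dot(v1, y)
        u[1] += dot(vx, y)
    w[1][0] = w[0][1]

    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    w_inv = [[w[1][1] / det, -w[0][1] / det],
             [-w[1][0] / det, w[0][0] / det]]
    # The mean is linear in eta, so (1) has the closed-form solution W^{-1} U.
    eta = [w_inv[0][0] * u[0] + w_inv[0][1] * u[1],
           w_inv[1][0] * u[0] + w_inv[1][1] * u[1]]

    # Middle term of the robust (sandwich) variance.
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in sibships:
        resid = [yi - eta[0] - eta[1] * xi for xi, yi in zip(x, y)]
        vr = vinv(resid)
        score = [sum(vr), dot(x, vr)]
        for i in range(2):
            for j in range(2):
                m[i][j] += score[i] * score[j]
    robust = matmul(matmul(w_inv, m), w_inv)
    return eta, w_inv, robust
```

With `rho=0` and sibships of size 1 the estimate reduces to ordinary least squares, which gives a quick sanity check of the sketch.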
Sample size calculations based on estimating
equations
Suppose that we wish, prior to the analysis, to
determine the number $K$ of sibships required to
detect the genetic effect at the marker locus for a
given power under one of the three statistical
models. The hypothesis tested is $H_0: \eta_o = 0$
versus $H_1: \eta_o \neq 0$, where $\eta_o$ is either $\beta_{ASSOC}$, $\beta_{TDT-A}$
or $\beta_{TDT-G}$. In an EE framework, this is generally
performed by means of the Wald test statistic
$T = \hat\eta_o^2 / \mathrm{var}(\hat\eta_o)$ (see Rotnitzky & Jewell (1990) for a
discussion on the use of the Wald statistic).
Following Rochon (1998), we will partition the
sample of $K$ sibships into $S$ groups as follows:
each sibship $k$ is assigned to a group $s$ which
gathers all the sibships that have the same
expected mean vector $\mu_s$, the same matrix of
covariates and the same sibship structure (i.e. the
same number of sibs). All sibships within the
same group $s$ will then have the same variance-covariance
matrix $\mathrm{Var}(y_s)$. Let $n_s$ be the number
of sibships in group $s$ ($\sum_{s=1}^{S} n_s = K$) and $w_s$ be the
proportion of sibships in group $s$ relative to the
total number of sibships ($w_s = n_s / K$). Examples
of such partitions will be given in the next section.
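Following Rochon (1998), the expected EE
estimate that would be a solution of (1) is given
by

$$\hat\eta = \left( \sum_{s=1}^{S} n_s\, \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1}\, \frac{\partial \mu_s}{\partial \eta} \right)^{-1} \times \left( \sum_{s=1}^{S} n_s\, \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1}\, \mu_s \right). \qquad (3)$$
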
For a given partition, while the robust variance
(2) of $\hat\eta$ would not be known at the design
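Table 1. Partition of a sample of eligible sibships of size 1 for a TDT-A and TDT-G analysis

| Group $s$ | Parental mating type | Sib genotype | Expected frequency $\omega_s$ (**) | Expected mean (***), TDT-A | Expected mean (***), TDT-G |
|---|---|---|---|---|---|
| 1 | aa × Aa | aa | $2p_a^3 p_A / P_{Tot}$ | $\alpha_{TDT-A}$ | $\alpha_{TDT-G}$ |
| 2 | aa × Aa | Aa | $2p_a^3 p_A / P_{Tot}$ | $\alpha_{TDT-A} + \beta_{TDT-A}$ | $\alpha_{TDT-G} + \beta_{TDT-G}$ |
| 3 | Aa × Aa | aa | $p_a^2 p_A^2 / P_{Tot}$ | $\alpha_{TDT-A} + \pi_1$ | $\alpha_{TDT-G}$ |
| 4 (*) | Aa × Aa | Aa | $2p_a^2 p_A^2 / P_{Tot}$ | $\alpha_{TDT-A} + \pi_1 + \beta_{TDT-A}$ | not used |
| 5 | Aa × Aa | AA | $p_a^2 p_A^2 / P_{Tot}$ | $\alpha_{TDT-A} + \pi_1 + 2\beta_{TDT-A}$ | $\alpha_{TDT-G} + \beta_{TDT-G}$ |
| 6 | Aa × AA | Aa | $2p_a p_A^3 / P_{Tot}$ | $\alpha_{TDT-A} + \pi_2 + \beta_{TDT-A}$ | $\alpha_{TDT-G}$ |
| 7 | Aa × AA | AA | $2p_a p_A^3 / P_{Tot}$ | $\alpha_{TDT-A} + \pi_2 + 2\beta_{TDT-A}$ | $\alpha_{TDT-G} + \beta_{TDT-G}$ |

(*) Group 4 is not used in a TDT-G analysis since sibs are not informative.
(**) $P_{Tot}$ is the probability of a sib to be informative. It is equal to $4p_a p_A(1 - p_a p_A)$ in a TDT-A analysis and to $2p_a p_A(2 - 3p_a p_A)$ in a TDT-G analysis.
(***) For TDT-A, an additive model was assumed. For TDT-G, the formula giving the phenotypic mean does not depend on the genetic model. However, the parameter $\beta_{TDT-G}$ will not be the same in the different models (see Table 2).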
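stage, since it would require data on the unknown
residuals $(y_{ki} - \mu_{ki})$, the expected model-based
variance estimate can be calculated as

$$W^{-1} = \left( \sum_{s=1}^{S} n_s\, \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1}\, \frac{\partial \mu_s}{\partial \eta} \right)^{-1} = K^{-1} \left( \sum_{s=1}^{S} w_s\, \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1}\, \frac{\partial \mu_s}{\partial \eta} \right)^{-1}. \qquad (4)$$
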
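To make the partition-based calculation concrete, the following sketch evaluates the expected estimate and the model-based variance for the TDT-G model in sibships of size 1, using the six TDT-G groups of Table 1. For a model with an intercept and one covariate, the weighted solution reduces to a weighted covariance divided by a weighted variance. The function name `expected_tdtg_size1` and its arguments are hypothetical, the true phenotypic mean is assumed to be `beta` times the genotype covariate, and the dominant coding (Aa and AA coded 1) is an assumption of this example.

```python
def expected_tdtg_size1(p_a, beta, sigma2, K, model="additive"):
    """Expected TDT-G estimate and model-based variance, sibships of size 1.

    beta is the true genotype effect, sigma2 the residual variance and K the
    number of eligible sibships. Returns (expected beta_TDT-G, its variance,
    probability that a sib is informative).
    """
    p_A = 1.0 - p_a
    # (unnormalised frequency, number of A alleles in the sib, g)
    # for groups 1, 2, 3, 5, 6 and 7 of Table 1.
    groups = [
        (2 * p_a**3 * p_A, 0, 0),
        (2 * p_a**3 * p_A, 1, 1),
        (p_a**2 * p_A**2, 0, 0),
        (p_a**2 * p_A**2, 2, 1),
        (2 * p_a * p_A**3, 1, 0),
        (2 * p_a * p_A**3, 2, 1),
    ]
    p_tot = sum(f for f, _, _ in groups)
    w = [f / p_tot for f, _, _ in groups]

    def true_mean(n_alleles):
        if model == "additive":
            return beta * n_alleles
        if model == "dominant":              # assumption: Aa and AA coded 1
            return beta * (1 if n_alleles >= 1 else 0)
        raise ValueError("unknown genetic model: " + model)

    g = [gi for _, _, gi in groups]
    mu = [true_mean(n) for _, n, _ in groups]
    mean_g = sum(wi * gi for wi, gi in zip(w, g))
    mean_mu = sum(wi * mi for wi, mi in zip(w, mu))
    var_g = sum(wi * (gi - mean_g) ** 2 for wi, gi in zip(w, g))
    cov = sum(wi * (gi - mean_g) * (mi - mean_mu) for wi, gi, mi in zip(w, g, mu))

    beta_expected = cov / var_g              # slope element of equation (3)
    var_beta = sigma2 / (K * var_g)          # slope element of equation (4)
    return beta_expected, var_beta, p_tot
```

Under an additive model the first returned value equals $\beta (2 - 2p_a p_A)/(2 - 3p_a p_A)$ and the second equals $4\sigma^2/K$, in agreement with the size-1 entries of Table 2.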
Under $H_0$, $T$ converges asymptotically towards
a central $\chi^2$ distribution with 1 degree of freedom
(df). Alternatively, under $H_1$, the asymptotic
distribution of $T$ is a non-central $\chi^2$ distribution
with 1 df and a non-centrality parameter approximately
given by

$$\nu \approx K\, \hat\eta_o^2 / \widetilde{\mathrm{var}}(\hat\eta_o), \qquad (5)$$

where $\widetilde{\mathrm{var}}(\hat\eta_o)$ denotes the variance of $K^{1/2}\hat\eta_o$, i.e. $K$ times the
corresponding diagonal element of (4).
Therefore, given the expected power $1 - \beta$ and
the level of significance $\alpha$, one can derive the
value of $\nu$ from tables of the non-central $\chi^2$
distribution or from computing software (for
example, the CNONCT function in the SAS package
(SAS Institute Inc., Cary, NC)). Given $\nu$, $K$ can
then be determined by solving (5). Alternatively,
given a partition of a sample of $K$ sibships and a
significance level $\alpha$, the power to detect the
desired effect is given by

$$1 - \beta = \int_{c}^{+\infty} g(z, 1, \nu)\, dz,$$

where $c$ is the critical value, for a significance
level of $\alpha$, of the central $\chi^2$ distribution with
1 df, and $g$ is the density function of the non-central
$\chi^2$ distribution with 1 df and non-centrality
parameter $\nu$.
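The two calculations above can be sketched with the Python standard library alone. With 1 df, a non-central $\chi^2$ variable is the square of a normal variable with mean $\sqrt{\nu}$ and unit variance, so the power integral can be written with the normal distribution function. The function names `power_1df`, `noncentrality` and `sibships_needed` are hypothetical, and the bisection bounds are arbitrary choices for this sketch.

```python
from statistics import NormalDist

_NORM = NormalDist()

def power_1df(nu, alpha=0.05):
    """Power of the 1-df Wald test for a non-centrality parameter nu."""
    z_c = _NORM.inv_cdf(1.0 - alpha / 2.0)   # square root of the critical value c
    d = nu ** 0.5
    # P(Z^2 > c) with Z ~ N(sqrt(nu), 1): both tails are counted.
    return (1.0 - _NORM.cdf(z_c - d)) + _NORM.cdf(-z_c - d)

def noncentrality(power=0.90, alpha=0.05):
    """Non-centrality parameter giving the requested power (bisection)."""
    lo, hi = 0.0, 100.0                       # assumed search interval
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sibships_needed(nu, eta_o, unit_variance):
    """Solve (5) for K; unit_variance is K times the variance of the estimate."""
    return nu * unit_variance / eta_o ** 2
```

For a power of 0.90 and $\alpha = 0.05$ this gives $\nu \approx 10.51$. As a usage example, for independent subjects and an additive marker, Table 2 gives a unit variance of $\sigma^2/(2p_a p_A)$ and $\beta_{ASSOC}^2 = h^2\sigma^2 / (2(1-h^2)p_a p_A)$, so that $K = \nu (1-h^2)/h^2$, about 200 subjects for $h^2 = 0.05$.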
It is important to note that, while the model-based
variance $W^{-1}$ is used in sample size
calculations, the robustness properties of the EE
estimates rely on the use of the robust variance
(2). However, if the investigators have some
information on the true pattern of correlation,
the use of $W^{-1}$ for estimation is valid. In the
following applications, its use is quite reasonable
since sibs can reasonably be assumed to be
equicorrelated and with equal residual variance.
In any case, as pointed out by Rochon (1998), in
order to maintain robustness, one can still apply
several forms of covariance matrix for $\mathrm{Var}(y_s)$,
and then investigate their influence on the
sample size calculations.
Application to the different methods
From now, we will assume that the sibships are
randomly sampled from a population in which
Hardy–Weinberg (HW) equilibrium holds.
Classical association analysis
Suppose that all $K$ sibships are of size 1. Every
sibship can be assigned to one of the $S = 3$ groups
defined by the three possible genotypes aa, Aa
and AA of a sib. The expected frequencies $\omega_s$ of
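Table 2. Analytical expression of the expected regression parameter estimate and of its expected variance obtained from a classical association, a TDT-A and a TDT-G analysis, under an additive or a dominant model, in sibships of fixed size

| Model | Method | Parameter | Variance, sibships of size 1 | Variance, sibships of size 2 | Variance, sibships of size 3 |
|---|---|---|---|---|---|
| Additive | Association | $\lvert\beta_{ASSOC}\rvert = \sqrt{\dfrac{h^2\sigma^2}{2(1-h^2)\,p_a p_A}}$ | $\dfrac{\sigma^2}{2p_a p_A K}$ | $\dfrac{\sigma^2(1-\rho^2)}{2p_a p_A K(2-\rho)}$ | $\dfrac{\sigma^2(1-\rho)(1+2\rho)}{6p_a p_A K}$ |
| Additive | TDT-A | $\beta_{TDT-A} = \beta_{ASSOC}$ | $\dfrac{4(1-p_a p_A)\sigma^2}{K}$ | $\dfrac{2(1-p_a p_A)\sigma^2(1-\rho^2)}{K}$ | $\dfrac{4(1-p_a p_A)\sigma^2(1-\rho)(1+2\rho)}{3K(1+\rho)}$ |
| Additive | TDT-G* | $\beta_{TDT-G} = \beta_{ASSOC}\,\dfrac{2-2p_a p_A}{2-3p_a p_A}$ | $\dfrac{4\sigma^2}{K}$ | $\dfrac{2\sigma^2(1-\rho^2)(4-5p_a p_A)}{K(4-6p_a p_A-p_a p_A\rho^2)}$ | $\dfrac{2\sigma^2(1-\rho^2)(1+2\rho)(8-9p_a p_A)}{3K\left(4(1+\rho)^2-p_a p_A(6+12\rho+8\rho^2+\rho^3)\right)}$ |
| Dominant | Association | $\lvert\beta_{ASSOC}\rvert = \sqrt{\dfrac{h^2\sigma^2}{(1-h^2)\,p_a^2(1-p_a^2)}}$ | $\dfrac{\sigma^2}{p_a^2(1-p_a^2)K}$ | $\dfrac{\sigma^2(1-\rho^2)}{2p_a^2(1-p_a^2)K\left(1-\rho\,\dfrac{1+3p_a}{4+4p_a}\right)}$ | $\dfrac{\sigma^2(1-\rho)(1+2\rho)}{3p_a^2(1-p_a^2)K\left(1+\rho\,\dfrac{1-p_a}{1+p_a}\right)}$ |
| Dominant | TDT-A | $\beta_{TDT-A} = \beta_{ASSOC}$ | $\dfrac{16(1-p_a p_A)\sigma^2}{Kp_a(3+p_a)}$ | $\dfrac{8(1-p_a p_A)\sigma^2(1-\rho^2)}{Kp_a(3+p_a)}$ | $\dfrac{16(1-p_a p_A)\sigma^2(1-\rho)(1+2\rho)}{3Kp_a(3+p_a)(1+\rho)}$ |
| Dominant | TDT-G* | $\beta_{TDT-G} = \beta_{ASSOC}\,\dfrac{p_a(1+p_a)}{2-3p_a p_A}$ | $\dfrac{4\sigma^2}{K}$ | $\dfrac{2\sigma^2(1-\rho^2)(4-5p_a p_A)}{K(4-6p_a p_A-p_a p_A\rho^2)}$ | $\dfrac{2\sigma^2(1-\rho^2)(1+2\rho)(8-9p_a p_A)}{3K\left(4(1+\rho)^2-p_a p_A(6+12\rho+8\rho^2+\rho^3)\right)}$ |

$K$ is the number of eligible sibships. $p_A = 1 - p_a$. $h^2$ is the heritability associated with the marker. $\sigma^2$ is the residual variance and $\rho$ the residual sib–sib correlation.
* Note that, unlike the other two methods, the analytical expression for the regression parameter estimated by the TDT-G method depends on $\rho$. The formula given in this table is that obtained when $\rho = 0$.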
these groups are therefore $p_a^2$, $2p_a(1 - p_a)$ and
$(1 - p_a)^2$. The expected means $\mu_s$ in these three groups
are $\mu_{aa}$, $\mu_{Aa}$ and $\mu_{AA}$. Similarly, a sample of $K$
sibships of size 2 where the 2 sibs play a
symmetric role can be partitioned into 6 subgroups
corresponding to the 6 possible genotypic
vectors of the sib pair (aa, aa), (aa, Aa), (aa, AA),
(Aa, Aa), (Aa, AA) and (AA, AA). The number of
groups in which a sample of sibships of size 3 can
be partitioned is 15, and increases with the
sibship size.
TDT-A analysis
Unlike the classical association analysis where
partitioning depends solely on the sib’s genotypic
vector, we must now take into account the
distribution of the two additional covariates
characterizing the parental mating type. A
sample of $K$ eligible sibships (sibships that
include informative sibs) of size 1 must now be
partitioned into $S = 7$ groups according to the
parental mating type and the sib’s genotype as
described in Table 1. Table 1 also provides the
expected frequency $\omega_s$ as well as the expected
mean for an additive allele effect. Similarly, it
can be shown that any sibship of size 2 (resp. size
3) can be assigned to one of 12 (resp. 18) possible
groups.
TDT-G analysis
Under a TDT-G model, a sample of $K$ eligible
sibships of size 1 must be partitioned into $S = 6$
groups as depicted in Table 1. These groups are
the same as those defined for a TDT-A analysis,
except for the group which includes heterozygous
sibs of two heterozygous parents, which are not
informative in a TDT-G model. For sibships of
size 2, the number of groups becomes 11 instead
of 12 under the TDT-A model. However,
amongst these 11 groups, 2 groups include
sibships with only one informative sib. For
sibships of size 3, the number of groups is 17 (12
with 3 informative sibs, 3 with 2 informative sibs
and 2 with 1 informative sib).
Up to now, we have been interested in samples
of sibships of fixed size. It can be easily seen that
the partitions proposed above can also apply to
sibships of varying size. All the $\omega_s$ would have
to be multiplied by appropriate weights corresponding
to the proportions of the different kinds
of sibships in the whole sample.
For sibships of fixed size, values of the
expected regression parameter estimates (i.e.
$\hat\beta_{ASSOC}$, $\hat\beta_{TDT-A}$, $\hat\beta_{TDT-G}$) and of their variances
can be obtained by solving analytically (1) and
(2). These values are given in Table 2 for sibships
of size 1, 2 and 3, under an additive and a
dominant model. Formulae for a recessive model
are identical to those obtained for a dominant
model with $p_a$ replaced by $1 - p_a$. Note that solving
(1) indicates that the expected estimates
of the parental mating type coefficients in the
TDT-A model are null under random mating
and HW equilibrium. Therefore, the expected
estimate of the genotype effect obtained under
the TDT-A analysis is the same as that provided
by the classical association analysis. However,
these two estimates differ by their variance.
For sibships of varying size, corresponding expressions
are no longer straightforward.
Note finally that the assumption of Hardy–Weinberg
equilibrium could be relaxed. This
would require knowing the true genotypic
distribution in the population from which the
sample is drawn and making the $\omega_s$ dependent
on this distribution.
Sample size calculations under the different
methods
The sample size, which can be obtained by use
of the mathematical expressions given in Table
2, is a function of the marker heritability, the
marker allele frequency $p_a$ and the residual
sib–sib correlation $\rho$. In a classical association
analysis and in a TDT-A analysis, the number of
informative sibs is $n$ times the number of eligible
sibships of size $n$, since all sibs are informative.
This is not true for a TDT-G analysis (see above).
We considered a di-allelic marker associated
with a heritability of 5% and having either an
additive or a dominant effect. Results for a
recessive effect are similar to those obtained for
a dominant model when $p_a$ is replaced by $1 - p_a$.
The power and the significance level were taken
to be 0.90 and 0.05, respectively.

Fig. 1. Number of informative sibs required to detect, in the different models, the effect of an additive marker associated with a heritability of 5%, according to the residual sib–sib correlation. Power = 0.90, significance level α = 0.05, allele frequency of the marker: $p_a = 0.2$.
Additive effect
Figure 1 shows the influence of the sibship size
and the residual sib–sib correlation on the
number of informative sibs required to detect the
additive effect of a di-allelic marker with allele
frequency 0.2. The number of informative sibs
required in a classical association framework
follows a symmetric ∩-shaped curve according
to the sib–sib correlation. Analytical calculations
indicated that, whatever the allele frequency,
the maximum occurs for $\rho = 2 - \sqrt{3}$ for sibships
of size 2 and for $\rho = 0.25$ for sibships of size 3, and the
minimum for $\rho = 0$ or $\rho = 0.5$. Besides, the
number of informative sibs required increases
with the sibship size. Under an additive model, it
is then more powerful to carry out a classical
association study with small sibships, the most
powerful design being to include independent
subjects.

Fig. 2. Number of informative sibs required to detect, for different sibship sizes, the effect of an additive marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation: ρ = 0.2.
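The location of these extrema can be checked from the association variances of Table 2 under an additive model. The short derivation below is a worked sketch, where $N_n = nK$ denotes the number of sibs needed with sibships of size $n$ (a notation introduced here only for this check). Since $\beta_{ASSOC}$ does not depend on the sibship size, $N_n$ is proportional to $nK\,\mathrm{Var}(\hat\beta_{ASSOC})$.

```latex
\begin{align*}
\frac{N_2}{N_1} &= \frac{2(1-\rho^2)}{2-\rho},
&
\frac{d}{d\rho}\,\frac{1-\rho^2}{2-\rho} &= \frac{\rho^2-4\rho+1}{(2-\rho)^2} = 0
\;\Longrightarrow\; \rho = 2-\sqrt{3} \approx 0.268, \\[4pt]
\frac{N_3}{N_1} &= (1-\rho)(1+2\rho) = 1+\rho-2\rho^2,
&
\frac{d}{d\rho}\,(1+\rho-2\rho^2) &= 1-4\rho = 0
\;\Longrightarrow\; \rho = \tfrac{1}{4}.
\end{align*}
% Both ratios are equal to 1 at rho = 0 and at rho = 0.5, and neither
% depends on the allele frequency p_a.
```

Both ratios exceed 1 between $\rho = 0$ and $\rho = 0.5$ (about 1.07 and 1.125 at their respective maxima), which is why independent subjects are the most economical design under an additive model.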
Conversely, the number of informative sibs
required in a TDT-type analysis decreases with
both the sibship size and the residual sib–sib
correlation, indicating that the stronger the
familial clustering, the more powerful the TDT-type
analysis. Note that, for an additive model
and an allele frequency $p_a = 0.2$, the number of
informative sibs required by the TDT-A (resp.
TDT-G) method is lower than that of the
association method as soon as $\rho > 0.14$ (resp.
$> 0.10$) for sibships of size 2 and $\rho > 0.07$
(resp. $> 0.05$) for sibships of size 3.
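The TDT-A thresholds quoted above can be recovered from the additive-model variances of Table 2. Because $\beta_{TDT-A} = \beta_{ASSOC}$, comparing numbers of informative sibs amounts to comparing $nK \cdot \mathrm{Var}(\hat\beta)/\sigma^2$ for the two methods. The sketch below does this on a grid of correlations; the function names and the grid step are choices made for this example, and the TDT-G comparison is left out because its expected estimate depends on $\rho$ in a way that Table 2 does not give.

```python
def sibs_factor_assoc(n, p_a, rho):
    """n * K * Var(beta_ASSOC) / sigma^2, additive model (Table 2)."""
    q = p_a * (1.0 - p_a)
    if n == 1:
        return 1.0 / (2.0 * q)
    if n == 2:
        return 2.0 * (1.0 - rho**2) / (2.0 * q * (2.0 - rho))
    if n == 3:
        return 3.0 * (1.0 - rho) * (1.0 + 2.0 * rho) / (6.0 * q)
    raise ValueError("Table 2 covers sibships of size 1 to 3 only")

def sibs_factor_tdta(n, p_a, rho):
    """n * K * Var(beta_TDT-A) / sigma^2, additive model (Table 2)."""
    q = p_a * (1.0 - p_a)
    if n == 1:
        return 4.0 * (1.0 - q)
    if n == 2:
        return 2.0 * 2.0 * (1.0 - q) * (1.0 - rho**2)
    if n == 3:
        return 3.0 * 4.0 * (1.0 - q) * (1.0 - rho) * (1.0 + 2.0 * rho) / (3.0 * (1.0 + rho))
    raise ValueError("Table 2 covers sibships of size 1 to 3 only")

def crossover_rho(n, p_a, step=0.001):
    """Smallest grid value of rho in [0, 0.5] at which TDT-A needs fewer sibs."""
    for i in range(int(round(0.5 / step)) + 1):
        rho = i * step
        if sibs_factor_tdta(n, p_a, rho) < sibs_factor_assoc(n, p_a, rho):
            return rho
    return None
```

With $p_a = 0.2$, `crossover_rho(2, 0.2)` and `crossover_rho(3, 0.2)` fall close to the values 0.14 and 0.07 reported in the text.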
Figure 2 shows the influence of the marker
allele frequency on the number of informative
sibs required by each method when the residual
sib–sib correlation is set to 0.2. In an association
study, the number of informative sibs does not
depend on the allele frequency when the allele
effect is additive, but slightly increases with the
sibship size. When the residual sib–sib correlation
is set to 0 or to 0.5, the number of informative
sibs no longer depends on the sibship size, as
already shown in Fig. 1.

Fig. 3. Number of informative sibs required to detect, in the classical association and the TDT-A models, the effect of a dominant marker associated with a heritability of 5%, according to the residual sib–sib correlation. The TDT-G model is not represented because it generally requires a much higher number. Power = 0.90, significance level α = 0.05, allele frequency of the marker: $p_a = 0.2$.
By contrast, the numbers of informative sibs
required under the TDT-type methods are
strongly influenced by the marker allele frequency.
The number of informative sibs required
by both methods follows a ∩-shaped curve with a
maximum achieved for $p_a = 0.5$, regardless of
sibship size. However, as the sibship size
increases, the allele frequency interval in which
the TDT-A and TDT-G analyses require more
informative sibs than the classical association
analysis narrows. With sibships of size 1, the
TDT-A analysis is more demanding than the
association analysis when $p_a$ lies between
0.18 and 0.82. For $\rho = 0.2$, this range becomes
0.22–0.78 and 0.25–0.75 for sibships of size 2
and 3, respectively. The corresponding ranges
for the TDT-G method are 0.19–0.81, 0.23–0.77
and 0.26–0.74, respectively. The allele frequency
interval in which the TDT-A and TDT-G
analyses require more informative sibs than
the classical association analysis also narrows
when the sib–sib correlation increases, as already
reflected in Fig. 1. For example, when $\rho = 0.5$,
the TDT-type methods are less demanding in
terms of informative sibs than the classical
association analysis, except in a small range of
allele frequency around 0.5 for sibships of size
2. Both TDT-type methods have comparable
power, the TDT-G method requiring slightly
fewer informative sibs than the TDT-A method
to detect the additive effect of the marker (Fig. 1),
except for very low allele frequency ($p_a < 0.04$)
and a strong clustering effect (e.g. sibships of
size 3 and $\rho > 0.3$).
Dominant effect
Figures 3 and 4 present similar results for a
dominant model. The number of informative
sibs required by a classical association study now
follows an asymmetric ∩-shaped curve according
to the residual sib–sib correlation, with a
unique minimum occurring at $\rho = 0.5$. For
$p_a = 0.2$, it is more powerful to work on sibships
than on independent individuals as soon as
$\rho > 0.34$. This value is 0.42 for $p_a = 0.5$.

Fig. 4. Number of informative sibs required to detect, for different sibship sizes, the effect of a dominant marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation: ρ = 0.2.
For a TDT-A analysis, the influence of the
residual sib–sib correlation and of the sibship
size on the number of required informative sibs
follows the same pattern as that observed for
an additive effect. Unlike the additive case,
where both TDT-A and TDT-G methods had
comparable powers, under a dominant model the
TDT-G method requires, in most situations, a
very large number of informative sibs and, for
this reason, is not represented in Fig. 3. As an
indication, for detecting the dominant effect of a
marker having an allele frequency $p_a = 0.2$ and
associated with a heritability of 5%, the TDT-G
method requires 1230 informative sibs when sibships
are of size 1. For sibships of size 2 and 3
with residual sib–sib correlation of 0.5 (the most
favorable situation), the corresponding numbers
become 1084 and 981, respectively.
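The figure of 1230 informative sibs for sibships of size 1 can be reproduced from Table 2 and equation (5). The sketch below is a check under stated assumptions rather than the authors' program: the function name is hypothetical, $\sigma^2$ is set to 1 (it cancels out), and the non-centrality parameter is approximated by $(z_{1-\alpha/2} + z_{1-\beta})^2$, which ignores the negligible contribution of the opposite tail. Sibships of size 2 and 3 are not covered, because the TDT-G estimate then depends on $\rho$ and Table 2 only gives it for $\rho = 0$.

```python
from statistics import NormalDist

def tdtg_dominant_sibs_size1(p_a, h2, power=0.90, alpha=0.05):
    """Informative sibs needed by TDT-G, dominant marker, sibships of size 1."""
    norm = NormalDist()
    # Approximate non-centrality parameter of the 1-df test (far tail ignored).
    nu = (norm.inv_cdf(1.0 - alpha / 2.0) + norm.inv_cdf(power)) ** 2
    q = p_a * (1.0 - p_a)
    # Table 2, dominant model, with sigma^2 = 1.
    beta_assoc_sq = h2 / ((1.0 - h2) * p_a**2 * (1.0 - p_a**2))
    beta_tdtg = beta_assoc_sq ** 0.5 * p_a * (1.0 + p_a) / (2.0 - 3.0 * q)
    unit_variance = 4.0          # K * Var(beta_TDT-G) for sibships of size 1
    return nu * unit_variance / beta_tdtg ** 2
```

Calling `tdtg_dominant_sibs_size1(0.2, 0.05)` returns a value of about 1230, matching the number quoted above.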
Figure 4 shows the influence of the allele
frequency on the number of informative sibs
required by each method in a dominant model.
The number of informative sibs required by a
classical association analysis, for a given sibship
size and whatever the residual sib–sib correlation,
slightly and linearly increases with the allele
frequency, except for sibships of size 1 where
it remains constant. The number of informative
sibs required by a TDT-A analysis follows a
∩-shaped curve very similar to that observed for
an additive model. The maximum now occurs at
$p_a \approx 0.57$, instead of 0.5 in the additive model,
whatever the sibship size and the residual sib–sib
correlation. As indicated in Fig. 4, a TDT-A
analysis would require fewer informative sibs than
an association analysis to detect a dominant
effect, as soon as $p_a$ lies outside [0.28–0.80] for
sibships of size 1. This interval becomes
[0.32–0.77] and [0.36–0.75] when dealing with
sibships of size 2 and 3, respectively. Data not
shown also indicate that when $\rho = 0.5$ and the
sibship size is larger than 1, a TDT-A analysis
always requires fewer informative sibs than a
classical association analysis. Lastly, the number
of informative sibs required in a TDT-G analysis
is, whatever the sibship size and the residual
correlation, a decreasing function of the allele
frequency. Besides, the number of informative
sibs required slightly decreases as the sibship size
and the residual sib–sib correlation increase.
When the allele frequency exceeds approximately 0.86, the
TDT-G analysis is less demanding than a classical
association analysis and close to the TDT-A analysis,
whatever the sibship size and the residual sib–sib
correlation. However, as soon as the allele
frequency is lower than 0.7, the opposite is
observed, the difference between methods being
extremely large for low allele frequencies.
Cost efficiency of the different methods
In the previous section, the calculations fo-
cused on the number of informative sibs, i.e. the
number of sibs really used in the analysis. While
all sibs are used in a classical association analysis,
in TDT-type methods only sibs from mating
types including at least one heterozygous parent
are used. This implies that a larger number of
genotypes and/or phenotypes need to be determined
than the number really used. It is then
important to compare the different methods in
terms of cost efficiency. As described by Allison
(1997), there are two kinds of screening pro-
cedures in TDT-type studies. The first one,
referred to as the genotype-first strategy, consists
of (1) genotyping a sample of parents, (2)
selecting sibships whose parental mating type is
eligible and (3) genotyping and phenotyping sibs
from eligible sibships. The second procedure,
referred to as the phenotype-first strategy,
consists of (1) phenotyping a sample of sibships,
(2) genotyping parents in order to select eligible
sibships and (3) genotyping sibs from eligible
sibships. When studying the genetic component
of a multifactorial phenotype, several candidate
genes are generally investigated and the
phenotype-first strategy seems the most ap-
propriate since informative sibs at a given locus
may not be informative at another locus. There-
fore, we will now present a study of the total
number of sibs to be phenotyped when applying
each of the methods previously described under a
phenotype-first strategy. This number can be
again derived from the mathematical expressions
given in Table 2.
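The eligibility rule stated above (at least one heterozygous parent) can be illustrated by enumerating parental mating types. The following is a minimal sketch, assuming a diallelic marker, Hardy-Weinberg proportions and random mating; the function names `mating_type_probabilities` and `eligible_fraction` are hypothetical and are not taken from the authors' program, and the expressions of Table 2 are not reproduced here.

```python
from itertools import product


def mating_type_probabilities(p_a):
    """Probabilities of ordered (father, mother) genotype pairs for a
    diallelic marker, assuming HW equilibrium and random mating."""
    q_a = 1.0 - p_a
    genotype_freq = {"AA": p_a ** 2, "Aa": 2.0 * p_a * q_a, "aa": q_a ** 2}
    return {
        (father, mother): genotype_freq[father] * genotype_freq[mother]
        for father, mother in product(genotype_freq, repeat=2)
    }


def eligible_fraction(p_a):
    """Share of sibships whose parental mating type includes at least
    one heterozygous parent (the eligibility rule quoted in the text)."""
    return sum(
        prob
        for parents, prob in mating_type_probabilities(p_a).items()
        if "Aa" in parents
    )


if __name__ == "__main__":
    for p_a in (0.1, 0.2, 0.5, 0.8, 0.9):
        frac = eligible_fraction(p_a)
        # 1 / frac is the expected number of sibships that must be
        # phenotyped to obtain one eligible sibship.
        print(f"p_a = {p_a:.1f}: eligible fraction = {frac:.3f}, "
              f"sibships phenotyped per eligible sibship = {1.0 / frac:.2f}")
```

Under these assumptions the eligible fraction is symmetric around p_a = 0.5, where it is largest, so the screening overhead of a phenotype-first strategy is heaviest for rare or very frequent alleles.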
Additive effect
The influence of the sibship size and of the
allele frequency on the total number of sibs to be
phenotyped to detect the additive effect of a
diallelic marker when ρ = 0.2 is presented in Fig. 5.
In a TDT-G analysis, the total number of sibs to
be phenotyped follows a symmetric ∩-shaped
curve according to the allele frequency while, in
classical association and TDT-A analyses, it does
not depend on the allele frequency. In contrast
with what was observed in Fig. 2, where the
number of informative sibs could be lower in
TDT-type analyses than in a classical association
study for extreme allele frequencies, the total
number of sibs to be phenotyped is now always
higher in TDT-type analyses. For both TDT-type
methods, the number of phenotypes, however,
decreases as the sibship size or the sib–sib
correlation increases, the influence of the sibship
size being more marked for high than for
moderate sib–sib correlation. Differences
between the three methods then tend to decrease
as the clustering of the data becomes more
marked.
Fig. 5. Total number of phenotypes to be measured in a phenotype-first strategy to detect, for different sibship sizes, the effect of an additive marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation: ρ = 0.2.
Fig. 6. Total number of phenotypes to be measured in a phenotype-first strategy to detect, for different sibship sizes, the effect of a dominant marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation: ρ = 0.2.
Table 3. Mean (standard deviation) of triglyceride levels according to LPL/S447X genotype in sibs

                     LPL/S447X genotype
         SS            SX            XX            R²*
All      N = 559       N = 166       N = 16
         0.84 (0.37)   0.70 (0.30)   0.85 (0.53)   2.2% (p < 10⁻⁴)
Boys     N = 287       N = 78        N = 7
         0.84 (0.39)   0.63 (0.26)   1.03 (0.69)   4.2% (p < 10⁻⁴)
Girls    N = 272       N = 88        N = 9
         0.84 (0.35)   0.77 (0.31)   0.71 (0.34)   1.4% (p = 0.04)

* Test was performed by use of the EE technique on log-transformed values, adjusted on age, gender and oral contraception (when appropriate) assuming a dominant effect of the LPL/X447 allele. Untransformed values are shown.
Dominant effect
Again, the TDT-type methods are always more
demanding in terms of phenotypes than the
classical association study. The difference be-
tween the TDT-A method and the association
method increases linearly with the allele fre-
quency. As already stressed in the previous
section concerning power, the TDT-G method
requires many more phenotypes than the two
other methods, except for very high allele
frequencies.
A statistical program written in C and an
‘interactive web’ tool relying on the formulae of
Table 2 have been developed for calculating the
power, the minimum sample size and the cost of
a study for each statistical method developed in
this paper. Both the phenotype-first and the
genotype-first strategies were envisaged,
although only the results of the former one have
been presented here. These programs are avail-
able upon request from the authors.
Illustration
We illustrated the power calculations pre-
sented above by an example on the relationship
between the lipoprotein lipase (LPL) gene and
triglyceride (TG) levels. LPL is a key enzyme
involved in the metabolism of TG-rich lipo-
proteins, making the LPL gene a candidate for
the development of atherosclerosis. Several poly-
morphisms of the LPL gene have been described
(Gagné et al. 1994; Nickerson et al. 1998; Wilson
et al. 1993). In particular, a Serine447Stop
(S447X) mutation leading to a premature stop
codon has been shown to affect TG levels, the
stop allele being consistently associated with
lower TG levels (Gagné et al. 1996; Garenc et al.
2000; Humphries et al. 1998; Jemaa et al. 1995;
Zhang et al. 1995). We investigated the re-
lationship between the S447X polymorphism
and TG levels in a sample of 513 sibships selected
from healthy nuclear families who had vol-
unteered for a free health examination as part of
the STANISLAS Cohort (Siest et al. 1997).
Sibships with all sibs aged ≥ 15 years were
selected, leading to a sample composed of 302
sibships of size 1, 194 sibships of size 2 and 17
sibships of size 3 (N = 741). The TG distribution
was adjusted for age, sex and oral contraception
in girls, and log-transformed to remove positive
skewness. Parents and sibs were genotyped for
the S447X polymorphism.
The mean age (± SD) of sibs was 17.5 (± 2.5)
years and the mean TG level was 0.81 (± 0.36)
mmol/l. The S447X genotype distribution
in parents was compatible with HW expectations
and the S447 allele frequency was estimated as
0.88 ± 0.01. Results from the classical association
analysis based on the EE technique are reported
in Table 3. Due to their low number, homozygotes
for the X447 allele were pooled with
heterozygotes. Assuming a dominant model, TG
levels were significantly decreased in carriers of
the X447 allele (p < 10⁻⁴), the S447X polymorphism
explaining 2.2% (h²) of the TG variability
after adjustment for age, sex and oral
contraception in girls. The association was highly
significant in boys (h² = 4.2%, p < 10⁻⁴) and
borderline in girls (h² = 1.4%, p = 0.04). The
genotype × gender interaction was borderline
(p = 0.06). After controlling for the S447X effect,
the residual common sib–sib TG correlation,
estimated by the EE technique (Trégouët et al.
1999), was 0.20 (95% CI [0.07–0.32]; p = 0.002).
Table 4. Minimum number of sibships and informative sibs required to detect, with a 0.90 power, the
dominant effect of the LPL/X447 allele according to the statistical method. The sibship size
distribution is the same as that observed in the sample (significance level: α = 0.05)

                              Total number of    Number of    Number of
                              sibships to be     eligible     informative
                              sampled            sibships     sibs
All (ρ = 0.20; h² = 2.2%)
  Classical Association       337                337          485
  TDT-A                       613                231          333
  TDT-G                       644                233          330
Boys (ρ = 0.23; h² = 4.2%)
  Classical Association       201                201          245
  TDT-A                       373                141          172
  TDT-G                       392                141          170
Girls (ρ = 0.19; h² = 1.4%)
  Classical Association       619                619          756
  TDT-A                       1157               437          534
  TDT-G                       1215               437          528

ρ = residual correlation; h² = heritability associated to the marker with allele frequency of 0.88.
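As a rough consistency check on Table 4, the ratio of eligible to sampled sibships in the TDT-A rows can be compared with the probability that a sibship has at least one heterozygous parent. The sketch below assumes Hardy-Weinberg proportions and random mating at the reported allele frequency of 0.88; the closed form 1 − (1 − 2pq)² and the function name `eligible_fraction` are illustrative assumptions, not the authors' Table 2 expressions. The TDT-G rows show a slightly lower ratio, which this sketch does not attempt to reproduce.

```python
def eligible_fraction(p):
    """P(at least one heterozygous parent) under HW equilibrium and
    random mating: 1 - P(neither parent is heterozygous)."""
    heterozygote_freq = 2.0 * p * (1.0 - p)
    return 1.0 - (1.0 - heterozygote_freq) ** 2


# (total sibships to be sampled, eligible sibships) from the TDT-A rows of Table 4
TABLE4_TDT_A = {"All": (613, 231), "Boys": (373, 141), "Girls": (1157, 437)}

ALLELE_FREQ = 0.88  # S447 allele frequency reported in the text

if __name__ == "__main__":
    frac = eligible_fraction(ALLELE_FREQ)
    print(f"eligible fraction at p = {ALLELE_FREQ}: {frac:.4f}")
    for label, (total, reported) in TABLE4_TDT_A.items():
        expected = total * frac
        print(f"{label:5s}: expected eligible = {expected:6.1f}, "
              f"reported in Table 4 = {reported}")
```

Under these assumptions the expected numbers agree with the reported TDT-A values to within one sibship.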
Table 5. Parameter estimates obtained by classical association and TDT-type analyses of the
relationship between TG levels and the LPL/S447X polymorphism, assuming a dominant model

                                  All               Boys              Girls
Classical Association Analysis    N = 741           N = 372           N = 369
  α̂_ASSOC (SE)                    −0.578 (0.124)    −0.675 (0.187)    −0.422 (0.159)
  β̂_ASSOC (SE)                    −0.157 (0.035)    −0.223 (0.052)    −0.093 (0.044)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻⁴          p = 0.04
TDT-A Analysis                    N = 286           N = 137           N = 149
  α̂_TDT-A (SE)                    −0.511 (0.206)    −0.746 (0.337)    −0.297 (0.252)
  π̂_1 (SE)                        0.055 (0.082)     0.179 (0.143)     −0.043 (0.096)
  π̂_2 (SE)                        0.010 (0.121)     0.121 (0.179)     −0.151 (0.084)
  β̂_TDT-A (SE)                    −0.213 (0.051)    −0.311 (0.078)    −0.107 (0.065)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻⁴          p = 0.10
TDT-G Analysis                    N = 267           N = 131           N = 136
  α̂_TDT-G (SE)                    −0.489 (0.214)    −0.728 (0.372)    −0.296 (0.249)
  β̂_TDT-G (SE)                    −0.206 (0.052)    −0.292 (0.083)    −0.114 (0.063)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻³          p = 0.07

Analyses were performed on log-triglyceride levels adjusted on age, gender and oral contraception in girls, by use of the EE technique.
N is the number of informative sibs used in each analysis.
The brother–brother and sister–sister residual
correlations were 0.23 and 0.19, respectively.
As indicated in Table 4, more than 750 girls
within at least 619 sibships would be required to
detect such an effect in a classical association
analysis with a 0.90 power. More than 1100
sibships would have to be collected to detect such
an effect by one of the TDT-type methods, even
though the actual number of subjects on whom
the analysis would be performed is lower than for
the association analysis. As shown in Table 5,
both TDT-type methods found a significant
effect in the whole sample and in boys, but not in
girls. Parameters estimated by the three analyses
are given in Table 5. Although expected values of
β̂_ASSOC and β̂_TDT-A should be identical, the
observed values differed slightly due to the fact
that β̂_TDT-A was estimated in a subset of the total
sample. Note finally that the parental mating
type coefficients of the TDT-A analysis, π̂_1 and
π̂_2, were not significantly different from 0, as
expected in a population in HW equilibrium.
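The tests of genetic effect in Table 5 can be approximately recovered from the reported estimates and standard errors. The sketch below assumes a two-sided Wald test with a standard normal reference distribution, which is a simplification of the generalized Wald statistic used in the paper; the function name `wald_p_value` is hypothetical, and small discrepancies are expected because the published estimates are rounded.

```python
import math


def wald_p_value(estimate, standard_error):
    """Two-sided p-value of the Wald statistic z = estimate / SE,
    using a standard normal reference distribution."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# (estimate, SE) of the genetic effect, copied from Table 5
TABLE5_BETA = {
    ("Classical association", "All"): (-0.157, 0.035),
    ("Classical association", "Girls"): (-0.093, 0.044),
    ("TDT-A", "Boys"): (-0.311, 0.078),
    ("TDT-A", "Girls"): (-0.107, 0.065),
    ("TDT-G", "Boys"): (-0.292, 0.083),
    ("TDT-G", "Girls"): (-0.114, 0.063),
}

if __name__ == "__main__":
    for (method, group), (beta, se) in TABLE5_BETA.items():
        print(f"{method:22s} {group:5s}: z = {beta / se:6.2f}, "
              f"p = {wald_p_value(beta, se):.2g}")
```

With these inputs the TDT-G effect in boys falls between 10⁻⁴ and 10⁻³ and the TDT-A effect in girls is close to 0.10, in line with the values reported in Table 5.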
Unlike association studies, which are subject
to bias due to uncontrolled stratification, TDT-
type studies allow one to definitively conclude
that association is due to linkage and not to
another phenomenon. Ideally, one should then
always perform a TDT rather than an association
study. However, TDT-type studies are more
demanding than association studies in terms of
genotyping and/or phenotyping, and are often
more complex to set up. Therefore, before
embarking on a TDT-type study, it is important
to evaluate the effect and the cost of this
approach.
During the last decades, a large amount of
research has been directed to the development of
TDT-type methods for quantitative traits
(Abecasis et al. 2000; Allison 1997; Allison et al.
1999; Cardon 2000; George et al. 1999; Martin et
al. 2000; Monks & Kaplan, 2000; Rabinowitz,
1997; Rabinowitz & Laird, 2000; Yang et al.
2000; Zhu & Elston, 2000; Zhu & Elston, 2001).
Our intention in this paper was not to compare
the power of all these methods, but just to
investigate in which conditions it would be
possible to confirm by means of a TDT approach
the effect of a marker on a quantitative trait
found in an association study. Therefore, we
focused our interest on two TDT-type methods
formulated through simple regression models
allowing one to compute analytically sample
size/power by the EE procedure. One of these
TDT methods is that described by George et al.
(1999), originally based on an ML approach, but
for which we proposed a more flexible EE
approach. The other one is an extension of the
model proposed by Allison (1997), initially
restricted to one child per family, but extended
here to several sibs using the EE technique. Both
methods are then valid for testing allelic as-
sociation since they correctly take into account
the familial dependency between sibs. Note that
these two TDT-type methods are not necessarily
the most powerful ones; in particular an alterna-
tive TDT-G method (Zhu & Elston, 2001) has
been recently proposed which appears to be more
powerful than the original one. However,
expressed in an EE framework, these regression-
based methods have the great advantage of
being easily implemented in standard statistical
packages such as Proc Genmod in SAS (SAS
Institute Inc., Cary, N. C.), making our sample
size/power calculation procedure widely applicable.
The figures provided in this paper should then be
interpreted as orders of magnitude for the sample
sizes required in a TDT approach.
Some general conclusions can be drawn from
our calculations. Under an additive model, the
power of a classical association study is
maximized when the analysis is performed on
unrelated individuals. Conversely, the power of
the TDT-type methods increases both with the
sibship size and the residual sib–sib correlation.
This result is in agreement with recent studies
which indicated that switching from sib pairs to
larger sibships increases the power of a quan-
titative TDT-type analysis (Allison et al. 1999;
George et al. 1999; Monks & Kaplan, 2000),
especially when the residual sib–sib correlation is
high (Allison et al. 1999). For a given heritability,
the power of an association study does not
depend on the marker allele frequency, unlike
the TDT-type methods which have a maximum
power for extreme allele frequencies. Classical
association studies appear to require fewer informative
sibs than TDT-type studies when the
allele frequency is close to 0.5, while the opposite
is observed for high or low allele frequencies,
especially when the sibship clustering is strong.
The situation is slightly different under a domi-
nant (or a recessive) model. The first difference is
that, unlike for the additive case, the power of an
association analysis is not always maximum in
unrelated individuals, but can be increased by
working on sibships if there is a high sib–sib
correlation. Moreover, the power slightly
decreases with the allele frequency. The second
difference is that except for high allele
frequencies (low for recessive models) the number
of informative sibs required by the TDT-G
method is considerably higher than that required
by the two other methods. Again, a smaller
number of informative sibs is needed in the
classical association analysis than in the TDT-A
method for allele frequencies close to 0.5, the
opposite being observed for extreme allele
frequencies. It can be also deduced from Table 2
that, for a given heritability, the sample size
required by both a classical association analysis
and a TDT-A analysis is at least as large for
detecting an additive effect as for detecting a
dominant or a recessive effect. Therefore, when planning to
study the relationship between a marker and a
quantitative phenotype in sibship data, it is
more conservative to determine the sample size
by assuming an additive model, except if another
genetic model can be predicted from previous
studies.
Sample size calculations were based on the
minimum number of sibs required in the stat-
istical analysis (i.e. informative sibs), and not on
the number of sibs initially collected to find the
appropriate number of informative sibs. As
mentioned before, two kinds of screening pro-
cedures can be used to collect data: the genotype-
first screening strategy which consists of geno-
typing a sample of parents from which sibs from
eligible sibships are genotyped and phenotyped,
and the phenotype-first strategy where a sample
of sibs is initially phenotyped and the parents are
then genotyped in order to select informative
sibs. We have shown that, when the phenotype-
first strategy is chosen, the TDT-G method
always requires more sibs to be collected than
the TDT-A method, which itself requires more
sibs than a classical association analysis. How-
ever, the differences between methods tend to
decrease as the sibship clustering increases. As an
example, for a frequent allele and a moderate
correlation, the TDT methods would be 1.5 to 2-
fold more demanding in terms of sample size
than a classical association study.
A similar analysis could have been done with
the genotype-first strategy. The procedure de-
veloped in this paper can be used to help the
investigators to choose the most efficient strat-
egy, which clearly depends on the phenotyping
and genotyping costs.
Note that our procedure is based on the
generalized Wald statistic as described by
Rochon (1998). Liu & Liang (1997) have
proposed the use of the quasi-score statistic for
calculating sample size on correlated data. Their
procedure was then applied to the detection of
familial aggregation in a case-control family
design. We applied this quasi-score procedure to
classical association studies and compared it to
that described in this paper. Both procedures
yielded similar results (data not shown). All the
results presented in the present paper were
obtained using the ‘model-based’ variance EE
estimate assuming an equicorrelation structure
between sibs, and not the ‘robust’ variance
estimate. This means that the sample size
calculations given in this paper are exactly those
obtained from a ML analysis using a multi-
normal distribution in equicorrelated sibs (Zhao
et al. 1992). Several studies have shown (Zhao et al.
1992; Feng et al. 1996; Trégouët et al. 1997) that
the power of the ML method and the EE method
using the ‘robust’ variance estimate are gen-
erally similar, suggesting that the sample size
calculations described here are expected to be
valid also when the statistical analysis is per-
formed using the ‘robust’ EE technique. Finally,
since no distributional assumption is required in
the EE procedure, its application to a binary
phenotype is straightforward, by use of a logistic
model instead of a linear one (George et al. 1999;
Trégouët & Tiret, 2000).
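To give a feel for the orders of magnitude produced by a Wald-type calculation, the sketch below uses a textbook normal approximation for unrelated individuals, N ≈ (z_{1−α/2} + z_{power})² (1 − h²)/h². This formula is an assumption introduced here for illustration: it ignores sibship clustering and is not one of the expressions of Table 2. The function name `approx_n_unrelated` is hypothetical.

```python
from statistics import NormalDist


def approx_n_unrelated(h2, alpha=0.05, power=0.90):
    """Approximate number of unrelated individuals needed to detect a
    marker explaining a fraction h2 of the phenotypic variance, using a
    two-sided Wald test (normal approximation, no familial clustering)."""
    z = NormalDist().inv_cdf
    return (z(1.0 - alpha / 2.0) + z(power)) ** 2 * (1.0 - h2) / h2


# (heritability, informative sibs for classical association in Table 4)
TABLE4_ASSOCIATION = {"All": (0.022, 485), "Boys": (0.042, 245), "Girls": (0.014, 756)}

if __name__ == "__main__":
    print(f"h2 = 5%: about {approx_n_unrelated(0.05):.0f} unrelated individuals")
    for label, (h2, reported) in TABLE4_ASSOCIATION.items():
        print(f"{label:5s}: approximation = {approx_n_unrelated(h2):6.0f}, "
              f"Table 4 = {reported}")
```

The approximation lands within a few percent of the classical association figures of Table 4, which additionally account for the residual sib–sib correlation and the observed sibship size distribution.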
Several limitations have to be addressed.
Firstly, the proposed EE procedure assumes that
sibships are randomly sampled. In the case where
sibships are collected through specific patterns of
trait values, the proposed procedure would not
provide valid results. Secondly, although this
procedure could be applied to more complex
family data such as large pedigrees, analytical
calculations would not be straightforward. Simi-
larly, extensions to multiallelic markers and to
the detection of gene × environment and
gene × gene interactions, although possible,
would be quite demanding to implement.
For illustrative purposes, we applied our EE
procedure to investigate the relationship between
the S447X polymorphism of the LPL gene and
plasma TG levels in a sample of sibships. It was
shown, by classical association analysis, that the
X447 allele was associated with lower TG levels,
as previously reported. Our sample was large
enough to confirm, by both TDT-type methods,
that the association was actually due to linkage
and not to uncontrolled stratification. The fact
that the TDT tests failed to reach significance in
girls was due to a lower heritability than for
boys.
In conclusion, we propose a flexible EE
procedure for determining sample size in classical
association and TDT-type analyses using family
data. It can be used as a preliminary step before
collecting phenotypic and genotypic data when
investigating the role of a candidate gene in the
etiology of a quantitative phenotype.
The Stanislas cohort is supported regularly by Beckman Instruments (U.S.), Biomérieux (France), Johnson and Johnson (France), Merck (France), and Roche (U.S.). The authors thank the reviewers for providing helpful comments on the earlier draft of this manuscript.
Abecasis, G., Cardon, L. & Cookson, W. (2000). A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66, 279–292.
Allison, D. B. (1997). Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60, 676–690.
Allison, D., Heo, M., Kaplan, N. & Martin, E. (1999). Sibling-based tests of linkage and association for quantitative traits. Am. J. Hum. Genet. 64, 1754–1764.
Cardon, L. R. (2000). A sib-pair regression model of linkage disequilibrium for quantitative traits. Hum. Hered. 50, 350–358.
Cleeves, M., Olson, J. & Jacobs, K. (1997). Exact transmission-disequilibrium tests with multiallelic markers. Genet. Epidemiol. 14, 337–347.
Collins, F., Guyer, M. & Chakravarti, A. (1997). Variations on a theme: cataloging human DNA sequence variation. Science 278, 1580–1581.
Feng, Z., McLerran, D. & Grizzle, J. (1996). A comparison of statistical methods for clustered data analysis with gaussian error. Stat. Med. 15, 1793–1806.
Gagné, E., Genest, J., Zhang, H., Clarke, L. & Hayden, M. (1994). Analysis of DNA changes in the LPL gene in patients with familial combined hyperlipidemia. Arterioscler. Thromb. 14, 1250–1257.
Gagné, S. E., Larson, M. G., Pimstone, S. N., Schaefer, E. J., Kastelein, J. J., Wilson, P. W., Ordovas, J. M. & Hayden, M. R. (1999). A common truncation variant of lipoprotein lipase (Ser447X) confers protection against coronary heart disease: the Framingham Offspring Study. Clin. Genet. 55, 450–454.
Garenc, C., Perusse, L., Gagnon, J., Chagnon, Y., Bergeron, J., Despres, J., Province, M., Leon, A., Skinner, J., Wilmore, J., Rao, D. & Bouchard, C. (2000). Linkage and association studies of the lipoprotein lipase gene with postheparin plasma lipase activities, body fat, and plasma lipid and lipoprotein concentrations: the HERITAGE Family Study. Metabolism 49, 432–439.
George, V., Tiwari, H. K., Zhu, X. & Elston, R. C. (1999). A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am. J. Hum. Genet. 65, 236–245.
Humphries, S. E., Nicaud, V., Margalef, J., Tiret, L., Talmud, P. J., for the EARS. (1998). Lipoprotein lipase gene variation is associated with a paternal history of premature coronary artery disease and fasting and postprandial plasma triglycerides. The European Atherosclerosis Research Study (EARS). Arterioscler. Thromb. Vasc. Biol. 18, 526–534.
Jemaa, R., Fumeron, F., Poirier, O., Lecerf, L., Evans, A., Arveiler, D., Luc, G., Cambou, J., Bard, J., Fruchart, J., Apfelbaum, M., Cambien, F. & Tiret, L. (1995). Lipoprotein lipase gene polymorphisms: associations with myocardial infarction and lipoprotein levels, the ECTIM study. J. Lipid Res. 36, 2141–2146.
Knapp, M. (1999). The transmission/disequilibrium test (TDT) and parental genotype reconstruction: the reconstruction-combined transmission/disequilibrium test. Am. J. Hum. Genet. 64, 861–870.
Lander, E. (1996). The new genomics: global views of biology. Science 274, 536–539.
Lander, E. S. & Schork, N. J. (1994). Genetic dissection of complex traits. Science 265, 2037–2048.
Liang, K. Y. & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Lindpaintner, K., Lee, K., Larson, M. G., Rao, V. S., Pfeffer, M. A., Ordovas, J. M., Schaefer, E. J. et al. (1996). Absence of association or genetic linkage between the angiotensin-converting-enzyme gene and left ventricular mass. N. Engl. J. Med. 334, 1023–1028.
Liu, G. & Liang, K. Y. (1997). Sample size calculations for studies with correlated observations. Biometrics 53, 937–947.
Martin, E., Kaplan, N. & Weir, B. (1997). Tests for linkage and association in nuclear families. Am. J. Hum. Genet. 61, 439–448.
Martin, E. R., Monks, S. A., Warren, L. L. & Kaplan, N. L. (2000). A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67, 146–154.
Monks, S. & Kaplan, N. (2000). Removing the sampling restriction from family-based tests of association for a quantitative-trait locus. Am. J. Hum. Genet. 66, 576–592.
Nickerson, D. A., Taylor, S. L., Weiss, K. M., Clark, A. G., Hutchinson, R. G., Stengard, J., Salomaa, V., Vartiainen, E., Boerwinkle, E. & Sing, C. F. (1998). DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19, 233–240.
Prentice, R. L. & Zhao, L. P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47, 825–839.
Rabinowitz, D. (1997). A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47, 342–350.
Rabinowitz, D. & Laird, N. (2000). A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum. Hered. 50, 211–223.
Risch, N. & Merikangas, K. (1996). The future of genetic studies of complex diseases. Science 273, 1516–1517.
Rochon, J. (1998). Application of GEE procedures for sample size calculations in repeated measures experiments. Stat. Med. 17, 1643–1658.
Rotnitzky, A. & Jewell, N. P. (1990). Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 77, 485–497.
Schaid, D. J. & Sommer, S. S. (1994). Comparison of statistics for candidate-gene association studies using cases and parents. Am. J. Hum. Genet. 55, 402–409.
Siest, G., Lecomte, E., Visvikis, S., Herbeth, B., Gueguen, R., Vincent-Viry, M., Steinmetz, J., Beaud, B., Locuty, J. & Chevrier, P. (1997). Une étude familiale et longitudinale au Centre de Médecine Préventive de Nancy-Vandoeuvre: la cohorte STANISLAS. In: Galteau M. M., Delwaide P., Siest G., Henny J., Eds. Biologie Prospective. Compte rendus du IXe Colloque International de Pont à Mousson, Eurobiologie, 29 Septembre-3 Octobre 1996, Paris: J. Libbey Eurotext Publishers; 1997: 163–166.
Spielman, R. & Ewens, W. (1998). A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 62, 450–458.
Spielman, R. S., McGinnis, R. E. & Ewens, W. J. (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52, 506–516.
Trégouët, D. A., Ducimetière, P. & Tiret, L. (1997). Testing association between candidate-gene markers and phenotype in related individuals, by use of estimating equations. Am. J. Hum. Genet. 61, 189–99.
Trégouët, D. A., Herbeth, B., Juhan-Vague, I., Siest, G., Ducimetière, P. & Tiret, L. (1999). Bivariate familial correlation analysis of quantitative traits by use of estimating equations: applications to a familial analysis of the insulin resistance syndrome. Genet. Epidemiol. 16, 69–83.
Trégouët, D. A. & Tiret, L. (2000). Applications of the estimating equations theory to genetic epidemiology: a review. Ann. Hum. Genet. 64, 1–14.
Villard, E., Tiret, L., Visvikis, S., Rakotovao, R., Cambien, F. & Soubrier, F. (1996). Identification of new polymorphisms of angiotensin I-converting enzyme (ACE) gene, and study of their relationship to plasma ACE levels by two-QTL segregation-linkage analysis. Am. J. Hum. Genet. 58, 1268–1278.
Wilson, D. E., Hata, A., Kwong, L. K., Lingam, A., Shuhua, J., Ridinger, D. N., Yeager, C., Kaltenborn, K. C., Iverius, P. H. & Lalouel, J. M. (1993). Mutations in exon 3 of the lipoprotein lipase gene segregating in a family with hypertriglyceridemia, pancreatitis, and non-insulin-dependent diabetes. J. Clin. Invest. 92, 203–211.
Yang, Q., Rabinowitz, D., Isasi, C. & Shea, S. (2000). Adjusting for confounding due to population admixture when estimating the effect of candidate genes on quantitative traits. Hum. Hered. 50, 227–233.
Zhang, Q., Cavanna, J., Winkelman, B. R., Shine, B., Gross, W., Marz, W. & Galton, D. J. (1995). Common genetic variants of lipoprotein lipase that relate to lipid transport in patients with premature coronary artery disease. Clin. Genet. 48, 293–298.
Zhao, L., Prentice, R. & Self, S. (1992). Multivariate mean parameter estimation by using a partly exponential model. J. R. Stat. Soc. [B] 54, 805–811.
Zhu, X. & Elston, R. (2001). Transmission/disequilibrium tests for quantitative traits. Genet. Epidemiol. 20, 57–74.