
Sample size calculations for classical association and TDT-type methods using family data


Ann. Hum. Genet. (2001), 65, 293–312

Printed in Great Britain

Sample size calculations for classical association and TDT-type methods

using family data

D. A. TRÉGOUËT, C. PALLAUD, C. SASS, S. VISVIKIS AND L. TIRET

INSERM Unité 525, 91 bd de l’Hôpital, 75634 Paris Cedex, France

(Received 20.9.00. Accepted 2.2.01)

Transmission Disequilibrium Test (TDT)-based methods have been advocated by several authors

for testing that a marker-phenotype association is actually due to linkage and not to uncontrolled

stratification. As a pre-requisite of TDT-type methods is the presence of an association between

marker and phenotype, one may wish to first investigate the association using a classical association

study, and then to check by a TDT approach whether this association is actually due to linkage. We

propose an estimating equation (EE) procedure, to compute analytically the minimum sample size

of sibship data required to detect the association between a marker and a quantitative phenotype,

and that required to confirm it by two TDT methods. We show that, when the marker allele

frequency is low or high, the number of informative sibs needed in TDT-type methods can be lower

than the number required in an association analysis, and even more so when the familial clustering

is strong. However, in all cases, the number of sibs that need to be sampled to get the appropriate

number of informative sibs for analysis is always larger for TDT methods than for an association

study. In a phenotype-first strategy, this number may be critical when investigating costly

phenotypes.

The candidate gene approach is widely used for identifying genes involved in complex human diseases (Collins et al. 1997; Lander, 1996; Risch & Merikangas, 1996). It relies either on the characterization of functional polymorphisms which affect some biological phenotype(s) predisposing to disease, or on linkage disequilibrium existing between observed markers and unidentified functional polymorphisms.

Generally, the role of candidate genes is investigated by means of classical association studies, i.e. by testing, with conventional statistical methods, the association in a sample of unrelated individuals between the marker genotype and the phenotype of interest, which can be the disease itself or some intermediate quantitative trait. The major advantage of association studies lies in their simplicity and their flexibility. However, it is increasingly common to test association in family data, for example in large-scale samples of sibships originally collected for linkage analysis (Lindpainter et al. 1996) or nuclear families collected for complex segregation-linkage analysis (Villard et al. 1996). This can be performed by use of the estimating equations (EE) technique, which allows one to control for the familial dependency between individuals (Liang & Zeger, 1986; Trégouët et al. 1997; Trégouët & Tiret, 2000). The main advantages of the EE method are the absence of distributional assumptions, its robustness to misspecification of the familial correlations, and its flexibility and ease of use. The EE technique is now implemented in standard statistical software programs.

Correspondence: D. A. Trégouët, INSERM U525, 91 bd de l’Hôpital, 75634 Paris Cedex, France. Tel: 33-1-40-77-96-93; Fax: 33-1-40-77-97-28. E-mail: tregouet@idf.inserm.fr

However, even when performed using family data, association studies are open to the risk of spurious association, due either to inappropriate choice of controls or to uncontrolled population stratification. To circumvent this problem, a class of statistical tests, referred to as ‘family-based’ tests, has been proposed, among which the transmission disequilibrium test (TDT) is the most popular (Spielman et al. 1993). These tests aim, in the presence of an association between a candidate gene marker and a phenotype, to determine whether this association is actually due to linkage and not to some other uncontrolled phenomenon. Initially developed for a binary phenotype, the TDT has been extended in several ways (Cleeves et al. 1997; Knapp, 1999; Martin et al. 1997; Spielman & Ewens, 1998). In particular, its application to a quantitative phenotype has received much attention (Abecasis et al. 2000; Allison, 1997; Allison et al. 1999; Cardon, 2000; George et al. 1999; Monks & Kaplan, 2000; Rabinowitz, 1997; Rabinowitz & Laird, 2000; Yang et al. 2000; Zhu & Elston, 2000; Zhu & Elston, 2001).

Because of its theoretical advantages, the TDT is often advocated as the approach of choice for the genetic dissection of complex traits (Lander & Schork, 1994; Risch & Merikangas, 1996). However, when planning the design of a study, several elements have to be taken into consideration, in particular the sample size and cost efficiency of the different approaches. This is all the more critical since genetic effects underlying complex phenotypes are expected to be quite modest and will require large studies to be detected. Moreover, a crucial requirement of future genetic studies will be the availability of new intermediate biological phenotypes, which are often highly costly. Since a pre-requisite of the TDT-type methods is the presence of an association between marker and phenotype, one may wish to first investigate the association using a classical association study, and then to check by a TDT approach whether this association is really due to linkage.

The aim of the present study was to provide some elements for determining the sample size required for confirming, by a quantitative TDT approach, an association detected by a classical association study carried out in a random sample of sibships. As mentioned above, several quantitative TDT-type models are available, but in this study we focused our attention on two of them, the model developed by George et al. (1999) for pedigree data, and the TDT-Q5 model of Allison (1997) that was initially developed for families with one child, but extended here to larger sibships. These models were chosen because they are based on simple linear regression models, which allowed us to implement them into an EE framework with the classical association study.

While sample size calculation procedures for independent subjects are well known, the corresponding situation for related individuals is less developed. Recently, Rochon (1998) has proposed a general EE procedure for sample size calculations in correlated data. Since the classical association approach and the two TDT-type methods can be incorporated into a common EE framework, we applied this procedure for calculating sample sizes to the three approaches. After a description of the method, we compared the sample sizes required by each method under different genetic models. An illustration was finally given with a study of the relationship between the Ser447Stop polymorphism of the lipoprotein lipase (LPL) gene and plasma triglyceride (TG) levels in family data.

Notations

Consider a sample of $K$ sibships. Let $y_k = (y_{k1}, \ldots, y_{kn_k})^t$ ($t$ denotes ‘transposition’) be the quantitative phenotype vector of the $k$th sibship consisting of $n_k$ sibs, with expected mean vector $\mu_k$. Assume that $y_{ki}$ ($i = 1$ to $n_k$) can be expressed as $y_{ki} = \eta c_{ki} + \varepsilon_{ki}$, where $\eta$ is a set of regression parameters associated with a covariate vector $c_{ki}$ and $\varepsilon_{ki}$ is a residual random effect. The vector $\varepsilon_k = (\varepsilon_{k1}, \ldots, \varepsilon_{kn_k})^t$ has a multivariate distribution, not necessarily known, with mean vector 0 and variance-covariance matrix $\Omega_k$ incorporating the residual variances and the residual sib–sib correlations.

We will first describe the statistical models used in the different approaches for modeling the means and characterizing the association between the marker genotype and the phenotype. Then we will show how the EE can be used for estimating the regression parameter vector $\eta$, while taking into account the familial dependency between sibs.

Modeling of the means

Classical association analysis

The classical regression model used for testing association between a quantitative phenotype and a marker genotype is

$$\mu_{ki} = E(y_{ki} \mid x_{ki}) = \alpha_{\text{ASSOC}} + \beta_{\text{ASSOC}}\, x_{ki},$$

where $x_{ki}$ denotes the covariate characterizing the genotype of the $i$th sib of the $k$th sibship. The genotype is measured at a di-allelic marker locus a/A with allele frequencies $p_a$ and $1 - p_a$. In the most general form, the genotype of a sib is a set of two indicator variables corresponding to the genotypes Aa and AA, the genotype aa being taken as the reference. However, in our applications, we will only consider particular models (additive, dominant and recessive) such that this set of variables reduces to only one variable. Under an additive model, $x_{ki} = 0$, 1 or 2, whereas for a dominant or a recessive model $x_{ki} = 0$ or 1. Note that all sibs can be used and are then informative for a classical association analysis, provided that the residual sib–sib correlation is taken into account.

TDT-type analysis

In the following models, we need to assume that the parental genotypic information is available or can be unambiguously deduced from the offspring’s genotypes.

The TDT-Q5 regression model of Allison (1997): In this model, the offspring’s phenotype is regressed on the parental mating type and his (her) own marker genotype. Only offspring for which at least one parent is heterozygous at the marker locus are informative and included in the analysis. The expected mean is modeled as

$$E(y_{ki} \mid z_{k1}, z_{k2}, x_{ki}) = \alpha_{\text{TDT-A}} + \pi_1 z_{k1} + \pi_2 z_{k2} + \beta_{\text{TDT-A}}\, x_{ki},$$

where $(z_{k1}, z_{k2})$ is a set of two dummy variables characterizing the parental mating type of the $k$th sibship: $z_{k1}$ is equal to 1 if the parental mating type is Aa × Aa and 0 otherwise; $z_{k2}$ is equal to 1 if the parental mating type is Aa × AA (or AA × Aa) and 0 otherwise, the mating type aa × Aa (or Aa × aa) being considered as the reference. $x_{ki}$ is defined as above. The initial test statistic proposed by Allison (1997) was an F statistic, only valid for families with one child. However, the model can be easily extended to several sibs by adequately taking into account the sib–sib correlation as described in the next section. The statistic used for testing the marker effect will then be a Wald test statistic. This extension of the TDT-Q5 of Allison (1997) will be referred to as TDT-A.

The TDT-type regression model of George et al. (1999): This model, denoted here as TDT-G, consists of regressing the offspring’s phenotype on the parental transmission of the trait-associated allele A at the marker locus. In this model, the expected mean is expressed as

$$E(y_{ki} \mid g_{ki}) = \alpha_{\text{TDT-G}} + \beta_{\text{TDT-G}}\, g_{ki},$$

where $g_{ki}$ is equal to 1 if the $i$th sib of the $k$th sibship received one allele A from a heterozygous parent and 0 otherwise. As in the model proposed by Allison (1997), only offspring for whom at least one parent is heterozygous at the marker locus are informative and included in the model. However, in this model, heterozygous sibs from two heterozygous parents are not included in the analysis: in that case the A allele has been transmitted by one parent but not by the other, so these sibs are not informative. Therefore, in this model, the same sibship can include both informative and non-informative sibs. This is in contrast to the two previous regression models, where any sibship used in the analysis includes sibs that are all informative. George et al. (1999) proposed to use a maximum likelihood (ML) procedure for estimating and testing the regression parameters, while taking into account the sib–sib dependency. We propose to use an EE procedure, which has been shown to be as efficient as the ML procedure (Trégouët et al. 1997) and has the advantage of making sample size calculations easier.
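The informativeness rules of the two TDT-type models can be sketched in code. The following is only an illustrative reading of the definitions above, not the authors' implementation: the function names and the coding of genotypes as counts of allele A (0 = aa, 1 = Aa, 2 = AA) are assumptions made for this example.

```python
from itertools import product


def _gametes(genotype):
    """Alleles (0 = a, 1 = A) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def transmission_covariates(parent1, parent2, child):
    """Return TDT-A / TDT-G informativeness and the TDT-G covariate g.

    Genotypes are coded as the number of A alleles (hypothetical coding).
    g is None when the sib is not used in a TDT-G analysis.
    """
    possible = {a + b for a, b in product(_gametes(parent1), _gametes(parent2))}
    if child not in possible:
        raise ValueError("child genotype incompatible with parental genotypes")

    n_het = (parent1 == 1) + (parent2 == 1)
    tdt_a = n_het >= 1                      # at least one heterozygous parent
    if n_het == 0 or (n_het == 2 and child == 1):
        # No heterozygous parent, or Aa child of Aa x Aa parents:
        # not informative under TDT-G.
        return {"tdt_a": tdt_a, "tdt_g": False, "g": None}

    if n_het == 2:
        g = 1 if child == 2 else 0          # AA received A, aa did not
    else:
        homozygous = parent2 if parent1 == 1 else parent1
        # The homozygous parent contributes homozygous // 2 copies of A;
        # whatever remains came from the heterozygous parent.
        g = child - homozygous // 2
    return {"tdt_a": tdt_a, "tdt_g": True, "g": g}
```

For instance, `transmission_covariates(1, 2, 1)` describes an Aa sib of Aa × AA parents: the sib is informative for both models and has `g = 0`, because its A allele must have come from the AA parent.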

Estimation of the regression parameters

In order to take into account the familial dependency between the sibs under the three statistical models, we propose to estimate the regression parameters by means of the EE technique. Following Liang & Zeger (1986), the regression parameter estimates satisfy the following equations

$$\sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1} (y_k - \mu_k) = 0, \qquad (1)$$

where $\mathrm{Var}(y_k)$ is a ‘working’ variance-covariance matrix (see below).

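The solution $\hat\eta$ of (1) is such that $K^{1/2}(\hat\eta - \eta)$ is asymptotically normally distributed with mean 0 and covariance matrix consistently estimated by

$$K\, W^{-1} \left( \sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1} (y_k - \mu_k)(y_k - \mu_k)^t\, \mathrm{Var}(y_k)^{-1} \frac{\partial \mu_k}{\partial \eta} \right) W^{-1}, \qquad (2)$$

where

$$W = \sum_{k=1}^{K} \frac{\partial \mu_k^t}{\partial \eta}\, \mathrm{Var}(y_k)^{-1} \frac{\partial \mu_k}{\partial \eta},$$

the quantity (2) being evaluated at $\hat\eta$.

It can be shown (Liang & Zeger, 1986) that the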

EE estimates provided by (1) and (2) are asymptotically unbiased even if the matrix $\mathrm{Var}(y_k)$ used is not the true variance-covariance matrix of $\varepsilon_k$, i.e. even if the familial dependency between sibs is not correctly specified. In the problem considered here, familial correlations are treated as nuisance parameters that are of little interest. The EE estimates are said to be robust to any misspecification of the familial dependency. The robustness property of the EE technique also lies in the absence of distributional assumption for the $\varepsilon_k$’s. These robustness properties hold as long as the regression model for $\mu_k$ is correct. Liang & Zeger (1986) provided some examples of incomplete specifications for $\mathrm{Var}(y_k)$. In some circumstances, for example in small samples ($K < 50$) or when it can be considered that the chosen specification of $\mathrm{Var}(y_k)$ is correct, it is preferable to use $W^{-1}$ instead of (2) for estimating the covariance matrix of $\hat\eta$ (Prentice & Zhao, 1991; Rotnitzky & Jewell, 1990). The former, $W^{-1}$, is called the ‘model-based’ variance estimate whereas the latter, formula (2), is referred to as the ‘robust’ variance estimate.

In our applications, we assumed for the specification of $\mathrm{Var}(y_k)$ that sibs were equicorrelated with residual sib–sib correlation $\rho$ and with constant residual variance $\sigma^2$.
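For a linear mean model with one intercept and one slope, and a fixed equicorrelated working matrix, equation (1) reduces to a 2 × 2 weighted least-squares system. The sketch below illustrates this under those assumptions; the function names and the data layout (a list of `(x, y)` pairs, one per sibship) are hypothetical, and `rho` and `sigma2` are treated as known rather than estimated.

```python
def exchangeable_inverse(n, rho, sigma2=1.0):
    """Inverse of sigma2 * [(1 - rho) I + rho J] for a sibship of n sibs."""
    denom = (1 - rho) * (1 + (n - 1) * rho) * sigma2
    diag = (1 + (n - 2) * rho) / denom
    off = -rho / denom
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


def fit_linear_ee(sibships, rho, sigma2=1.0):
    """Solve equation (1) for mu = alpha + beta * x.

    sibships: list of (x, y) pairs, each a pair of equal-length lists.
    Returns (alpha, beta, model-based variance of beta), the variance
    being the slope element of W^{-1}.
    """
    w00 = w01 = w11 = u0 = u1 = 0.0
    for x, y in sibships:
        v_inv = exchangeable_inverse(len(x), rho, sigma2)
        for i in range(len(x)):
            for j in range(len(x)):
                w = v_inv[i][j]
                w00 += w                    # 1' V^-1 1
                w01 += w * x[j]             # 1' V^-1 x
                w11 += w * x[i] * x[j]      # x' V^-1 x
                u0 += w * y[j]              # 1' V^-1 y
                u1 += w * x[i] * y[j]       # x' V^-1 y
    det = w00 * w11 - w01 * w01
    alpha = (w11 * u0 - w01 * u1) / det
    beta = (w00 * u1 - w01 * u0) / det
    return alpha, beta, w00 / det
```

The robust variance (2) is not computed here; it would additionally require the empirical residual cross-products of each sibship.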

Sample size calculations based on estimating

equations

Suppose that we wish, prior to the analysis, to determine the number $K$ of sibships required to detect the genetic effect at the marker locus for a given power under one of the three statistical models. The hypothesis tested is $H_0$: $\eta_o = 0$ versus $H_1$: $\eta_o \neq 0$, where $\eta_o$ is either $\beta_{\text{ASSOC}}$, $\beta_{\text{TDT-A}}$ or $\beta_{\text{TDT-G}}$. In an EE framework, this is generally performed by means of the Wald test statistic $T = \hat\eta_o^2 / \mathrm{var}(\hat\eta_o)$ (see Rotnitzky & Jewell (1990) for a discussion on the use of the Wald statistic).

Following Rochon (1998), we will partition the sample of $K$ sibships into $S$ groups as follows: each sibship $k$ is assigned to a group $s$ which gathers all the sibships that have the same expected mean vector $\mu_s$, the same matrix of covariates and the same sibship structure (i.e. the same number of sibs). All sibships within the same group $s$ will then have the same variance-covariance matrix $\mathrm{Var}(y_s)$. Let $n_s$ be the number of sibships in group $s$ ($\sum_{s=1}^{S} n_s = K$) and $w_s$ be the proportion of sibships in group $s$ relative to the total number of sibships ($w_s = n_s / K$). Examples of such partitions will be given in the next section.

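Following Rochon (1998), the expected EE estimate that would be a solution of (1) is given by

$$\hat\eta = \left( \sum_{s=1}^{S} n_s \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1} \frac{\partial \mu_s}{\partial \eta} \right)^{-1} \times \left( \sum_{s=1}^{S} n_s \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1} \mu_s \right). \qquad (3)$$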

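For a given partition, the robust variance (2) of $\hat\eta$ would not be known at the design stage, since it would require data on the unknown residuals $(y_{ki} - \mu_{ki})$; the expected model-based variance estimate can, however, be calculated as

$$W^{-1} = \left( \sum_{s=1}^{S} n_s \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1} \frac{\partial \mu_s}{\partial \eta} \right)^{-1} = K^{-1} \left( \sum_{s=1}^{S} w_s \frac{\partial \mu_s^t}{\partial \eta}\, \mathrm{Var}(y_s)^{-1} \frac{\partial \mu_s}{\partial \eta} \right)^{-1}. \qquad (4)$$

Table 1. Partition of a sample of eligible sibships of size 1 for a TDT-A and TDT-G analysis

Group s   Parental mating type   Sib genotype   Expected frequency $\omega_s$ (**)   Expected mean (***), TDT-A                                    Expected mean (***), TDT-G
1         aa × Aa                aa             $2 p_a^3 p_A / P_{Tot}$              $\alpha_{\text{TDT-A}}$                                       $\alpha_{\text{TDT-G}}$
2         aa × Aa                Aa             $2 p_a^3 p_A / P_{Tot}$              $\alpha_{\text{TDT-A}} + \beta_{\text{TDT-A}}$                $\alpha_{\text{TDT-G}} + \beta_{\text{TDT-G}}$
3         Aa × Aa                aa             $p_a^2 p_A^2 / P_{Tot}$              $\alpha_{\text{TDT-A}} + \pi_1$                               $\alpha_{\text{TDT-G}}$
4 (*)     Aa × Aa                Aa             $2 p_a^2 p_A^2 / P_{Tot}$            $\alpha_{\text{TDT-A}} + \pi_1 + \beta_{\text{TDT-A}}$        (not used)
5         Aa × Aa                AA             $p_a^2 p_A^2 / P_{Tot}$              $\alpha_{\text{TDT-A}} + \pi_1 + 2\beta_{\text{TDT-A}}$       $\alpha_{\text{TDT-G}} + \beta_{\text{TDT-G}}$
6         Aa × AA                Aa             $2 p_a p_A^3 / P_{Tot}$              $\alpha_{\text{TDT-A}} + \pi_2 + \beta_{\text{TDT-A}}$        $\alpha_{\text{TDT-G}}$
7         Aa × AA                AA             $2 p_a p_A^3 / P_{Tot}$              $\alpha_{\text{TDT-A}} + \pi_2 + 2\beta_{\text{TDT-A}}$       $\alpha_{\text{TDT-G}} + \beta_{\text{TDT-G}}$

(*) Group 4 is not used in a TDT-G analysis since sibs are not informative.
(**) $P_{Tot}$ is the probability of a sib to be informative. It is equal to $4 p_a p_A (1 - p_a p_A)$ in a TDT-A analysis and to $2 p_a p_A (2 - 3 p_a p_A)$ in a TDT-G analysis.
(***) For TDT-A, an additive model was assumed. For TDT-G, the formula giving the phenotypic mean does not depend on the genetic model. However, the parameter $\beta_{\text{TDT-G}}$ will not be the same in the different models (see Table 2).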

Under $H_0$, $T$ converges asymptotically towards a central $\chi^2$ distribution with 1 degree of freedom (df). Under $H_1$, the asymptotic distribution of $T$ is a non-central $\chi^2$ distribution with 1 df and a non-centrality parameter approximately given by

$$\nu \approx K\, \hat\eta_o^2 / \widetilde{\mathrm{var}}(\hat\eta_o), \qquad (5)$$

where $\widetilde{\mathrm{var}}(\hat\eta_o)$ denotes the model-based variance of $\hat\eta_o$ obtained from (4) for $K = 1$. Therefore, given the expected power $1 - \beta$ and the level of significance $\alpha$, one can derive the value of $\nu$ from tables of the non-central $\chi^2$ distribution or from computing software (for example, the CNONCT function in the SAS package (SAS Institute Inc., Cary, NC)). Given $\nu$, $K$ can then be determined by solving (5). Alternatively, given a partition of a sample of $K$ sibships and a significance level $\alpha$, the power to detect the desired effect is given by

$$1 - \beta = \int_{c}^{+\infty} g(z, 1, \nu)\, dz,$$

where $c$ is the critical value, for a significance level of $\alpha$, of the central $\chi^2$ distribution with 1 df, and $g$ is the density function of the non-central $\chi^2$ distribution with 1 df and non-centrality parameter $\nu$.
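The non-centrality parameter for a given power can also be obtained without statistical tables. The sketch below relies on the fact that a non-central $\chi^2$ variable with 1 df can be written as $(Z + \sqrt{\nu})^2$ with $Z$ standard normal, so the power integral has a closed form in terms of the normal distribution function. It is an illustration using only the Python standard library; the function names are hypothetical and it is not the SAS routine mentioned above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wald_power(nu, alpha=0.05):
    """Power of the 1-df Wald test when T ~ non-central chi-square(1, nu)."""
    root_c = _STD_NORMAL.inv_cdf(1 - alpha / 2)   # square root of the critical value c
    delta = math.sqrt(nu)
    # P((Z + delta)^2 > c) = P(Z > sqrt(c) - delta) + P(Z < -sqrt(c) - delta)
    return (1 - _STD_NORMAL.cdf(root_c - delta)) + _STD_NORMAL.cdf(-root_c - delta)


def noncentrality(power=0.90, alpha=0.05, tol=1e-10):
    """Find nu such that wald_power(nu, alpha) equals the requested power (bisection)."""
    lo, hi = 0.0, 1.0
    while wald_power(hi, alpha) < power:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if wald_power(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sibships_required(nu, eta, unit_variance):
    """Solve (5) for K, where unit_variance is the model-based variance for K = 1."""
    return math.ceil(nu * unit_variance / eta ** 2)
```

With a power of 0.90 and a significance level of 0.05, `noncentrality()` gives a value close to 10.5.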

It is important to note that, while the model-based variance $W^{-1}$ is used in sample size calculations, the robustness properties of the EE estimates rely on the use of the robust variance (2). However, if the investigators have some information on the true pattern of correlation, the use of $W^{-1}$ for estimation is valid. In the following applications, its use is quite reasonable since sibs can reasonably be assumed to be equicorrelated and with equal residual variance. In any case, as pointed out by Rochon (1998), in order to maintain robustness, one can still apply several forms of covariance matrix for $\mathrm{Var}(y_s)$, and then investigate their influence on the sample size calculations.

Application to the different methods

From now on, we will assume that the sibships are randomly sampled from a population in which Hardy–Weinberg (HW) equilibrium holds.

Classical association analysis

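Suppose that all $K$ sibships are of size 1. Every sibship can be assigned to one of the $S = 3$ groups defined by the three possible genotypes aa, Aa and AA of a sib. The expected frequencies $\omega_s$ of these groups are therefore $p_a^2$, $2 p_a (1 - p_a)$ and $(1 - p_a)^2$. The expected means $\mu_s$ in these three groups are $\mu_{aa}$, $\mu_{Aa}$ and $\mu_{AA}$. Similarly, a sample of $K$ sibships of size 2 where the 2 sibs play a symmetric role can be partitioned into 6 subgroups corresponding to the 6 possible genotypic vectors of the sib pair: (aa, aa), (aa, Aa), (aa, AA), (Aa, Aa), (Aa, AA) and (AA, AA). The number of groups in which a sample of sibships of size 3 can be partitioned is 15, and increases with the sibship size.

Table 2. Analytical expression of the expected regression parameter estimate and of its expected variance obtained from a classical association, a TDT-A and a TDT-G analysis, under an additive or a dominant model, in sibships of fixed size

Additive model

Association
  Parameter: $|\beta_{\text{ASSOC}}| = \sqrt{\dfrac{h^2 \sigma^2}{2 (1 - h^2)\, p_a p_A}}$
  Variance, sibships of size 1: $\dfrac{\sigma^2}{2 p_a p_A K}$
  Variance, sibships of size 2: $\dfrac{\sigma^2 (1 - \rho^2)}{2 p_a p_A K (2 - \rho)}$
  Variance, sibships of size 3: $\dfrac{\sigma^2 (1 - \rho)(1 + 2\rho)}{6 p_a p_A K}$

TDT-A
  Parameter: $\beta_{\text{TDT-A}} = \beta_{\text{ASSOC}}$
  Variance, sibships of size 1: $\dfrac{4 (1 - p_a p_A) \sigma^2}{K}$
  Variance, sibships of size 2: $\dfrac{2 (1 - p_a p_A) \sigma^2 (1 - \rho^2)}{K}$
  Variance, sibships of size 3: $\dfrac{4 (1 - p_a p_A) \sigma^2 (1 - \rho)(1 + 2\rho)}{3 K (1 + \rho)}$

TDT-G (*)
  Parameter: $\beta_{\text{TDT-G}} = \beta_{\text{ASSOC}}\, \dfrac{2 - 2 p_a p_A}{2 - 3 p_a p_A}$
  Variance, sibships of size 1: $\dfrac{4 \sigma^2}{K}$
  Variance, sibships of size 2: $\dfrac{2 \sigma^2 (1 - \rho^2)(4 - 5 p_a p_A)}{K (4 - 6 p_a p_A - p_a p_A \rho^2)}$
  Variance, sibships of size 3: $\dfrac{2 \sigma^2 (1 - \rho^2)(1 + 2\rho)(8 - 9 p_a p_A)}{3 K \left( 4 (1 + \rho)^2 - p_a p_A (6 + 12\rho + 8\rho^2 + \rho^3) \right)}$

Dominant model

Association
  Parameter: $|\beta_{\text{ASSOC}}| = \sqrt{\dfrac{h^2 \sigma^2}{(1 - h^2)\, p_a^2 (1 - p_a^2)}}$
  Variance, sibships of size 1: $\dfrac{\sigma^2}{p_a^2 (1 - p_a^2) K}$
  Variance, sibships of size 2: $\dfrac{\sigma^2 (1 - \rho^2)}{2 p_a^2 (1 - p_a^2) K \left( 1 - \rho\, \dfrac{1 + 3 p_a}{4 + 4 p_a} \right)}$
  Variance, sibships of size 3: $\dfrac{\sigma^2 (1 - \rho)(1 + 2\rho)}{3 p_a^2 (1 - p_a^2) K \left( 1 + \rho\, \dfrac{1 - p_a}{1 + p_a} \right)}$

TDT-A
  Parameter: $\beta_{\text{TDT-A}} = \beta_{\text{ASSOC}}$
  Variance, sibships of size 1: $\dfrac{16 (1 - p_a p_A) \sigma^2}{K p_a (3 + p_a)}$
  Variance, sibships of size 2: $\dfrac{8 (1 - p_a p_A) \sigma^2 (1 - \rho^2)}{K p_a (3 + p_a)}$
  Variance, sibships of size 3: $\dfrac{16 (1 - p_a p_A) \sigma^2 (1 - \rho)(1 + 2\rho)}{3 K p_a (3 + p_a)(1 + \rho)}$

TDT-G (*)
  Parameter: $\beta_{\text{TDT-G}} = \beta_{\text{ASSOC}}\, \dfrac{p_a (1 + p_a)}{2 - 3 p_a p_A}$
  Variance, sibships of size 1: $\dfrac{4 \sigma^2}{K}$
  Variance, sibships of size 2: $\dfrac{2 \sigma^2 (1 - \rho^2)(4 - 5 p_a p_A)}{K (4 - 6 p_a p_A - p_a p_A \rho^2)}$
  Variance, sibships of size 3: $\dfrac{2 \sigma^2 (1 - \rho^2)(1 + 2\rho)(8 - 9 p_a p_A)}{3 K \left( 4 (1 + \rho)^2 - p_a p_A (6 + 12\rho + 8\rho^2 + \rho^3) \right)}$

$K$ is the number of eligible sibships. $p_A = 1 - p_a$. $h^2$ is the heritability associated to the marker. $\sigma^2$ is the residual variance and $\rho$ the residual sib–sib correlation.
(*) Note that, unlike the other two methods, the analytical expression for the regression parameter estimated by the TDT-G method depends on $\rho$. The formula given in this table is that obtained when $\rho = 0$.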

TDT-A analysis

Unlike the classical association analysis where partitioning depends solely on the sib’s genotypic vector, we must now take into account the distribution of the two additional covariates characterizing the parental mating type. A sample of $K$ eligible sibships (sibships that include informative sibs) of size 1 must now be partitioned into $S = 7$ groups according to the parental mating type and the sib’s genotype as described in Table 1. Table 1 also provides the expected frequency $\omega_s$ as well as the expected mean for an additive allele effect. Similarly, it can be shown that any sibship of size 2 (resp. size 3) can be assigned to one of 12 (resp. 18) possible groups.

TDT-G analysis

Under a TDT-G model, a sample of $K$ eligible sibships of size 1 must be partitioned into $S = 6$ groups as depicted in Table 1. These groups are the same as those defined for a TDT-A analysis, except for the group which includes heterozygous sibs of two heterozygous parents, which are not informative in a TDT-G model. For sibships of size 2, the number of groups becomes 11 instead of 12 under the TDT-A model. However, amongst these 11 groups, 2 groups include sibships with only one informative sib. For sibships of size 3, the number of groups is 17 (12 with 3 informative sibs, 3 with 2 informative sibs and 2 with 1 informative sib).
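The partition for sibships of size 1 can be checked by direct enumeration. The following sketch assumes random mating and Hardy–Weinberg equilibrium as in the text; the function name and the coding of genotypes as counts of allele A are choices made for this illustration. Exact rational arithmetic is used so that the totals can be compared with the closed-form expressions for $P_{Tot}$.

```python
from fractions import Fraction


def size1_partition(p_a):
    """Unnormalised frequency of each (mating type, sib genotype) group.

    Genotypes are counts of allele A (0 = aa, 1 = Aa, 2 = AA). Only matings
    with at least one heterozygous parent are kept, as in the TDT-type models.
    """
    p_a = Fraction(p_a)
    p_A = 1 - p_a
    genotype_freq = {0: p_a ** 2, 1: 2 * p_a * p_A, 2: p_A ** 2}
    half = Fraction(1, 2)
    gametes = {0: {0: Fraction(1)}, 1: {0: half, 1: half}, 2: {1: Fraction(1)}}

    groups = {}
    for father, f_freq in genotype_freq.items():
        for mother, m_freq in genotype_freq.items():
            if 1 not in (father, mother):
                continue
            mating = tuple(sorted((father, mother)))
            for a, prob_a in gametes[father].items():
                for b, prob_b in gametes[mother].items():
                    key = (mating, a + b)
                    groups[key] = groups.get(key, 0) + f_freq * m_freq * prob_a * prob_b
    return groups


def tdt_g_groups(groups):
    """Drop Aa sibs of Aa x Aa parents, which are not informative for TDT-G."""
    return {k: v for k, v in groups.items() if k != ((1, 1), 1)}
```

Dividing each frequency by the sum of the retained frequencies gives the normalised $\omega_s$ for the corresponding analysis.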

Up to now, we have been interested in samples of sibships of fixed size. It can be easily seen that the partitions proposed above can also apply to sibships of varying size. All the $\omega_s$ would have to be multiplied by appropriate weights corresponding to the proportions of the different kinds of sibships in the whole sample.

For sibships of fixed size, values of the expected regression parameter estimates (i.e. $\hat\beta_{\text{ASSOC}}$, $\hat\beta_{\text{TDT-A}}$, $\hat\beta_{\text{TDT-G}}$) and of their variances can be obtained by solving analytically (1) and (2). These values are given in Table 2 for sibships of size 1, 2 and 3, under an additive and a dominant model. Formulae for a recessive model are identical to those obtained for a dominant model with $p_a$ replaced by $1 - p_a$. Note that solving (1) indicates that the expected estimates of the parental mating type coefficients in the TDT-A model are null under random mating and HW equilibrium. Therefore, the expected estimate of the genotype effect obtained under the TDT-A analysis is the same as that provided by the classical association analysis. However, these two estimates differ by their variance. For sibships of varying size, corresponding expressions are no longer straightforward.

Note finally that the assumption of Hardy–Weinberg equilibrium could be relaxed. This would require knowing the true genotypic distribution in the population from which the sample is drawn and making the $\omega_s$ dependent on this distribution.

Sample size calculations under the different

methods

The sample size, which can be obtained by use of the mathematical expressions given in Table 2, is a function of the marker heritability, the marker allele frequency $p_a$ and the residual sib–sib correlation $\rho$. In a classical association analysis and in a TDT-A analysis, the number of informative sibs is $n$ times the number of eligible sibships of size $n$, since all sibs are informative. This is not true for a TDT-G analysis (see above).
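As an illustration of how the Table 2 expressions translate into a number of sibships, the sketch below combines the additive-model formulae for the classical association and TDT-A analyses with equation (5). It is a sketch under stated assumptions: the function name is hypothetical, $\sigma^2$ is set to 1 (it cancels out), and the default `nu = 10.507` is an approximate non-centrality value for a power of 0.90 at a significance level of 0.05 with 1 df. The TDT-G case is left out because its tabulated parameter holds only for $\rho = 0$.

```python
def sibships_additive(method, n, p_a, rho, h2=0.05, nu=10.507):
    """Number of eligible sibships K of size n (additive model, Table 2 formulae).

    method: "assoc" or "tdt_a". Returns a real number; round up in practice.
    The number of informative sibs is n * K for both methods.
    """
    q = p_a * (1 - p_a)                       # p_a * p_A
    beta2 = h2 / (2 * (1 - h2) * q)           # squared effect, with sigma^2 = 1
    if method == "assoc":
        unit_var = {
            1: 1 / (2 * q),
            2: (1 - rho ** 2) / (2 * q * (2 - rho)),
            3: (1 - rho) * (1 + 2 * rho) / (6 * q),
        }[n]
    elif method == "tdt_a":
        unit_var = {
            1: 4 * (1 - q),
            2: 2 * (1 - q) * (1 - rho ** 2),
            3: 4 * (1 - q) * (1 - rho) * (1 + 2 * rho) / (3 * (1 + rho)),
        }[n]
    else:
        raise ValueError("method must be 'assoc' or 'tdt_a'")
    return nu * unit_var / beta2              # K = nu * var(K = 1) / beta^2
```

For independent subjects (`n = 1`) the association result reduces to $\nu (1 - h^2) / h^2$, which does not depend on the allele frequency, in line with the additive-model results discussed below.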

We considered a di-allelic marker associated with a heritability of 5% and having either an additive or a dominant effect. Results for a recessive effect are similar to those obtained for a dominant model when $p_a$ is replaced by $1 - p_a$. The power and the significance level were taken to be 0.90 and 0.05, respectively.

Fig. 1. Number of informative sibs required to detect, in the different models, the effect of an additive marker associated with a heritability of 5%, according to the residual sib–sib correlation. Power = 0.90, significance level $\alpha$ = 0.05, allele frequency of the marker: $p_a$ = 0.2.

Additive effect

Figure 1 shows the influence of the sibship size and the residual sib–sib correlation on the number of informative sibs required to detect the additive effect of a di-allelic marker with allele frequency 0.2. The number of informative sibs required in a classical association framework follows a symmetric ∩-shaped curve according to the sib–sib correlation. Analytical calculations indicated that, whatever the allele frequency, the maximum occurs for $\rho = 2 - \sqrt{3}$ for sibships of size 2 and 0.25 for sibships of size 3, and the minimum for $\rho = 0$ or $\rho = 0.5$. Besides, the number of informative sibs required increases with the sibship size. Under an additive model, it is then more powerful to carry out a classical association study with small sibships, the most powerful design being to include independent subjects.

Fig. 2. Number of informative sibs required to detect, for different sibship sizes, the effect of an additive marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level $\alpha$ = 0.05, residual sib–sib correlation: $\rho$ = 0.2.

Conversely, the number of informative sibs required in a TDT-type analysis decreases with both the sibship size and the residual sib–sib correlation, indicating that the stronger the familial clustering, the more powerful the TDT-type analysis. Note that, for an additive model and an allele frequency $p_a = 0.2$, the number of informative sibs required by the TDT-A (resp. TDT-G) method is lower than that of the association method as soon as $\rho > 0.14$ (resp. $> 0.10$) for sibships of size 2 and $\rho > 0.07$ (resp. $> 0.05$) for sibships of size 3.
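The location of the maximum for the classical association analysis can be recovered from the additive-model variances in Table 2. The short derivation below is a sketch in which $N = nK$ denotes the number of informative sibs, and $f_2$ and $f_3$ are names introduced here for the $\rho$-dependent factors.

```latex
% Additive model, classical association, N = nK informative sibs.
\begin{align*}
\text{size 2:}\quad
  \mathrm{var}(\hat\beta_{\text{ASSOC}})
  &= \frac{\sigma^2}{p_a p_A N}\, f_2(\rho),
  & f_2(\rho) &= \frac{1-\rho^2}{2-\rho},
  & f_2'(\rho) &= \frac{\rho^2 - 4\rho + 1}{(2-\rho)^2}, \\
\text{size 3:}\quad
  \mathrm{var}(\hat\beta_{\text{ASSOC}})
  &= \frac{\sigma^2}{2 p_a p_A N}\, f_3(\rho),
  & f_3(\rho) &= (1-\rho)(1+2\rho),
  & f_3'(\rho) &= 1 - 4\rho .
\end{align*}
% f_2' vanishes on [0, 0.5] at rho = 2 - sqrt(3) (about 0.27); f_3' vanishes at rho = 1/4.
% At the end points, f_2(0) = f_2(1/2) = 1/2 and f_3(0) = f_3(1/2) = 1,
% i.e. the variance obtained with N independent subjects, sigma^2 / (2 p_a p_A N).
```

Since the required sample size is proportional to the variance for a fixed effect, these stationary points do not depend on the allele frequency.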

Figure 2 shows the influence of the marker allele frequency on the number of informative sibs required by each method when the residual sib–sib correlation is set to 0.2. In an association study, the number of informative sibs does not depend on the allele frequency when the allele effect is additive, but slightly increases with the sibship size. When the residual sib–sib correlation is set to 0 or to 0.5, the number of informative sibs no longer depends on the sibship size, as already shown in Fig. 1.

Fig. 3. Number of informative sibs required to detect, in the classical association and the TDT-A models, the effect of a dominant marker associated with a heritability of 5%, according to the residual sib–sib correlation. The TDT-G model is not represented because it generally requires a much higher number. Power = 0.90, significance level $\alpha$ = 0.05, allele frequency of the marker: $p_a$ = 0.2.

By contrast, the number of informative sibs required under the TDT-type methods is strongly influenced by the marker allele frequency. The number of informative sibs required by both methods follows a ∩-shaped curve with a maximum achieved for $p_a = 0.5$, regardless of sibship size. However, as the sibship size increases, the allele frequency interval in which the TDT-A and TDT-G analyses require more informative sibs than the classical association analysis narrows. With sibships of size 1, the TDT-A analysis is more demanding than the association analysis when $p_a$ lies between 0.18 and 0.82. For $\rho = 0.2$, this range becomes 0.22–0.78 and 0.25–0.75 for sibships of size 2 and 3, respectively. The corresponding ranges for the TDT-G method are 0.19–0.81, 0.23–0.77 and 0.26–0.74, respectively. The allele frequency interval in which the TDT-A and TDT-G analyses require more informative sibs than the classical association analysis also narrows when the sib–sib correlation increases, as already reflected in Fig. 1. For example, when $\rho = 0.5$, the TDT-type methods are less demanding in terms of informative sibs than the classical association analysis, except in a small range of allele frequency around 0.5 for sibships of size 2. Both TDT-type methods have comparable power, the TDT-G method requiring slightly fewer informative sibs than the TDT-A method to detect the additive effect of the marker (Fig. 1), except for very low allele frequency ($p_a < 0.04$) and a strong clustering effect (e.g. sibships of size 3 and $\rho > 0.3$).

Dominant effect

Figures 3 and 4 present similar results for a dominant model. The number of informative sibs required by a classical association study now follows an asymmetric ∩-shaped curve according to the residual sib–sib correlation, with a unique minimum occurring at ρ = 0.5. For p_a = 0.2, it is more powerful to work on sibships than on independent individuals as soon as ρ > 0.34. This value is 0.42 for p_a = 0.5.

Fig. 4. Number of informative sibs required to detect, for different sibship sizes, the effect of a dominant marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation ρ = 0.2.

For a TDT-A analysis, the influence of the residual sib–sib correlation and of the sibship size on the number of required informative sibs follows the same pattern as that observed for an additive effect. Unlike the additive case, where both TDT-A and TDT-G methods had comparable power, under a dominant model the TDT-G method requires a very large number of informative sibs in most situations and, for this reason, is not represented in Fig. 3. As an indication, for detecting the dominant effect of a marker having an allele frequency p_a = 0.2 and associated with a heritability of 5%, the TDT-G method requires 1230 informative sibs when sibships are of size 1. For sibships of size 2 and 3 with a residual sib–sib correlation of 0.5 (the most favorable situation), the corresponding numbers become 1084 and 981, respectively.

Figure 4 shows the influence of the allele frequency on the number of informative sibs required by each method in a dominant model. The number of informative sibs required by a classical association analysis, for a given sibship size and whatever the residual sib–sib correlation, increases slightly and linearly with the allele frequency, except for sibships of size 1 where it remains constant. The number of informative sibs required by a TDT-A analysis follows a ∩-shaped curve very similar to that observed for an additive model. The maximum now occurs at p_a ≈ 0.57, instead of 0.5 in the additive model, whatever the sibship size and the residual sib–sib correlation. As indicated in Fig. 4, a TDT-A analysis would require fewer informative sibs than an association analysis to detect a dominant effect as soon as p_a lies outside [0.28–0.80] for sibships of size 1. This interval becomes [0.32–0.77] and [0.36–0.75] when dealing with sibships of size 2 and 3, respectively. Data not shown also indicate that when ρ = 0.5 and the sibship size is larger than 1, a TDT-A analysis always requires fewer informative sibs than a classical association analysis. Lastly, the number of informative sibs required in a TDT-G analysis is, whatever the sibship size and the residual correlation, a decreasing function of the allele frequency. Besides, the number of informative sibs required slightly decreases as the sibship size and the residual sib–sib correlation increase. When the allele frequency exceeds approximately 0.86, the TDT-G analysis is less demanding than a classical association analysis and close to the TDT-A analysis, whatever the sibship size and the residual sib–sib correlation. However, as soon as the allele frequency is lower than 0.7, the opposite is observed, the difference between methods being extremely large for low allele frequencies.

Cost efficiency of the different methods

In the previous section, the calculations fo-

cused on the number of informative sibs, i.e. the

number of sibs really used in the analysis. While

all sibs are used in a classical association analysis,

in TDT-type methods only sibs from mating

types including at least one heterozygous parent

are used. This implies that a larger number of

genotypes and/or phenotypes need to be determined than the number really used. It is then

important to compare the different methods in

terms of cost efficiency. As described by Allison

(1997), there are two kinds of screening pro-

cedures in TDT-type studies. The first one,

referred to as the genotype-first strategy, consists

of (1) genotyping a sample of parents, (2)

selecting sibships whose parental mating type is

eligible and (3) genotyping and phenotyping sibs

from eligible sibships. The second procedure,

referred to as the phenotype-first strategy,

consists of (1) phenotyping a sample of sibships,

(2) genotyping parents in order to select eligible

sibships and (3) genotyping sibs from eligible

sibships. When studying the genetic component

of a multifactorial phenotype, several candidate

genes are generally investigated and the

phenotype-first strategy seems the most ap-

propriate since informative sibs at a given locus

may not be informative at another locus. There-

fore, we will now present a study of the total

number of sibs to be phenotyped when applying

each of the methods previously described under a

phenotype-first strategy. This number can be

again derived from the mathematical expressions

given in Table 2.
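The screening overhead described above can be illustrated with a short calculation. The sketch below is not the Table 2 expression: it assumes a diallelic marker, Hardy–Weinberg proportions and random mating in the parents, and treats a sibship as eligible when at least one parent is heterozygous. The function names are hypothetical.

```python
import math


def eligible_fraction(p: float) -> float:
    """Probability that at least one of two parents is heterozygous.

    Assumes a diallelic marker with allele frequency p, Hardy-Weinberg
    proportions and random mating (assumptions of this sketch).
    """
    het = 2.0 * p * (1.0 - p)          # P(a given parent is heterozygous)
    return 1.0 - (1.0 - het) ** 2      # complement of "both parents homozygous"


def sibships_to_screen(n_eligible: int, p: float) -> int:
    """Expected number of sibships to screen to obtain n_eligible eligible ones."""
    return math.ceil(n_eligible / eligible_fraction(p))


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.5, 0.88):
        print(f"p = {p:.2f}: eligible fraction = {eligible_fraction(p):.3f}")
```

Under these assumptions three sibships in four are eligible at p = 0.5, whereas only about 38% are at p = 0.88, which is why the number of sibships to screen grows quickly for rare or very common alleles.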

Additive effect

The influence of the sibship size and of the allele frequency on the total number of sibs to be phenotyped to detect the additive effect of a diallelic marker when ρ = 0.2 is presented in Fig. 5. In a TDT-G analysis, the total number of sibs to be phenotyped follows a symmetric ∩-shaped curve according to the allele frequency while, in classical association and TDT-A analyses, it does not depend on the allele frequency. In contrast with what was observed in Fig. 2, where the number of informative sibs could be lower in TDT-type analyses than in a classical association study for extreme allele frequencies, the total number of sibs to be phenotyped is now always higher in TDT-type analyses. For both TDT-type methods, the number of phenotypes, however, decreases as the sibship size or the sib–sib correlation increases, the influence of the sibship size being more marked for high than for moderate sib–sib correlation. Differences between the three methods thus tend to decrease as the clustering of the data becomes more marked.

Fig. 5. Total number of phenotypes to be measured in a phenotype-first strategy to detect, for different sibship sizes, the effect of an additive marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation ρ = 0.2.

Fig. 6. Total number of phenotypes to be measured in a phenotype-first strategy to detect, for different sibship sizes, the effect of a dominant marker associated with a heritability of 5%, according to the marker allele frequency. Power = 0.90, significance level α = 0.05, residual sib–sib correlation ρ = 0.2.


Table 3. Mean (standard deviation) of triglyceride levels according to LPL/S447X genotype in sibs

                  LPL/S447X genotype
         SS             SX             XX             R²*
All      N = 559        N = 166        N = 16
         0.84 (0.37)    0.70 (0.30)    0.85 (0.53)    2.2% (p < 10⁻⁴)
Boys     N = 287        N = 78         N = 7
         0.84 (0.39)    0.63 (0.26)    1.03 (0.69)    4.2% (p < 10⁻⁴)
Girls    N = 272        N = 88         N = 9
         0.84 (0.35)    0.77 (0.31)    0.71 (0.34)    1.4% (p = 0.04)

* Test was performed by use of the EE technique on log-transformed values, adjusted for age, gender and oral contraception (when appropriate), assuming a dominant effect of the LPL/X447 allele. Untransformed values are shown.

Dominant effect

Again, the TDT-type methods are always more

demanding in terms of phenotypes than the

classical association study. The difference be-

tween the TDT-A method and the association

method increases linearly with the allele fre-

quency. As already stressed in the previous

section concerning power, the TDT-G method

required many more phenotypes than the two

other methods, except for very high allele

frequencies.

A statistical program in the C language and an

‘interactive web’ tool relying on the formulae of

Table 2 have been developed for calculating the

power, the minimum sample size and the cost of

a study for each statistical method developed in

this paper. Both the phenotype-first and the

genotype-first strategies were envisaged,

although only the results of the former one have

been presented here. These programs are avail-

able upon request from the authors.

Illustration

We illustrated the power calculations pre-

sented above by an example on the relationship

between the lipoprotein lipase (LPL) gene and

triglyceride (TG) levels. LPL is a key enzyme

involved in the metabolism of TG-rich lipo-

proteins, making the LPL gene a candidate for

the development of atherosclerosis. Several poly-

morphisms of the LPL gene have been described

(Gagné et al. 1994; Nickerson et al. 1998; Wilson

et al. 1993). In particular, a Serine447Stop

(S447X) mutation leading to a premature stop

codon has been shown to affect TG levels, the

stop allele being consistently associated with

lower TG levels (Gagné et al. 1996; Garenc et al.

2000; Humphries et al. 1998; Jemaa et al. 1995;

Zhang et al. 1995). We investigated the re-

lationship between the S447X polymorphism

and TG levels in a sample of 513 sibships selected

from healthy nuclear families who had vol-

unteered for a free health examination as part of

the STANISLAS Cohort (Siest et al. 1997).

Sibships with all sibs aged ≥ 15 years were

selected, leading to a sample composed of 302

sibships of size 1, 194 sibships of size 2 and 17

sibships of size 3 (N = 741). The TG distribution

was adjusted for age, sex and oral contraception

in girls, and log-transformed to remove positive

skewness. Parents and sibs were genotyped for

the S447X polymorphism.

The mean age (±SD) of sibs was 17.5 (±2.5) years and the mean TG level was 0.81 (±0.36) mmol/l. The S447X genotype distribution in parents was compatible with HW expectations and the S447 allele frequency was estimated as 0.88 ± 0.01. Results from the classical association analysis based on the EE technique are reported in Table 3. Due to their low number, homozygotes for the X447 allele were pooled with heterozygotes. Assuming a dominant model, TG levels were significantly decreased in carriers of the X447 allele (p < 10⁻⁴), the S447X polymorphism explaining 2.2% (h²) of the TG variability after adjustment for age, sex and oral contraception in girls. The association was highly significant in boys (h² = 4.2%, p < 10⁻⁴) and borderline in girls (h² = 1.4%, p = 0.04). The genotype × gender interaction was borderline (p = 0.06). After controlling for the S447X effect, the residual common sib–sib TG correlation, estimated by the EE technique (Trégouët et al. 1999), was 0.20 (95% CI [0.07–0.32]; p = 0.002).


Table 4. Minimum number of sibships and informative sibs required to detect, with a 0.90 power, the dominant effect of the LPL/X447 allele according to the statistical method. The sibship size distribution is the same as that observed in the sample (significance level: α = 0.05)

                            Total number of     Number of    Number of
                            sibships to be      eligible     informative
                            sampled             sibships     sibs
All (ρ = 0.20; h² = 2.2%)
  Classical Association     337                 337          485
  TDT-A                     613                 231          333
  TDT-G                     644                 233          330
Boys (ρ = 0.23; h² = 4.2%)
  Classical Association     201                 201          245
  TDT-A                     373                 141          172
  TDT-G                     392                 141          170
Girls (ρ = 0.19; h² = 1.4%)
  Classical Association     619                 619          756
  TDT-A                     1157                437          534
  TDT-G                     1215                437          528

ρ = residual correlation; h² = heritability associated with the marker, with allele frequency of 0.88.

Table 5. Parameter estimates obtained by classical association and TDT-type analyses of the relationship between TG levels and the LPL/S447X polymorphism, assuming a dominant model

                                  All               Boys              Girls
Classical Association Analysis    N = 741           N = 372           N = 369
  α̂_ASSOC (SE)                    −0.578 (0.124)    −0.675 (0.187)    −0.422 (0.159)
  β̂_ASSOC (SE)                    −0.157 (0.035)    −0.223 (0.052)    −0.093 (0.044)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻⁴          p = 0.04
TDT-A Analysis                    N = 286           N = 137           N = 149
  α̂_TDT-A (SE)                    −0.511 (0.206)    −0.746 (0.337)    −0.297 (0.252)
  π̂₁ (SE)                         0.055 (0.082)     0.179 (0.143)     −0.043 (0.096)
  π̂₂ (SE)                         0.010 (0.121)     0.121 (0.179)     −0.151 (0.084)
  β̂_TDT-A (SE)                    −0.213 (0.051)    −0.311 (0.078)    −0.107 (0.065)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻⁴          p = 0.10
TDT-G Analysis                    N = 267           N = 131           N = 136
  α̂_TDT-G (SE)                    −0.489 (0.214)    −0.728 (0.372)    −0.296 (0.249)
  β̂_TDT-G (SE)                    −0.206 (0.052)    −0.292 (0.083)    −0.114 (0.063)
  Test of genetic effect          p < 10⁻⁴          p < 10⁻³          p = 0.07

Analyses were performed on log-triglyceride levels adjusted for age, gender and oral contraception in girls, by use of the EE technique.

N is the number of informative sibs used in each analysis.

The brother–brother and sister–sister residual

correlations were 0.23 and 0.19, respectively.
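As a rough check on the precision quoted for the allele frequency, the binomial standard error of an allele-frequency estimate can be computed as below. The sketch assumes that both parents of each of the 513 sibships were genotyped (1026 unrelated individuals) and that the reported ±0.01 is a standard error; both are assumptions of this example, and the function name is hypothetical.

```python
import math


def allele_freq_se(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele frequency from 2n independent alleles."""
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))


if __name__ == "__main__":
    n_parents = 2 * 513  # assumption: both parents of every sibship genotyped
    print(round(allele_freq_se(0.88, n_parents), 4))
```

With these inputs the standard error is about 0.007, which is consistent with the ±0.01 reported once rounded to two decimals.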

As indicated in Table 4, more than 750 girls

within at least 619 sibships would be required to

detect such an effect in a classical association

analysis with a 0.90 power. More than 1100

sibships would have to be collected to detect such

an effect by one of the TDT-type methods, even

though the actual number of subjects on whom

the analysis would be performed is lower than for

the association analysis. As shown in Table 5,

both TDT-type methods found a significant

effect in the whole sample and in boys, but not in

girls. Parameters estimated by the three analyses are given in Table 5. Although the expected values of β̂_ASSOC and β̂_TDT-A should be identical, the observed values differed slightly because β̂_TDT-A was estimated in a subset of the total sample. Note finally that the parental mating type coefficients of the TDT-A analysis, π̂₁ and π̂₂, were not significantly different from 0, as expected in a population in HW equilibrium.
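The orders of magnitude in Table 4 can be compared with a textbook approximation for unrelated individuals. The sketch below is not the EE-based expression of Table 2: it assumes a two-sided Wald test of a single regression coefficient, a normally distributed test statistic and no familial clustering, so that the required N is approximately (z₁₋α/₂ + z_power)² (1 − h²)/h². The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_unrelated(h2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate number of unrelated subjects needed to detect a marker
    explaining a fraction h2 of the phenotypic variance (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # critical value of the test
    z_power = z.inv_cdf(power)               # quantile giving the target power
    return math.ceil((z_alpha + z_power) ** 2 * (1.0 - h2) / h2)


if __name__ == "__main__":
    for label, h2 in (("All", 0.022), ("Boys", 0.042), ("Girls", 0.014)):
        print(label, n_unrelated(h2))
```

For h² = 4.2% this approximation gives about 240 subjects, of the same order as the 245 informative sibs listed for boys in Table 4; the EE calculation additionally accounts for the residual sib–sib correlation and the sibship size distribution.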


Unlike association studies, which are subject

to bias due to uncontrolled stratification, TDT-

type studies allow one to definitively conclude

that association is due to linkage and not to

another phenomenon. Ideally, one should then

always perform a TDT rather than an association

study. However, TDT-type studies are more

demanding than association studies in terms of

genotyping and/or phenotyping, and are often

more complex to set up. Therefore, before

embarking on a TDT-type study, it is important

to evaluate the effect and the cost of this

approach.

During the last decades, a large amount of

research has been directed to the development of

TDT-type methods for quantitative traits

(Abecasis et al. 2000; Allison 1997; Allison et al.

1999; Cardon 2000; George et al. 1999; Martin et

al. 2000; Monks & Kaplan, 2000; Rabinowitz,

1997; Rabinowitz & Laird, 2000; Yang et al.

2000; Zhu & Elston, 2000; Zhu & Elston, 2001).

Our intention in this paper was not to compare

the power of all these methods, but just to

investigate in which conditions it would be

possible to confirm by means of a TDT approach

the effect of a marker on a quantitative trait

found in an association study. Therefore, we

focused our interest on two TDT-type methods

formulated through simple regression models

allowing one to compute analytically sample

size/power by the EE procedure. One of these

TDT methods is that described by George et al.

(1999), originally based on an ML approach, but

for which we proposed a more flexible EE

approach. The other one is an extension of the

model proposed by Allison (1997), initially

restricted to one child per family, but extended

here to several sibs using the EE technique. Both

methods are then valid for testing allelic as-

sociation since they correctly take into account

the familial dependency between sibs. Note that

these two TDT-type methods are not necessarily

the most powerful ones; in particular an alterna-

tive TDT-G method (Zhu & Elston, 2001) has

been recently proposed which appears to be more

powerful than the original one. However,

expressed in an EE framework, these regression-

based methods have the great advantage of

being easily implemented in standard statistical

packages such as Proc Genmod in SAS (SAS

Institute Inc., Cary, N. C.), making our sample

size/power calculation procedure of wide use.

The figures provided in this paper should then be

interpreted as orders of magnitude for the sample

sizes required in a TDT approach.

Some general conclusions can be drawn from

our calculations. Under an additive model, the

power of a classical association study is

maximized when the analysis is performed on

unrelated individuals. Conversely, the power of

the TDT-type methods increases both with the

sibship size and the residual sib–sib correlation.

This result is in agreement with recent studies

which indicated that switching from sib pairs to

larger sibships increases the power of a quan-

titative TDT-type analysis (Allison et al. 1999;

George et al. 1999; Monks & Kaplan, 2000),

especially when the residual sib–sib correlation is

high (Allison et al. 1999). For a given heritability,

the power of an association study does not

depend on the marker allele frequency, unlike

the TDT-type methods which have a maximum

power for extreme allele frequencies. Classical

association studies appear to require fewer

informative sibs than TDT-type studies when the

allele frequency is close to 0.5, while the opposite

is observed for high or low allele frequencies,

especially when the sibship clustering is strong.

The situation is slightly different under a domi-

nant (or a recessive) model. The first difference is

that, unlike for the additive case, the power of an

association analysis is not always maximum in

unrelated individuals, but can be increased by

working on sibships if there is a high sib–sib

correlation. Moreover, the power slightly

decreases with the allele frequency. The second

difference is that except for high allele

frequencies (low for recessive models) the number

of informative sibs required by the TDT-G

method is considerably higher than that required

by the two other methods. Again, a smaller

number of informative sibs is needed in the


classical association analysis than in the TDT-A

method for allele frequencies close to 0.5, the

opposite being observed for extreme allele

frequencies. It can be also deduced from Table 2

that, for a given heritability, the sample size required by both a classical association analysis and a TDT-A analysis is at least as large for detecting an additive effect as for detecting a dominant or a recessive effect. Therefore, when planning to

study the relationship between a marker and a

quantitative phenotype in sibship data, it is

more conservative to determine the sample size

by assuming an additive model, except if another

genetic model can be predicted from previous

studies.

Sample size calculations were based on the

minimum number of sibs required in the stat-

istical analysis (i.e. informative sibs), and not on

the number of sibs initially collected to find the

appropriate number of informative sibs. As

mentioned before, two kinds of screening pro-

cedures can be used to collect data: the genotype-

first screening strategy which consists of geno-

typing a sample of parents from which sibs from

eligible sibships are genotyped and phenotyped,

and the phenotype-first strategy where a sample

of sibs is initially phenotyped and the parents are

then genotyped in order to select informative

sibs. We have shown that, when the phenotype-

first strategy is chosen, the TDT-G method

always requires more sibs to be collected than

the TDT-A method, which itself requires more

sibs than a classical association analysis. How-

ever, the differences between methods tend to

decrease as the sibship clustering increases. As an

example, for a frequent allele and a moderate

correlation, the TDT methods would be 1.5 to 2-

fold more demanding in terms of sample size

than a classical association study.

A similar analysis could have been done with

the genotype-first strategy. The procedure de-

veloped in this paper can be used to help the

investigators to choose the most efficient strat-

egy, which clearly depends on the phenotyping

and genotyping costs.

Note that our procedure is based on the

generalized Wald statistic as described by

Rochon (1998). Liu & Liang (1997) have

proposed the use of the quasi-score statistic for

calculating sample size on correlated data. Their

procedure was then applied to the detection of

familial aggregation in a case-control family

design. We applied this quasi-score procedure to

classical association studies and compared it to

that described in this paper. Both procedures

yielded similar results (data not shown). All the

results presented in the present paper were

obtained using the ‘model-based’ variance EE

estimate assuming an equicorrelation structure

between sibs, and not the ‘robust’ variance

estimate. This means that the sample size

calculations given in this paper are exactly those

obtained from a ML analysis using a multi-

normal distribution in equicorrelated sibs (Zhao

et al. 1992). Several studies have shown (Zhao et al.

1992; Feng et al. 1996; Trégouët et al. 1997) that

the power of the ML method and the EE method

using the ‘robust’ variance estimate are gen-

erally similar, suggesting that the sample size

calculations described here are expected to be

valid also when the statistical analysis is per-

formed using the ‘robust’ EE technique. Finally,

since no distributional assumption is required in

the EE procedure, its application to a binary

phenotype is straightforward, by use of a logistic

model instead of a linear one (George et al. 1999;

Trégouët & Tiret, 2000).

Several limitations have to be addressed.

Firstly, the proposed EE procedure assumes that

sibships are randomly sampled. In the case where

sibships are collected through specific patterns of

trait values, the proposed procedure would not

provide valid results. Secondly, although this

procedure could be applied to more complex

family data such as large pedigrees, analytical

calculations would not be straightforward. Simi-

larly, extensions to multiallelic marker and to

the detection of gene × environment and

gene × gene interactions, although possible,

would be quite demanding to implement.

For illustrative purposes, we applied our EE

procedure to investigate the relationship between

the S447X polymorphism of the LPL gene and

plasma TG levels in a sample of sibships. It was


shown, by classical association analysis, that the

X447 allele was associated with lower TG levels,

as previously reported. Our sample was large

enough to confirm, by both TDT-type methods,

that the association was actually due to linkage

and not to uncontrolled stratification. The fact

that the TDT tests failed to reach significance in

girls was due to a lower heritability than for

boys.

In conclusion, we propose a flexible EE

procedure for determining sample size in classical

association and TDT-type analyses using family

data. It can be used as a preliminary step before

collecting phenotypic and genotypic data when

investigating the role of a candidate gene in the

etiology of a quantitative phenotype.

The Stanislas cohort is supported regularly by Beckman Instruments (U.S.), Biomérieux (France), Johnson and Johnson (France), Merck (France), and Roche (U.S.). The authors thank the reviewers for providing helpful comments on the earlier draft of this manuscript.

Abecasis, G., Cardon, L. & Cookson, W. (2000). A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66, 279–292.

Allison, D. B. (1997). Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60, 676–690.

Allison, D., Heo, M., Kaplan, N. & Martin, E. (1999). Sibling-based tests of linkage and association for quantitative traits. Am. J. Hum. Genet. 64, 1754–1764.

Cardon, L. R. (2000). A sib-pair regression model of linkage disequilibrium for quantitative traits. Hum. Hered. 50, 350–358.

Cleeves, M., Olson, J. & Jacobs, K. (1997). Exact transmission-disequilibrium tests with multiallelic markers. Genet. Epidemiol. 14, 337–347.

Collins, F., Guyer, M. & Chakravarti, A. (1997). Variations on a theme: cataloging human DNA sequence variation. Science 278, 1580–1581.

Feng, Z., McLerran, D. & Grizzle, J. (1996). A comparison of statistical methods for clustered data analysis with gaussian error. Stat. Med. 15, 1793–1806.

Gagné, E., Genest, J., Zhang, H., Clarke, L. & Hayden, M. (1994). Analysis of DNA changes in the LPL gene in patients with familial combined hyperlipidemia. Arterioscler. Thromb. 14, 1250–1257.

Gagné, S. E., Larson, M. G., Pimstone, S. N., Schaefer, E. J., Kastelein, J. J., Wilson, P. W., Ordovas, J. M. & Hayden, M. R. (1999). A common truncation variant of lipoprotein lipase (Ser447X) confers protection against coronary heart disease: the Framingham Offspring Study. Clin. Genet. 55, 450–454.

Garenc, C., Perusse, L., Gagnon, J., Chagnon, Y., Bergeron, J., Despres, J., Province, M., Leon, A., Skinner, J., Wilmore, J., Rao, D. & Bouchard, C. (2000). Linkage and association studies of the lipoprotein lipase gene with postheparin plasma lipase activities, body fat, and plasma lipid and lipoprotein concentrations: the HERITAGE Family Study. Metabolism 49, 432–439.

George, V., Tiwari, H. K., Zhu, X. & Elston, R. C. (1999). A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am. J. Hum. Genet. 65, 236–245.

Humphries, S. E., Nicaud, V., Margalef, J., Tiret, L., Talmud, P. J., for the EARS. (1998). Lipoprotein lipase gene variation is associated with a paternal history of premature coronary artery disease and fasting and postprandial plasma triglycerides. The European Atherosclerosis Research Study (EARS). Arterioscler. Thromb. Vasc. Biol. 18, 526–534.

Jemaa, R., Fumeron, F., Poirier, O., Lecerf, L., Evans, A., Arveiler, D., Luc, G., Cambou, J., Bard, J., Fruchart, J., Apfelbaum, M., Cambien, F. & Tiret, L. (1995). Lipoprotein lipase gene polymorphisms: associations with myocardial infarction and lipoprotein levels, the ECTIM study. J. Lipid Res. 36, 2141–2146.

Knapp, M. (1999). The transmission/disequilibrium test (TDT) and parental genotype reconstruction: the reconstruction-combined transmission/disequilibrium test. Am. J. Hum. Genet. 64, 861–870.

Lander, E. (1996). The new genomics: global views of biology. Science 274, 536–539.

Lander, E. S. & Schork, N. J. (1994). Genetic dissection of complex traits. Science 265, 2037–2048.

Liang, K. Y. & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.

Lindpaintner, K., Lee, K., Larson, M. G., Rao, V. S., Pfeffer, M. A., Ordovas, J. M., Schaefer, E. J. et al. (1996). Absence of association or genetic linkage between the angiotensin-converting-enzyme gene and left ventricular mass. N. Engl. J. Med. 334, 1023–1028.

Liu, G. & Liang, K. Y. (1997). Sample size calculations for studies with correlated observations. Biometrics 53, 937–947.

Martin, E., Kaplan, N. & Weir, B. (1997). Tests for linkage and association in nuclear families. Am. J. Hum. Genet. 61, 439–448.

Martin, E. R., Monks, S. A., Warren, L. L. & Kaplan, N. L. (2000). A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67, 146–154.

Monks, S. & Kaplan, N. (2000). Removing the sampling restriction from family-based tests of association for a quantitative-trait locus. Am. J. Hum. Genet. 66, 576–592.

Nickerson, D. A., Taylor, S. L., Weiss, K. M., Clark, A. G., Hutchinson, R. G., Stengard, J., Salomaa, V., Vartiainen, E., Boerwinkle, E. & Sing, C. F. (1998). DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19, 233–240.

Prentice, R. L. & Zhao, L. P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47, 825–839.


Rabinowitz, D. (1997). A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47, 342–350.

Rabinowitz, D. & Laird, N. (2000). A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum. Hered. 50, 211–223.

Risch, N. & Merikangas, K. (1996). The future of genetics studies of complex diseases. Science 273, 1516–1517.

Rochon, J. (1998). Application of GEE procedures for sample size calculations in repeated measures experiments. Stat. Med. 17, 1643–1658.

Rotnitzky, A. & Jewell, N. P. (1990). Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 77, 485–497.

Schaid, D. J. & Sommer, S. S. (1994). Comparison of statistics for candidate-gene association studies using cases and parents. Am. J. Hum. Genet. 55, 402–409.

Siest, G., Lecomte, E., Visvikis, S., Herbeth, B., Gueguen, R., Vincent-Viry, M., Steinmetz, J., Beaud, B., Locuty, J. & Chevrier, P. (1997). Une étude familiale et longitudinale au Centre de Médecine Préventive de Nancy-Vandoeuvre: la cohorte STANISLAS. In: Galteau M. M., Delwaide P., Siest G., Henny J., Eds. Biologie Prospective. Compte rendus du IXe Colloque International de Pont à Mousson, Eurobiologie, 29 Septembre-3 Octobre 1996, Paris: J. Libbey Eurotext Publishers; 1997: 163–166.

Spielman, R. & Ewens, W. (1998). A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 62, 450–458.

Spielman, R. S., McGinnis, R. E. & Ewens, W. J. (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52, 506–516.

Trégouët, D. A., Ducimetière, P. & Tiret, L. (1997). Testing association between candidate-gene markers and phenotype in related individuals, by use of estimating equations. Am. J. Hum. Genet. 61, 189–99.

Trégouët, D. A., Herbeth, B., Juhan-Vague, I., Siest, G., Ducimetière, P. & Tiret, L. (1999). Bivariate familial correlation analysis of quantitative traits by use of estimating equations: applications to a familial analysis of the insulin resistance syndrome. Genet. Epidemiol. 16, 69–83.

Trégouët, D. A. & Tiret, L. (2000). Applications of the estimating equations theory to genetic epidemiology: a review. Ann. Hum. Genet. 64, 1–14.

Villard, E., Tiret, L., Visvikis, S., Rakotovao, R., Cambien, F. & Soubrier, F. (1996). Identification of new polymorphisms of angiotensin I-converting enzyme (ACE) gene, and study of their relationship to plasma ACE levels by two-QTL segregation-linkage analysis. Am. J. Hum. Genet. 58, 1268–1278.

Wilson, D. E., Hata, A., Kwong, L. K., Lingam, A., Shuhua, J., Ridinger, D. N., Yeager, C., Kaltenborn, K. C., Iverius, P. H. & Lalouel, J. M. (1993). Mutations in exon 3 of the lipoprotein lipase gene segregating in a family with hypertriglyceridemia, pancreatitis, and non-insulin-dependent diabetes. J. Clin. Invest. 92, 203–211.

Yang, Q., Rabinowitz, D., Isasi, C. & Shea, S. (2000). Adjusting for confounding due to population admixture when estimating the effect of candidate genes on quantitative traits. Hum. Hered. 50, 227–233.

Zhang, Q., Cavanna, J., Winkelman, B. R., Shine, B., Gross, W., Marz, W. & Galton, D. J. (1995). Common genetic variants of lipoprotein lipase that relate to lipid transport in patients with premature coronary artery disease. Clin. Genet. 48, 293–298.

Zhao, L., Prentice, R. & Self, S. (1992). Multivariate mean parameter estimation by using a partly exponential model. J. R. Stat. Soc. [B]. 54, 805–811.

Zhu, X. & Elston, R. (2001). Transmission/disequilibrium tests for quantitative traits. Genet. Epidemiol. 20, 57–74.