Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data
Arising from Animal Habitat Selection Studies
Thierry Duchesne∗ 1
with Radu Craiu∗∗, Daniel Fortin∗∗∗, Sophie Baillargeon∗
∗Département de mathématiques et de statistique, Université Laval; ∗∗Department of Statistics, University of Toronto; ∗∗∗Département de biologie, Université Laval
Department of Statistics Seminar, University of Manitoba
October 28, 2010
1Research funded by NSERC.
Outline Introduction Conditional logistic regression GEE Mixed model Conclusion Ref’s
Outline
1 Introduction: Research objectives; Sampling designs; Data available; Methodological objectives
2 Conditional logistic regression: Model and notation; Justification of conditional logistic regression
3 Population averaged inference: Method; Example of application
4 Subject specific inference: Method; Example of application
5 Conclusion
6 References
Research objectives
Objectives of our research
Ecological objectives: For the biologists, it is important to understand the links between various attributes of a landscape and how animals select their habitat (or move within their home range).
Statistical objectives: What are the "appropriate" sampling designs? What are the possible statistical models? How do we make inference on the model parameters?
Sampling designs
Possible study designs
Unmatched used vs unused (or available) designs: Useful to determine what landscape attributes predict whether a location is likely to be used or not over a specified time frame (e.g., trees with nests vs trees without nests).
Usually analyzed with logistic regression ($Y_i = 1$ if location $i$ is used, $Y_i = 0$ otherwise). To be used with care since in some contexts, available ≠ unused. If the sampling unit is the animal (with many used locations per animal), then within-animal correlation must be taken into consideration ⇒ GEE (population-averaged) or mixed models (subject-specific) are used. Again, care must be exercised w.r.t. the available/unused locations.
Sampling designs
Possible study designs
Matched designs: For each location used (or step traveled) by an animal, $m$ unused locations that could have been visited by the same animal at the same time are sampled.
The dataset is comprised of several such matched strata for each animal. This does not allow inference on the absolute probability of use of a precise location, but it does allow inference on the probability of choosing a given location among a set of locations when the location attributes are given.
Sampling designs
Matched design
E.g., each location is matched with 10 locations picked at random among those that could have been used at the same time.
Step selection functions. Fortin et al. (2005), Ecology 86(5): 1320–1330.
Data available
Part I: Data on the available location
We have a detailed GIS database of Prince Albert National Park.
Data available
Part II: Animal location data
For each of $K$ animals (female bison), GPS collars give their precise location at a large number of equally spaced time steps.
Methodological objectives
Our precise statistical problems
In some cases, we can get more than one $Y = 1$ in a stratum: e.g., a pair of animals traveling together.
How do we make inferences on the preferences of the animals for given landscape attributes under such a sampling design?
We will see that this can be done if we can come up with a "longitudinal" version of conditional logistic regression.
Model and notation
Notation
Animals: $c = 1, 2, \ldots, K$; strata: $j = 1, 2, \ldots, S_c$; locations: $i = 1, 2, \ldots, n$;
Response variable: $y_{ji}^{(c)} = 1$ if animal $c$ was at location $i$ in the $j$-th stratum, $0$ otherwise;
Covariates: value of the attributes of the landscape at location $i$ in stratum $j$ of animal $c$: $x_{ji}^{(c)} = (x_{ji1}^{(c)}, \ldots, x_{jip}^{(c)})^\top$;
Sampling design: by design, it is known before sampling that $\sum_{i=1}^{n} y_{ji}^{(c)} = m$ for all $j, c$.
Model and notation
“Prospective” model
If we sampled locations without knowing the value of the $y_{ji}^{(c)}$ in advance (i.e., a prospective study), we could link the landscape attributes $x_{ji}^{(c)}$ with $y_{ji}^{(c)}$ using logistic regression-type models.

E.g., given i.i.d. $N(0, \Sigma)$ vectors of animal-level random effects, say $b_c$, and the covariates, it is assumed that the $y_{ji}^{(c)}$ are independent with

$$\Pr\left[y_{ji}^{(c)} = 1 \,\middle|\, b_c, x_{ji}^{(c)}\right] = \frac{\exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}{1 + \exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}.$$
Model and notation
Resource selection function
The exponential of the linear predictor is sometimes called the resource selection function (RSF). Maps of its value can help to assess animal preferences.
Justification of conditional logistic regression
“Retrospective” model
When location $i$ in stratum $j$ of animal $c$ is sampled on the basis of its $y_{ji}^{(c)}$ value, how can we infer about $\beta$ (and possibly $\Sigma$) in the prospective model?

Using arguments based on conditional likelihood (e.g., Hosmer & Lemeshow 2000), on discrete choice theory (e.g., Manly et al. 2002, Train 2003), or on movement kernels (e.g., Forester et al. 2009), we get that a good way to deal with the retrospective design is conditional logistic regression.
Justification of conditional logistic regression
Conditional likelihood
If we suppose that $b_c^\top z_{ji}^{(c)} = b_c$ in the prospective model, then we get that

$$\Pr\left[y_{ji}^{(c)}, i = 1, \ldots, n \,\middle|\, b_c, \sum_{i=1}^{n} y_{ji}^{(c)} = m, x_{ji}^{(c)}, i = 1, \ldots, n\right] = \frac{\exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} y_{ji}^{(c)}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} v_{\ell i}\right)},$$

where the sum in the denominator is over all vectors $v_\ell$ comprised of zeros and ones such that the sum of their elements is $m$. Because the random effect is constant within the stratum, it factors out of both numerator and denominator and cancels, so this conditional likelihood is free of $b_c$.
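For small $n$ this conditional probability can be computed by brute force, enumerating all $\binom{n}{m}$ indicator vectors. A minimal Python sketch (illustrative only; the function and variable names are my own, not from the talk):

```python
from itertools import combinations
import math

def cond_logit_prob(X, y, beta, m):
    """Conditional probability of the observed 0/1 response vector y given
    that exactly m of the n locations in the stratum are used."""
    n = len(X)
    def lin(idx):  # sum of beta' x_i over the locations i in idx
        return sum(b * xi for i in idx for b, xi in zip(beta, X[i]))
    num = math.exp(lin([i for i in range(n) if y[i] == 1]))
    den = sum(math.exp(lin(c)) for c in combinations(range(n), m))
    return num / den

# Toy stratum: n = 4 locations, one covariate, m = 2 used locations.
X = [[0.0], [1.0], [2.0], [3.0]]
prob = cond_logit_prob(X, [0, 1, 0, 1], beta=[0.5], m=2)
```

Summing this probability over all $\binom{n}{m}$ admissible response vectors gives 1, which is a handy check of the normalization.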
Justification of conditional logistic regression
Exponential movement kernels (Forester et al 2009)
Suppose the animal is at location $a$ at time step $t$. All locations in the set $D_a$ are reachable by the animal until time step $t+1$. Assume that the density of movement from a point $a$ to a point $b$ in a homogeneous baseline landscape over one time step is given by $\phi(d_{ab})$, where $d_{ab}$ is the distance between $a$ and $b$. Suppose that habitat characteristics have a log-linear effect on the movement kernel. Then

$$f(b \mid a, x_s, s \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \phi(d_{as}) \exp(\beta^\top x_s)\, ds}.$$
Justification of conditional logistic regression
Exponential movement kernels (Forester et al 2009)
Evaluation of the integral in the denominator can be replaced by an approximating sum. Forester et al. (2009) show that if a sample $S_a$ comprised of $b$ and $n-1$ other locations in $D_a$ is "appropriately" sampled,

$$f(b \mid a, x_\ell, \ell \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{\ell \in D_a} \phi(d_{a\ell}) \exp(\beta^\top x_\ell)\, d\ell} \approx \frac{\exp(\beta^\top x_b)}{\sum_{\ell \in S_a} \exp(\beta^\top x_\ell)},$$

which is the probability of conditional logistic regression when $m = 1$ and the location with $y = 1$ is $b$.
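In the $m = 1$ case this approximation is just a softmax over the sampled locations, which is one reason standard conditional logistic software handles step selection functions directly. A small illustrative sketch (names and toy values are mine):

```python
import math

def step_selection_prob(x_used, x_controls, beta):
    """m = 1 case: probability that the used location is chosen among itself
    and the sampled controls -- a softmax of the linear predictors."""
    scores = [sum(b * v for b, v in zip(beta, x)) for x in [x_used] + x_controls]
    mx = max(scores)                       # subtract the max for numerical stability
    w = [math.exp(s - mx) for s in scores]
    return w[0] / sum(w)

# One used location and 10 identical controls, two covariates (toy values).
p = step_selection_prob([1.0, 0.2], [[0.0, 0.1]] * 10, beta=[0.8, -0.3])
```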
Method
Data and assumptions
Now back to the general problem:
$K$ animals, $S^{(c)}$ strata observed for animal $c$, $m$ "cases" (locations with $y = 1$) and $n - m$ "controls" (locations with $y = 0$) in each stratum.
We want to make population-averaged inference about $\beta$ in the prospective model.
It is assumed that the data can be partitioned into uncorrelated clusters (data from different animals uncorrelated, or clusters of observations on a same animal taken several time units apart).
Method
Craiu et al (2008)
We showed that the likelihood score function of the retrospective model can be rewritten as

$$U(\beta) = \sum_{c=1}^{K} \sum_{j=1}^{S^{(c)}} \sum_{i=2}^{n} \left[ x_{ji}^{(c)*} y_{ji}^{(c)} - \frac{\sum_{\ell=1}^{\binom{n}{m}} v_{\ell i}\, x_{ji}^{(c)*} \exp\left(\sum_{h=2}^{n} \beta^\top x_{jh}^{(c)*} v_{\ell h}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{h=2}^{n} \beta^\top x_{jh}^{(c)*} v_{\ell h}\right)} \right] = \sum_{c=1}^{K} D^{(c)\top} \left(V_{\text{Indep}}^{(c)}\right)^{-1} \left\{ Y^{(c)} - \mu(\beta) \right\},$$

where $x_{ji}^{(c)*} = x_{ji}^{(c)} - x_{j1}^{(c)}$, $Y^{(c)}$ is the vector of all responses but without the $y_{j1}^{(c)}$'s, and $\mu(\beta) = E_{\text{Retro}}[Y^{(c)} \mid X^{(c)}]$.
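As a sanity check of the displayed score, each stratum's summand is the gradient of the log conditional likelihood with covariates differenced against location 1. The sketch below (my own illustrative code, not the authors') compares the analytic score for one stratum with a finite-difference gradient:

```python
from itertools import combinations
import math

def diffs(X):
    # Within-stratum differencing against location 1: x_i - x_1.
    return [[a - b for a, b in zip(x, X[0])] for x in X]

def loglik(Xs, y, beta, m):
    # Log conditional likelihood of one stratum, given sum(y) == m.
    n = len(Xs)
    lin = lambda v: sum(v[i] * sum(b * xi for b, xi in zip(beta, Xs[i]))
                        for i in range(n))
    log_den = math.log(sum(math.exp(lin([1 if i in c else 0 for i in range(n)]))
                           for c in combinations(range(n), m)))
    return lin(y) - log_den

def score(Xs, y, beta, m):
    # Analytic gradient of loglik: the stratum summand of U(beta).
    n, p = len(Xs), len(beta)
    combos = list(combinations(range(n), m))
    ws = [math.exp(sum(sum(b * xi for b, xi in zip(beta, Xs[h])) for h in c))
          for c in combos]
    tot = sum(ws)
    return [sum(Xs[i][k] for i in range(n) if y[i] == 1)
            - sum(w * sum(Xs[h][k] for h in c) for w, c in zip(ws, combos)) / tot
            for k in range(p)]

X = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [0.5, 2.0]]
Xs, y, beta, m = diffs(X), [0, 1, 0, 1], [0.3, -0.2], 2
u = score(Xs, y, beta, m)
eps = 1e-6
fd = [(loglik(Xs, y, [b + eps * (k == j) for k, b in enumerate(beta)], m)
       - loglik(Xs, y, beta, m)) / eps for j in range(len(beta))]
```

The two gradients agree to within finite-difference error, which is the content of the first equality above.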
Method
Advantages
With the robust (sandwich) estimate of $\mathrm{Var}(\hat\beta)$, inferences about $\beta$ are valid no matter what the correlation structure within clusters is ... as long as the data are uncorrelated between clusters.
$U(\beta)$ is the partial likelihood score for the Cox model for discrete data ⇒ PROC PHREG or coxph() can be used to apply the method.
Simulations have shown that inferences are good in finite samples:
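The robust variance referred to here is the usual cluster sandwich $\hat A^{-1} \hat M \hat A^{-1}$, with $\hat M$ built from per-cluster score totals. A generic numpy sketch (the inputs and names are assumptions of mine, not the talk's notation):

```python
import numpy as np

def cluster_sandwich(A, scores, cluster_ids):
    """Cluster-robust (sandwich) variance A^{-1} M A^{-1}: A is the p x p
    'bread' (minus the expected Hessian); M sums the outer products of the
    within-cluster totals of the per-observation score contributions."""
    A_inv = np.linalg.inv(A)
    M = np.zeros_like(A)
    for c in np.unique(cluster_ids):
        s_c = scores[cluster_ids == c].sum(axis=0)   # total score in cluster c
        M += np.outer(s_c, s_c)
    return A_inv @ M @ A_inv

# Toy example: 6 observations in 3 clusters, p = 2 parameters.
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 2))
V = cluster_sandwich(np.eye(2) * 4.0, scores, np.array([0, 0, 1, 1, 2, 2]))
```

Because $\hat M$ is a sum of outer products, the resulting variance matrix is symmetric and positive semidefinite by construction.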
Method
Simulation results, Craiu et al (2008, Table 1)
Method
Disadvantages
Inference on the parameters of the working correlation matrix is not possible ⇒ must use the independence working assumption.
Though better than AIC, the QIC(I) model selection criterion did not perform really well in simulations:
Method
Simulation results
Example of application
Application to female bison in Prince Albert
8 female bison with 14 clusters of 48 locations, and 1 female with 9 clusters, all followed between 2 Sept. 2005 and 2 Dec. 2005.
Each observed location was matched to 10 locations picked at random in a 300 m buffer (so $K = 8 \times 14 + 1 \times 9 = 121$, $S = 48$, $m = 1$, $n = 11$).
x: 6 dummy variables to quantify a seven-level habitat class categorical variable (deciduous stands = baseline level).
Example of application
Model fit
Method
Conditional inference
Sometimes, subject-specific inferences are required. Can we estimate $\beta$ and $\Sigma$ from the mixed-effects prospective model with the retrospective sampling design?

Already done in some special cases:
Family studies of genetic diseases (special case $S = 1$)
Mixed multinomial logit discrete choice model (special case $m = 1$)
Method
Likelihood for the general case
Craiu et al (2011) get the following likelihood in the general case:

$$L(\beta, \Sigma) = \prod_{c=1}^{K} \frac{\exp\left(\sum_{si} y_{si}^{(c)} \beta^\top x_{si}^{(c)}\right) \int \exp\left(\sum_{si} y_{si}^{(c)} b^\top z_{si}^{(c)}\right) d^{(c)}(\beta, b)\, dF(b; \Sigma)}{\int d^{(c)}(\beta, b) \prod_s \sum_{\ell \in L_s^{(c)}} \exp\left\{\sum_i v_{\ell si}^{(c)} \left(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)}\right)\right\}\, dF(b; \Sigma)},$$

where $d^{(c)}(\beta, b) = \prod_s \prod_i \{1 + \exp(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)})\}^{-1}$.
How do you maximize this thing?!?!?!?!!
Method
Maximization of the likelihood
Family studies (Pfeiffer et al 2001): evaluate the integrals by a Monte Carlo method, then maximize using a hybrid of Newton-type methods for $\beta$ and a grid search for the elements of $\Sigma$.
Mixed multinomial logit (Bhat 2001): quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
Craiu et al (2011), first attempt: quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.

With small $K$ and large $S$, these methods are painfully slow and unstable!
Method
Two-step algorithm, Craiu et al (2011)
Inspired by earlier work for GLMMs, we derived a two-step method that is numerically fast and stable and that yields estimators of $\beta$ and $\Sigma$ with good properties:

Step 1: separately for each cluster $c$, use traditional maximum likelihood for independent data (e.g., coxph()) to get $\hat\beta_c$ and an estimate $R_c$ of its variance $\mathrm{Var}(\hat\beta_c)$.
Step 2: since the clusters are large, the $\hat\beta_c$ are independent and $\hat\beta_c \approx N(\beta, R_c)$. Thus we can use linear mixed model theory and REML estimation to combine these estimates and obtain estimates of $\beta$ and $\Sigma$.
Method
Second step: REML with EM
Easy to implement and to program ... but difficult to explain due to extremely heavy notation! In a nutshell:

Stack the estimates $\hat\beta_1, \ldots, \hat\beta_K$ in a vector $U$ and their variance estimates in a block-diagonal matrix $R = \mathrm{diag}(R_1, \ldots, R_K)$.
Stack the vectors of random effects $b_1, \ldots, b_K$ in a vector $\phi$ and their variances in a block-diagonal matrix $\tilde\Sigma = \mathrm{diag}(\Sigma, \ldots, \Sigma)$.
Define $W_1 = 1_K \otimes I_p$ and $W_2 = I_{Kp}$.
Consider the linear mixed model

$$U = W_1 \beta + W_2 \phi + \varepsilon,$$

where $\varepsilon \sim N(0, R)$, $\phi \sim N(0, \tilde\Sigma)$, and $\phi \perp \varepsilon$.
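These design matrices are simple Kronecker constructions, and for known $R$ and $\Sigma$ the fixed effect solves a GLS problem. An illustrative numpy sketch with toy values (all names and numbers are mine):

```python
import numpy as np

K, p = 3, 2
W1 = np.kron(np.ones((K, 1)), np.eye(p))   # 1_K ⊗ I_p: stacks K copies of I_p
W2 = np.eye(K * p)                         # I_{Kp}: one random effect per entry

# Toy cluster-level estimates and (assumed known) variance components.
beta_hat = np.array([[1.1, -0.4], [0.9, -0.6], [1.3, -0.5]])
R = np.eye(p) * 0.10                       # common sampling variance R_c (toy)
Sigma = np.eye(p) * 0.05                   # random-effect variance (toy)

U = beta_hat.ravel()                       # stacked vector of estimates
V = np.kron(np.eye(K), R + Sigma)          # marginal Var(U) under the model

Vi = np.linalg.inv(V)
beta_gls = np.linalg.solve(W1.T @ Vi @ W1, W1.T @ Vi @ U)  # GLS estimate of beta
```

When the per-cluster variances are all equal, as in this toy example, the GLS solution reduces to the simple average of the cluster estimates.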
Method
Second step: REML with EM
$\beta$ and $\Sigma$ in this linear mixed model can be estimated by maximum likelihood (ML) or by restricted maximum likelihood (REML).
We first tried ML, but the variances were underestimated and $\hat\beta$ was biased.
We used the EM algorithm (both the E and M steps are in closed form for a few specifications of the structure of $\Sigma$) to implement REML ⇒ numerically quick and stable, and the estimators are quite good in terms of bias, even in terms of efficiency.
An R package (TwoStepClogit) implementing this method should be available on CRAN in the spring!
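A stripped-down version of this second step can be written as an EM iteration for the model $\hat\beta_c = \beta + b_c + \varepsilon_c$. The sketch below uses plain ML-EM rather than the REML variant described in the talk, with toy inputs; everything here is my own illustration, not the package's code:

```python
import numpy as np

def em_combine(beta_hats, R_list, n_iter=200):
    """ML-EM for beta_hat_c = beta + b_c + e_c, with b_c ~ N(0, Sigma) and
    e_c ~ N(0, R_c): alternates posterior moments of the b_c (E-step)
    with updates of beta and Sigma (M-step)."""
    K, p = len(beta_hats), beta_hats[0].shape[0]
    beta = np.mean(beta_hats, axis=0)
    Sigma = np.eye(p)
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each b_c.
        ms, Vs, Winv = [], [], []
        for bh, R in zip(beta_hats, R_list):
            W = np.linalg.inv(Sigma + R)
            ms.append(Sigma @ W @ (bh - beta))
            Vs.append(Sigma - Sigma @ W @ Sigma)
            Winv.append(W)
        # M-step: GLS update of beta, moment update of Sigma.
        beta = np.linalg.solve(sum(Winv),
                               sum(W @ bh for W, bh in zip(Winv, beta_hats)))
        Sigma = sum(np.outer(m_, m_) + V_ for m_, V_ in zip(ms, Vs)) / K
    return beta, Sigma

rng = np.random.default_rng(1)
bh = [np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=2) for _ in range(8)]
Rs = [np.eye(2) * 0.05] * 8
beta_est, Sigma_est = em_combine(bh, Rs)
```

Both EM steps are in closed form here, which is what makes this second step numerically quick and stable compared with maximizing the full likelihood directly.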
Method
Simulation results, Craiu et al (2011, Fig. 1)
Example of application
Application to female bison in Prince Albert
20 pairs of two female bison followed between 15 Nov. and 15 April, in 2005, 2006, and 2007.
Each pair of observed locations was matched to 10 locations picked at random in a 700 m buffer (so $K = 20$, $m = 2$, $n = 12$, and $S$ varied between 21 and 349).
x: dummy variables to quantify habitat class, as well as an above-ground vegetation biomass index (in kg/m²).
Example of application
Model fit
Future research
How should the “controls” be sampled?
Within-cluster correlation: how to estimate working correlations in GEE? How to include autocorrelation among observations belonging to a same cluster in the prospective (then retrospective) model?
Between-cluster correlation: how can we include between-animal (or between-pairs-of-animals) correlation in such models?
Model validation: relatively easy to do informally with K-fold cross-validation type approaches ... but how can a formal goodness-of-fit test be done?
References
Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35, 677–693.

Craiu, R. V., Duchesne, T., Fortin, D. (2008). Inference methods for the conditional logistic regression model with longitudinal data. Biometrical Journal, 50, 97–109.

Craiu, R. V., Duchesne, T., Fortin, D., Baillargeon, S. (2011). Conditional logistic regression with longitudinal follow-up and individual-level random coefficients: a stable and efficient two-step estimation method. Journal of Computational and Graphical Statistics, to appear.

Forester, J. D., Im, H. K., Rathouz, P. J. (2009). Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90, 3554–3565.

Pfeiffer, R. M., Gail, M. H., Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies. Biometrika, 88, 933–948.

Train, K. (2003). Discrete Choice Methods with Simulation. New York: Cambridge University Press.