Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data
Arising from Animal Habitat Selection Studies
Thierry Duchesne∗ 1
with Radu Craiu∗∗, Daniel Fortin∗∗∗, Sophie Baillargeon∗
∗Département de mathématiques et de statistique, Université Laval; ∗∗Department of Statistics, University of Toronto; ∗∗∗Département de biologie, Université Laval
Department of Statistics Seminar, University of Manitoba
October 28, 2010
1Research funded by NSERC.
Outline Introduction Conditional logistic regression GEE Mixed model Conclusion Ref’s
Outline
1 Introduction: Research objectives; Sampling designs; Data available; Methodological objectives
2 Conditional logistic regression: Model and notation; Justification of conditional logistic regression
3 Population averaged inference: Method; Example of application
4 Subject specific inference: Method; Example of application
5 Conclusion
6 References
Research objectives
Objectives of our research
Ecological objectives: For the biologists, it is important to understand the links between various attributes of a landscape and how animals select their habitat (or move within their home range).
Statistical objectives: What are the "appropriate" sampling designs? What are the possible statistical models? How do we make inference on the model parameters?
Sampling designs
Possible study designs
Unmatched used vs unused (or available) designs: Useful to determine what landscape attributes predict whether a location is likely to be used or not over a specified time frame (e.g., trees with nests vs trees without nests).
Usually analyzed with logistic regression ($Y_i = 1$ if location $i$ is used, $Y_i = 0$ otherwise). To be used with care since in some contexts, available ≠ unused. If the sampling unit is the animal (with many used locations per animal), then within-animal correlation must be taken into consideration ⇒ GEE (population-averaged) or mixed models (subject-specific) are used. Again, care must be exercised w.r.t. the available/unused locations.
Sampling designs
Possible study designs
Matched designs: For each location used (or step traveled) by an animal, $m$ unused locations that could have been visited by the same animal at the same time are sampled.
The dataset is comprised of several such matched strata for each animal. This does not allow inference on the absolute probability of use of a precise location, but it does allow inference on the probability of choosing a given location among a set of locations when the location attributes are given.
Sampling designs
Matched design
E.g., each location is matched with 10 locations picked at random among those that could have been used at the same time.
Step selection functions. Fortin et al. (2005), Ecology 86(5): 1320–1330.
Data available
Part I: Data on the available location
We have a detailed GIS database of Prince Albert National Park.
Data available
Part II: Animal location data
For each of $K$ animals (female bison), GPS collars give their precise location at a large number of equally spaced time steps.
Methodological objectives
Our precise statistical problems
In some cases, we can get more than one $Y = 1$ in a stratum: e.g., a pair of animals traveling together.
How do we make inferences on the preferences of the animals for given landscape attributes under such a sampling design?
We will see that this can be done if we can come up with a "longitudinal" version of conditional logistic regression.
Model and notation
Notation
Animals: $c = 1, 2, \ldots, K$; strata: $j = 1, 2, \ldots, S_c$; locations: $i = 1, 2, \ldots, n$;
Response variable: $y_{ji}^{(c)} = 1$ if animal $c$ was at location $i$ in the $j$-th stratum, $0$ otherwise;
Covariates: value of the attributes of the landscape at location $i$ in stratum $j$ of animal $c$: $x_{ji}^{(c)} = (x_{ji1}^{(c)}, \ldots, x_{jip}^{(c)})^\top$;
Sampling design: by design, it is known before sampling that $\sum_{i=1}^{n} y_{ji}^{(c)} = m$ for all $j, c$.
Model and notation
“Prospective” model
If we sampled locations without knowing the value of the $y_{ji}^{(c)}$ in advance (i.e., a prospective study), we could link the landscape attributes $x_{ji}^{(c)}$ with $y_{ji}^{(c)}$ using logistic regression-type models.

E.g., given i.i.d. $N(0, \Sigma)$ vectors of animal-level random effects, say $b_c$, and the covariates, it is assumed that the $y_{ji}^{(c)}$ are independent with

$$\Pr\left[y_{ji}^{(c)} = 1 \,\middle|\, b_c, x_{ji}^{(c)}\right] = \frac{\exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}{1 + \exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}.$$
Model and notation
Resource selection function
The exponential of the linear predictor is sometimes called the resource selection function (RSF). Maps of its value can help to assess animal preferences.
Justification of conditional logistic regression
“Retrospective” model
When location $i$ in stratum $j$ of animal $c$ is sampled on the basis of its $y_{ji}^{(c)}$ value, how can we infer about $\beta$ (and possibly $\Sigma$) in the prospective model?

Using arguments based on conditional likelihood (e.g., Hosmer & Lemeshow 2000), on discrete choice theory (e.g., Manly et al. 2002, Train 2003), or on movement kernels (e.g., Forester et al. 2009), we get that a good way to deal with the retrospective design is conditional logistic regression.
Justification of conditional logistic regression
Conditional likelihood
If we suppose that $b_c^\top z_{ji}^{(c)} = b_c$ in the prospective model, then we get that

$$\Pr\left[y_{ji}^{(c)}, i = 1, \ldots, n \,\middle|\, b_c, \sum_{i=1}^{n} y_{ji}^{(c)} = m, x_{ji}^{(c)}, i = 1, \ldots, n\right] = \frac{\exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} y_{ji}^{(c)}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} v_{\ell i}\right)},$$

where the sum in the denominator is over all vectors $v_\ell$ comprised of zeros and ones such that the sum of their elements is $m$. Because the random effect is constant within the stratum, it factors out of both numerator and denominator and cancels, so this conditional likelihood is free of $b_c$.
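For small $n$ this conditional probability can be computed by brute force, enumerating all $\binom{n}{m}$ indicator vectors. A minimal Python sketch (illustrative only; the function and variable names are my own, not from the talk):

```python
from itertools import combinations
import math

def cond_logit_prob(X, y, beta, m):
    """Conditional probability of the observed 0/1 response vector y given
    that exactly m of the n locations in the stratum are used."""
    n = len(X)
    def lin(idx):  # sum of beta' x_i over the locations i in idx
        return sum(b * xi for i in idx for b, xi in zip(beta, X[i]))
    num = math.exp(lin([i for i in range(n) if y[i] == 1]))
    den = sum(math.exp(lin(c)) for c in combinations(range(n), m))
    return num / den

# Toy stratum: n = 4 locations, one covariate, m = 2 used locations.
X = [[0.0], [1.0], [2.0], [3.0]]
prob = cond_logit_prob(X, [0, 1, 0, 1], beta=[0.5], m=2)
```

Summing this probability over all $\binom{n}{m}$ admissible response vectors gives 1, which is a handy check of the normalization.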
Justification of conditional logistic regression
Exponential movement kernels (Forester et al 2009)
Suppose the animal is at location $a$ at time step $t$. All locations in the set $D_a$ are reachable by the animal until time step $t+1$. Assume that the density of movement from a point $a$ to a point $b$ in a homogeneous baseline landscape over one time step is given by $\phi(d_{ab})$, where $d_{ab}$ is the distance between $a$ and $b$. Suppose that habitat characteristics have a log-linear effect on the movement kernel. Then

$$f(b \mid a, x_s, s \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \phi(d_{as}) \exp(\beta^\top x_s)\, ds}.$$
Justification of conditional logistic regression
Exponential movement kernels (Forester et al 2009)
Evaluation of the integral in the denominator can be replaced by an approximating sum. Forester et al. (2009) show that if a sample $S_a$ comprised of $b$ and $n-1$ other locations in $D_a$ is "appropriately" sampled,

$$f(b \mid a, x_\ell, \ell \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{\ell \in D_a} \phi(d_{a\ell}) \exp(\beta^\top x_\ell)\, d\ell} \approx \frac{\exp(\beta^\top x_b)}{\sum_{\ell \in S_a} \exp(\beta^\top x_\ell)},$$

which is the probability of conditional logistic regression when $m = 1$ and the location with $y = 1$ is $b$.
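In the $m = 1$ case this approximation is just a softmax over the sampled locations, which is one reason standard conditional logistic software handles step selection functions directly. A small illustrative sketch (names and toy values are mine):

```python
import math

def step_selection_prob(x_used, x_controls, beta):
    """m = 1 case: probability that the used location is chosen among itself
    and the sampled controls -- a softmax of the linear predictors."""
    scores = [sum(b * v for b, v in zip(beta, x)) for x in [x_used] + x_controls]
    mx = max(scores)                       # subtract the max for numerical stability
    w = [math.exp(s - mx) for s in scores]
    return w[0] / sum(w)

# One used location and 10 identical controls, two covariates (toy values).
p = step_selection_prob([1.0, 0.2], [[0.0, 0.1]] * 10, beta=[0.8, -0.3])
```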
Method
Data and assumptions
Now back to the general problem:
$K$ animals, $S^{(c)}$ strata observed for animal $c$, $m$ "cases" (locations with $y = 1$) and $n - m$ "controls" (locations with $y = 0$) in each stratum.
We want to make population-averaged inference about $\beta$ in the prospective model.
It is assumed that the data can be partitioned into uncorrelated clusters (data from different animals uncorrelated, or clusters of observations on a same animal taken several time units apart).
Method
Craiu et al (2008)
We showed that the likelihood score function of the retrospective model can be rewritten as

$$U(\beta) = \sum_{c=1}^{K} \sum_{j=1}^{S^{(c)}} \sum_{i=2}^{n} \left[ x_{ji}^{(c)*} y_{ji}^{(c)} - \frac{\sum_{\ell=1}^{\binom{n}{m}} v_{\ell i}\, x_{ji}^{(c)*} \exp\left(\sum_{h=2}^{n} \beta^\top x_{jh}^{(c)*} v_{\ell h}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{h=2}^{n} \beta^\top x_{jh}^{(c)*} v_{\ell h}\right)} \right] = \sum_{c=1}^{K} D^{(c)\top} \left(V_{\text{Indep}}^{(c)}\right)^{-1} \left\{ Y^{(c)} - \mu(\beta) \right\},$$

where $x_{ji}^{(c)*} = x_{ji}^{(c)} - x_{j1}^{(c)}$, $Y^{(c)}$ is the vector of all responses but without the $y_{j1}^{(c)}$'s, and $\mu(\beta) = E_{\text{Retro}}[Y^{(c)} \mid X^{(c)}]$.
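As a sanity check of the displayed score, each stratum's summand is the gradient of the log conditional likelihood with covariates differenced against location 1. The sketch below (my own illustrative code, not the authors') compares the analytic score for one stratum with a finite-difference gradient:

```python
from itertools import combinations
import math

def diffs(X):
    # Within-stratum differencing against location 1: x_i - x_1.
    return [[a - b for a, b in zip(x, X[0])] for x in X]

def loglik(Xs, y, beta, m):
    # Log conditional likelihood of one stratum, given sum(y) == m.
    n = len(Xs)
    lin = lambda v: sum(v[i] * sum(b * xi for b, xi in zip(beta, Xs[i]))
                        for i in range(n))
    log_den = math.log(sum(math.exp(lin([1 if i in c else 0 for i in range(n)]))
                           for c in combinations(range(n), m)))
    return lin(y) - log_den

def score(Xs, y, beta, m):
    # Analytic gradient of loglik: the stratum summand of U(beta).
    n, p = len(Xs), len(beta)
    combos = list(combinations(range(n), m))
    ws = [math.exp(sum(sum(b * xi for b, xi in zip(beta, Xs[h])) for h in c))
          for c in combos]
    tot = sum(ws)
    return [sum(Xs[i][k] for i in range(n) if y[i] == 1)
            - sum(w * sum(Xs[h][k] for h in c) for w, c in zip(ws, combos)) / tot
            for k in range(p)]

X = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [0.5, 2.0]]
Xs, y, beta, m = diffs(X), [0, 1, 0, 1], [0.3, -0.2], 2
u = score(Xs, y, beta, m)
eps = 1e-6
fd = [(loglik(Xs, y, [b + eps * (k == j) for k, b in enumerate(beta)], m)
       - loglik(Xs, y, beta, m)) / eps for j in range(len(beta))]
```

The two gradients agree to within finite-difference error, which is the content of the first equality above.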
Method
Advantages
With the robust (sandwich) estimate of $\mathrm{Var}(\hat\beta)$, inferences about $\beta$ are valid no matter what the correlation structure within clusters is ... as long as the data are uncorrelated between clusters.
$U(\beta)$ is the partial likelihood score for the Cox model for discrete data ⇒ PROC PHREG or coxph() can be used to apply the method.
Simulations have shown that inferences are good in finite samples:
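The robust variance referred to here is the usual cluster sandwich $\hat A^{-1} \hat M \hat A^{-1}$, with $\hat M$ built from per-cluster score totals. A generic numpy sketch (the inputs and names are assumptions of mine, not the talk's notation):

```python
import numpy as np

def cluster_sandwich(A, scores, cluster_ids):
    """Cluster-robust (sandwich) variance A^{-1} M A^{-1}: A is the p x p
    'bread' (minus the expected Hessian); M sums the outer products of the
    within-cluster totals of the per-observation score contributions."""
    A_inv = np.linalg.inv(A)
    M = np.zeros_like(A)
    for c in np.unique(cluster_ids):
        s_c = scores[cluster_ids == c].sum(axis=0)   # total score in cluster c
        M += np.outer(s_c, s_c)
    return A_inv @ M @ A_inv

# Toy example: 6 observations in 3 clusters, p = 2 parameters.
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 2))
V = cluster_sandwich(np.eye(2) * 4.0, scores, np.array([0, 0, 1, 1, 2, 2]))
```

Because $\hat M$ is a sum of outer products, the resulting variance matrix is symmetric and positive semidefinite by construction.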
Method
Simulation results, Craiu et al (2008, Table 1)
Method
Disadvantages
Inference on the parameters of the working correlation matrix is not possible ⇒ must use the independence working assumption.
Though better than AIC, the QIC(I) model selection criterion did not perform really well in simulations:
Method
Simulation results
Example of application
Application to female bison in Prince Albert
8 female bison with 14 clusters of 48 locations, and 1 female with 9 clusters, all followed between 2 Sept. 2005 and 2 Dec. 2005.
Each observed location was matched to 10 locations picked at random in a 300 m buffer (so $K = 8 \times 14 + 1 \times 9 = 121$, $S = 48$, $m = 1$, $n = 11$).
x: 6 dummy variables to quantify a seven-level habitat class categorical variable (deciduous stands = baseline level).
Example of application
Model fit
Method
Conditional inference
Sometimes, subject-specific inferences are required. Can we estimate $\beta$ and $\Sigma$ from the mixed-effects prospective model with the retrospective sampling design?

Already done in some special cases:
Family studies of genetic diseases (special case $S = 1$)
Mixed multinomial logit discrete choice model (special case $m = 1$)
Method
Likelihood for the general case
Craiu et al (2011) get the following likelihood in the general case:

$$L(\beta, \Sigma) = \prod_{c=1}^{K} \frac{\exp\left(\sum_{si} y_{si}^{(c)} \beta^\top x_{si}^{(c)}\right) \int \exp\left(\sum_{si} y_{si}^{(c)} b^\top z_{si}^{(c)}\right) d^{(c)}(\beta, b)\, dF(b; \Sigma)}{\int d^{(c)}(\beta, b) \prod_s \sum_{\ell \in L_s^{(c)}} \exp\left\{\sum_i v_{\ell si}^{(c)} \left(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)}\right)\right\}\, dF(b; \Sigma)},$$

where $d^{(c)}(\beta, b) = \prod_s \prod_i \{1 + \exp(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)})\}^{-1}$.
How do you maximize this thing?!?!?!?!!
Method
Maximization of the likelihood
Family studies (Pfeiffer et al 2001): evaluate the integrals by a Monte Carlo method, then maximize using a hybrid of Newton-type methods for $\beta$ and a grid search for the elements of $\Sigma$.
Mixed multinomial logit (Bhat 2001): quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
Craiu et al (2011), first attempt: quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.

With small $K$ and large $S$, these methods are painfully slow and unstable!
Method
Two-step algorithm, Craiu et al (2011)
Inspired by earlier work for GLMMs, we derived a two-step method that is numerically fast and stable and that yields estimators of $\beta$ and $\Sigma$ with good properties:

Step 1: separately for each cluster $c$, use traditional maximum likelihood for independent data (e.g., coxph()) to get $\hat\beta_c$ and an estimate $R_c$ of its variance $\mathrm{Var}(\hat\beta_c)$.
Step 2: since the clusters are large, the $\hat\beta_c$ are independent and $\hat\beta_c \approx N(\beta, R_c)$. Thus we can use linear mixed model theory and REML estimation to combine these estimates and obtain estimates of $\beta$ and $\Sigma$.
Method
Second step: REML with EM
Easy to implement and to program ... but difficult to explain due to extremely heavy notation! In a nutshell:

Stack the estimates $\hat\beta_1, \ldots, \hat\beta_K$ in a vector $U$ and their variance estimates in a block-diagonal matrix $R = \mathrm{diag}(R_1, \ldots, R_K)$.
Stack the vectors of random effects $b_1, \ldots, b_K$ in a vector $\phi$ and their variances in a block-diagonal matrix $\tilde\Sigma = \mathrm{diag}(\Sigma, \ldots, \Sigma)$.
Define $W_1 = 1_K \otimes I_p$ and $W_2 = I_{Kp}$.
Consider the linear mixed model

$$U = W_1 \beta + W_2 \phi + \varepsilon,$$

where $\varepsilon \sim N(0, R)$, $\phi \sim N(0, \tilde\Sigma)$, and $\phi \perp \varepsilon$.
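These design matrices are simple Kronecker constructions, and for known $R$ and $\Sigma$ the fixed effect solves a GLS problem. An illustrative numpy sketch with toy values (all names and numbers are mine):

```python
import numpy as np

K, p = 3, 2
W1 = np.kron(np.ones((K, 1)), np.eye(p))   # 1_K ⊗ I_p: stacks K copies of I_p
W2 = np.eye(K * p)                         # I_{Kp}: one random effect per entry

# Toy cluster-level estimates and (assumed known) variance components.
beta_hat = np.array([[1.1, -0.4], [0.9, -0.6], [1.3, -0.5]])
R = np.eye(p) * 0.10                       # common sampling variance R_c (toy)
Sigma = np.eye(p) * 0.05                   # random-effect variance (toy)

U = beta_hat.ravel()                       # stacked vector of estimates
V = np.kron(np.eye(K), R + Sigma)          # marginal Var(U) under the model

Vi = np.linalg.inv(V)
beta_gls = np.linalg.solve(W1.T @ Vi @ W1, W1.T @ Vi @ U)  # GLS estimate of beta
```

When the per-cluster variances are all equal, as in this toy example, the GLS solution reduces to the simple average of the cluster estimates.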
Method
Second step: REML with EM
$\beta$ and $\Sigma$ in this linear mixed model can be estimated by maximum likelihood (ML) or by restricted maximum likelihood (REML).
We first tried ML, but the variances were underestimated and $\hat\beta$ was biased.
We used the EM algorithm (both the E and M steps are in closed form for a few specifications of the structure of $\Sigma$) to implement REML ⇒ numerically quick and stable, and the estimators are quite good in terms of bias, even in terms of efficiency.
An R package (TwoStepClogit) implementing this method should be available on CRAN in the spring!
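A stripped-down version of this second step can be written as an EM iteration for the model $\hat\beta_c = \beta + b_c + \varepsilon_c$. The sketch below uses plain ML-EM rather than the REML variant described in the talk, with toy inputs; everything here is my own illustration, not the package's code:

```python
import numpy as np

def em_combine(beta_hats, R_list, n_iter=200):
    """ML-EM for beta_hat_c = beta + b_c + e_c, with b_c ~ N(0, Sigma) and
    e_c ~ N(0, R_c): alternates posterior moments of the b_c (E-step)
    with updates of beta and Sigma (M-step)."""
    K, p = len(beta_hats), beta_hats[0].shape[0]
    beta = np.mean(beta_hats, axis=0)
    Sigma = np.eye(p)
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each b_c.
        ms, Vs, Winv = [], [], []
        for bh, R in zip(beta_hats, R_list):
            W = np.linalg.inv(Sigma + R)
            ms.append(Sigma @ W @ (bh - beta))
            Vs.append(Sigma - Sigma @ W @ Sigma)
            Winv.append(W)
        # M-step: GLS update of beta, moment update of Sigma.
        beta = np.linalg.solve(sum(Winv),
                               sum(W @ bh for W, bh in zip(Winv, beta_hats)))
        Sigma = sum(np.outer(m_, m_) + V_ for m_, V_ in zip(ms, Vs)) / K
    return beta, Sigma

rng = np.random.default_rng(1)
bh = [np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=2) for _ in range(8)]
Rs = [np.eye(2) * 0.05] * 8
beta_est, Sigma_est = em_combine(bh, Rs)
```

Both EM steps are in closed form here, which is what makes this second step numerically quick and stable compared with maximizing the full likelihood directly.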
Method
Simulation results, Craiu et al (2011, Fig. 1)
Example of application
Application to female bison in Prince Albert
20 pairs of two female bison followed between 15 Nov. and 15 April, in 2005, 2006, and 2007.
Each pair of observed locations was matched to 10 locations picked at random in a 700 m buffer (so $K = 20$, $m = 2$, $n = 12$, and $S$ varied between 21 and 349).
x: dummy variables to quantify habitat class, as well as an above-ground vegetation biomass index (in kg/m²).
Example of application
Model fit
Future research
How should the “controls” be sampled?
Within-cluster correlation: how to estimate working correlations in GEE? How to include autocorrelation among observations belonging to a same cluster in the prospective (then retrospective) model?
Between-cluster correlation: how can we include between-animal (or between-pairs-of-animals) correlation in such models?
Model validation: relatively easy to do informally with K-fold cross-validation type approaches ... but how can a formal goodness-of-fit test be done?
References
Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35, 677–693.

Craiu, R. V., Duchesne, T., Fortin, D. (2008). Inference methods for the conditional logistic regression model with longitudinal data. Biometrical Journal, 50, 97–109.

Craiu, R. V., Duchesne, T., Fortin, D., Baillargeon, S. (2011). Conditional logistic regression with longitudinal follow-up and individual-level random coefficients: a stable and efficient two-step estimation method. Journal of Computational and Graphical Statistics, to appear.

Forester, J. D., Im, H. K., Rathouz, P. J. (2009). Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90, 3554–3565.

Pfeiffer, R. M., Gail, M. H., Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies. Biometrika, 88, 933–948.

Train, K. (2003). Discrete Choice Methods with Simulation. New York: Cambridge University Press.