Multilevel logistic regression with outcome uncertainty

Leo Bastos

Scientific Computing Program (PROCC)
Oswaldo Cruz Foundation (Fiocruz)

Rio de Janeiro, Brazil

Funding: CNPq and FAPERJ

April 8, 2014

Leo Bastos (PROCC-Fiocruz) Multilevel logistic with outcome uncertainty UFRJ 1 / 37

Outline

1 Motivation
   Hard-to-reach populations
   National crack users survey

2 Model
   Inference, priors and implementation

3 Applications
   Detention among crack users in Rio de Janeiro
   HIV among crack users in Rio de Janeiro

4 Work in progress

Motivation

Our group studies hard-to-reach populations, such as men who have sex with men (MSM), female sex workers (FSW), heavy drug users, and crack users.

How many are there?
   Obtain indirect information from the general population.

Who are they?
   Sample directly from these populations.

How many are there?

Sample from the general population.

Ask "How many Xs do you know?" for several subpopulations X.

The network degree of each respondent can be estimated.

The prevalence of hard-to-reach populations can be estimated.

Dealing with complex samples (?) (Si, Patel, and Gelman, 2015)
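The two estimation steps above can be sketched with the basic network scale-up estimator. This is an assumption: the slides do not say which estimator is used. The function names, the toy counts, and the subpopulation sizes below are all hypothetical.

```python
def estimate_degrees(known_counts, known_sizes, total_population):
    """Estimate each respondent's network degree.

    known_counts[i][k] is how many members of known subpopulation k
    respondent i reports knowing; known_sizes[k] is that subpopulation's size.
    """
    total_known = sum(known_sizes)
    return [total_population * sum(row) / total_known for row in known_counts]


def estimate_hidden_size(hidden_counts, degrees, total_population):
    """Basic scale-up estimate of the size of the hard-to-reach population."""
    return total_population * sum(hidden_counts) / sum(degrees)


# Hypothetical toy data: 3 respondents, 2 subpopulations of known size.
N = 1_000_000
known_sizes = [10_000, 40_000]
known_counts = [[2, 8], [1, 4], [3, 12]]
hidden_counts = [1, 0, 2]  # reported acquaintances in the hidden population

degrees = estimate_degrees(known_counts, known_sizes, N)
hidden_size = estimate_hidden_size(hidden_counts, degrees, N)
prevalence = hidden_size / N
```

The sketch ignores the survey design entirely, which is the issue raised in the last point of the slide.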

Who are they?

Usual sampling methods cannot be used.

Alternative sampling methods:

Respondent-driven sampling (RDS)
   Chain-referral sampling; network degree is reported.
   Established point estimators.
   There is no model-based approach for RDS.

Time-location sampling (TLS)
   Complex probabilistic sample.
   Requires previous knowledge about the population's behaviour.
   Hierarchical modelling can be done.

Motivation

In epidemiology, logistic regression is a standard tool.

Coefficients give directly interpretable measures of risk (odds ratios, after exponentiation).

It is usually applied to survey data.

However:

Data may lack accuracy
   Rapid disease tests;
   Self-reported answers.

Complex samples (cluster and/or stratified samples)

Natural "levels" in the study

National crack users survey

Crack cocaine is the freebase form of cocaine that can be smoked.

The Brazilian crack users survey is the largest survey of crack users (n = 7381 users in crack scenes).

SENAD/MJ (Ministry of Justice) and Fiocruz.

Complex sample for a hard-to-reach population
   41 geographical strata: capitals (26 + 1), metropolitan regions (9), rest of Brazil by region (5).
   Time-location sampling
      Time: days of the week (7) and shifts (3).
      Location: crack scenes previously mapped.
      Users: inverse probability scheme.

Data
   Socio-demographic data and risk behaviour for infectious diseases.
   Rapid tests for HIV, hepatitis C, and TB.
   Crime, access to the health system, drug use behaviour, etc.

The first results were published in a freely available book.

Multilevel logistic regression with outcome uncertainty

$Y_i \in \{0, 1\}$ is the result of a diagnostic test for patient $i$.

$Z_i \in \{0, 1\}$ is the true disease status (unknown).

$\gamma_s$ and $\gamma_e$ are the sensitivity and specificity of the diagnostic test.

Hence, the probability of a positive test result is

$$P(Y_i = 1) = \pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e),$$

where $\theta_i = P(Z_i = 1)$ is the probability that patient $i$ truly has the disease.

We observe $Y_i$ but want to make inference about $Z_i$, i.e. about $\theta_i$.

Usually there is some reliable information about the pair $(\gamma_s, \gamma_e)$.
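A minimal sketch of the relation between the observed and the true status: the first function is the formula for $\pi_i$ above, the second applies Bayes' rule to get $P(Z_i = 1 \mid Y_i = 1)$. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def prob_positive(theta, sens, spec):
    """P(Y = 1): true positives plus false positives."""
    return theta * sens + (1.0 - theta) * (1.0 - spec)


def prob_disease_given_positive(theta, sens, spec):
    """P(Z = 1 | Y = 1) by Bayes' rule."""
    return theta * sens / prob_positive(theta, sens, spec)


# Hypothetical values: 5% true prevalence, imperfect test.
theta, sens, spec = 0.05, 0.95, 0.90
pi = prob_positive(theta, sens, spec)
ppv = prob_disease_given_positive(theta, sens, spec)
```

With these values the probability of a positive result is nearly three times the true prevalence, and only about one third of the positives are true positives, which is why treating $Y_i$ as if it were $Z_i$ can be misleading.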

The model can be written as

$$Y_i \sim \text{Bernoulli}(\pi_i)$$
$$\pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e)$$
$$\operatorname{logit}(\theta_i) = \alpha_{j[i]} + x_i^T \beta_{j[i]}$$

using the multilevel notation of Gelman and Hill (2007), where $j[i]$ is the group of unit $i$.

Outcome uncertainty: Magder and Hughes (1997), McInturff et al. (2004).

Depending on the group-level structure, the distribution of the group-level effects can represent
   Independence between groups (iid)
   Temporal dependence (RW, ARMA, DLM)
   Spatial dependence (CAR)
   You name it...
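The generative structure of the model can be sketched by simulating from it. This is a simplified illustration under stated assumptions: iid normal group intercepts, one binary covariate, and a slope common to all groups rather than $\beta_{j[i]}$. All names and parameter values are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_groups, n_per_group, mu_alpha, sigma_alpha, beta,
             sens, spec, seed=1):
    """Simulate (group, x, z, y) tuples with iid group intercepts."""
    rng = random.Random(seed)
    alpha = [rng.gauss(mu_alpha, sigma_alpha) for _ in range(n_groups)]
    rows = []
    for j in range(n_groups):
        for _ in range(n_per_group):
            x = rng.choice([0, 1])                  # one binary covariate
            theta = expit(alpha[j] + beta * x)      # P(Z = 1)
            z = 1 if rng.random() < theta else 0    # true status (latent)
            p_pos = sens if z == 1 else 1.0 - spec  # P(Y = 1 | Z = z)
            y = 1 if rng.random() < p_pos else 0    # observed test result
            rows.append((j, x, z, y))
    return rows


rows = simulate(n_groups=20, n_per_group=50, mu_alpha=-2.0,
                sigma_alpha=0.5, beta=0.8, sens=0.95, spec=0.90)
```

Averaging over the latent `z` recovers the marginal probability $\pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e)$ for the observed `y`. In real data only `y` is available.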

Inference, priors and implementation

Frequentist solution
   For the classical model with sensitivity and specificity both known, van den Hout et al. (2007) propose a simple implementation based on GLMs.
   Analogously, a multilevel model can be adapted (the iid case works with lme4).

Bayesian approach
   The key feature of the Bayesian approach is the priors.
   Gelman et al. (2008) proposed weakly informative priors for logistic models.
   Fong et al. (2010) described how to elicit weakly informative priors for random effects.
   The problem lies in the sensitivity and specificity parameters (Fox et al., 2005; Chu et al., 2006).
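One way to see why a GLM-type implementation is possible in the frequentist case, when $\gamma_s$ and $\gamma_e$ are known: substituting the logistic model for $\theta_i$ into the expression for $\pi_i$ gives a Bernoulli model with a modified link. This is a sketch derived from the model equations in these slides, not necessarily the exact formulation of van den Hout et al. (2007).

```latex
\begin{align*}
\pi_i &= \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e)
       = (1 - \gamma_e) + (\gamma_s + \gamma_e - 1)\,\theta_i, \\
\theta_i &= \operatorname{logit}^{-1}\!\bigl(\alpha_{j[i]} + x_i^T \beta_{j[i]}\bigr), \\
\ell(\alpha, \beta) &= \sum_i \bigl[\, y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \,\bigr].
\end{align*}
```

When $\gamma_s + \gamma_e > 1$, $\pi_i$ is an increasing function of $\theta_i$ that is confined to the interval $[1 - \gamma_e, \gamma_s]$.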

Inference is very sensitive to the sensitivity and specificity parameters.

Either fix their values or set (very) informative priors

$$\gamma_s \sim \text{Beta}(a_{\gamma_s}, b_{\gamma_s}), \qquad \gamma_e \sim \text{Beta}(a_{\gamma_e}, b_{\gamma_e})$$

On the bright side, we do have information about $\gamma_s$ and $\gamma_e$.

Elicitation:
   Fix a prior point estimate, $m^* = E[\gamma] = a/(a + b)$;
   Fix a prior sample size, $n^* = a + b$;
   Solve for $(a, b)$ in terms of $(m^*, n^*)$: $a = m^* n^*$, $b = (1 - m^*) n^*$.
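The elicitation steps above can be sketched as follows. The function names are hypothetical, and the example values (a point estimate of 0.99 worth about 1000 prior observations) are illustrative and not taken from the talk.

```python
def beta_from_mean_and_size(m_star, n_star):
    """Return (a, b) of a Beta prior with mean m_star and prior sample size n_star."""
    if not 0.0 < m_star < 1.0 or n_star <= 0.0:
        raise ValueError("need 0 < m_star < 1 and n_star > 0")
    return m_star * n_star, (1.0 - m_star) * n_star


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var ** 0.5


# Hypothetical elicitation for a specificity parameter.
a, b = beta_from_mean_and_size(0.99, 1000.0)
mean, sd = beta_mean_sd(a, b)
```

Checking the implied standard deviation is a quick way to judge whether the chosen prior sample size makes the prior as informative as intended.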

Inference

All inference is based on the posterior distribution

$$p(\alpha, \beta, \gamma_s, \gamma_e, \psi \mid Y, x) \tag{1}$$

Inference methods

Sampling via MCMC (Gelfand & Smith, 1990)
   Numerical approximation of the joint posterior;
   Computationally intensive;
   BUGS and Stan may help.

Approximate posterior marginals via INLA (Rue et al., 2009)
   INLA is fast and accurate;
   Feasible for large data sets.

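INLA in a nutshell

Integrated nested Laplace approximation (INLA)

Works in the class of latent Gaussian models, $[y \mid \theta, \phi]$. On this slide $\pi(\cdot)$ denotes a density, $\theta$ the latent Gaussian field and $\phi$ the hyperparameters, not the $\pi_i$ and $\theta_i$ of the model slides.

Let the posterior distribution be $\pi(\theta, \phi \mid y)$.

Posterior marginal distributions

$$\pi(\theta_i \mid y) = \int \pi(\theta_i \mid \phi, y)\, \pi(\phi \mid y)\, d\phi$$
$$\pi(\phi_k \mid y) = \int \pi(\phi \mid y)\, d\phi_{-k}$$

Approximated by

$$\pi(\theta_i \mid y) \approx \int \tilde{\pi}(\theta_i \mid \phi, y)\, \tilde{\pi}(\phi \mid y)\, d\phi$$
$$\pi(\phi_k \mid y) \approx \int \tilde{\pi}(\phi \mid y)\, d\phi_{-k}$$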

INLA details

The sensitivity-specificity model has been implemented (with Håvard Rue's support).

Bernoulli problem in INLA (Ferkingstad & Rue, March 2015).

WAIC has been implemented, but is not used here.

Benchmark against MCMC to be done via simulation studies.

Detention among crack users in Rio de Janeiro

Find risk factors for detention (in the past 12 months) among crack users in Rio de Janeiro.

"Have you been detained (stayed less than one day in a police station) in the past year? (Yes, No)"

Explanatory variables
   Polydrug user: {yes, no}
   Rehab: {yes, no}
   Illicit money to obtain drugs: {yes, no}
   Homeless: {yes, no}
   Years of study: {< 8, ≥ 8}
   Race: {White, Black, "Pardo", others}
   Gender: {male, female}
   Age: {< 31, 31+}

The design effect was included in the model through random effects for strata (capital or metropolitan region, MR) and for the TLS level (crack scene and day/shift).

Multilevel model assumptions
   The time and location clusters were chosen by simple random sampling.
   The sampling design is ignorable at the user level.

Multilevel logistic regression

The complete model is

$$\text{Det}_i \sim \text{Bernoulli}(\theta_i)$$
$$\operatorname{logit}(\theta_i) = \alpha_{\text{geo}[i]} + \alpha^*_{\text{tls}[i]} + x_i^T \beta$$

Priors (weakly informative)
   Coefficients: Cauchy(0, 2.5) (Gelman et al., 2008)
   Random effects: Cauchy(0, 10) (Fong et al., 2010)

930 crack users were interviewed in Rio de Janeiro: 544 in the capital and 386 in the metropolitan region (MR).

17 users were excluded due to missing data (15 in the capital and 2 in the MR), leaving 913 for analysis.

Multilevel logistic regression model (results figure not recoverable from the extracted text)

DIC - Deviance information criterion

Model            DIC
Crude            1139.827
Stratum          1136.454
Stratum + TLS    1092.895

HIV among crack users in Rio de Janeiro

Find risk factors for HIV among crack users in Rio de Janeiro.

Explanatory variables
   Gender: {male, female}
   Age: {< 31, 31+}
   Received money or drugs in exchange for sex (last 30 days): {yes, no}

930 crack users were interviewed in Rio de Janeiro: 544 in the capital and 386 in the MR.

345 users either refused the test or had inconclusive results and were excluded.

Status   Capital   MR
HIV-     377       178
HIV+      22         8
Total    399       186

The design effect was included in the model through random effects for strata (capital or MR) and for the TLS level (crack scene and day/shift).

The effect of "received money or drugs in exchange for sex (last 30 days)" may vary between the capital and the MR, so a random coefficient was considered.

Multilevel logistic regression with outcome uncertainty

The complete model is

$$\text{HIV}_i \sim \text{Bernoulli}(\pi_i)$$
$$\pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e)$$
$$\operatorname{logit}(\theta_i) = \alpha_{\text{geo}[i]} + \alpha^*_{\text{tls}[i]} + \eta_i$$
$$\eta_i = \beta_1 \text{GenderFem}_i + \beta_2 \text{Age31p}_i + \beta_{\text{geo}[i]} \text{Money4SexYes}_i$$

$\gamma_s = 0.9999$ and $\gamma_e = 0.989$, from the rapid HIV test instructions.

Priors (weakly informative)
   Coefficients: Cauchy(0, 2.5)
   Random effects: Cauchy(0, 10)
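To get a feel for how much the quoted sensitivity and specificity matter here, the expression for $\pi_i$ can be inverted for a single covariate-free $\theta$ in each area, using the HIV+ counts and totals from the table of test results. This crude moment correction (often called the Rogan-Gladen estimator) is only an illustration and is not the multilevel model fitted in the talk. The function and variable names are hypothetical.

```python
def corrected_prevalence(n_positive, n_tested, sens, spec):
    """Invert pi = theta*sens + (1 - theta)*(1 - spec) for theta.

    Returns (apparent prevalence, corrected prevalence clipped to [0, 1]).
    """
    apparent = n_positive / n_tested
    theta = (apparent + spec - 1.0) / (sens + spec - 1.0)
    return apparent, min(1.0, max(0.0, theta))


SENS, SPEC = 0.9999, 0.989  # values quoted from the rapid test instructions
counts = {"Capital": (22, 399), "MR": (8, 186)}  # (HIV+, total tested)

results = {area: corrected_prevalence(pos, n, SENS, SPEC)
           for area, (pos, n) in counts.items()}
```

Under this simple correction, roughly one percentage point of the apparent prevalence in each area is attributable to false positives.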

Logistic regression model (results figure not recoverable from the extracted text)

Multilevel logistic regression model (results figure not recoverable from the extracted text)

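DIC - Deviance information criterion

Model          LR        MLR
Crude          166.97    168.27
Design         157.03    148.77
SS + Design    156.38    150.40

LR: logistic regression; MLR: multilevel logistic regression; SS: sensitivity-specificity (outcome uncertainty) model.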

Work in progress

Multilevel modelling allows the study design to be included in the model.

Outcome uncertainty can also be dealt with.

Already implemented in the INLA testing version:

> inla(model.equation, family="testbinomial1",...)

Benchmark against standard MCMC implemented in Stan (Hoffman & Gelman, 2012).

Estimates for $\gamma_s$ and $\gamma_e$ depend heavily on the prior choice, thus demanding a more detailed prior sensitivity assessment.

Investigate performance under various scenarios with a comprehensive simulation study.

Methodological paper in progress...
