An Introduction to Latent Variable Modeling Karen Bandeen-Roche Qian-Li Xue Johns Hopkins Departments of Biostatistics and Medicine October 27, 2016

Page 1

An Introduction to Latent Variable Modeling

Karen Bandeen-Roche Qian-Li Xue

Johns Hopkins Departments of Biostatistics and Medicine

October 27, 2016

Page 2

LATENT VARIABLES:

TRUTH, LIES, AND EVERYTHING BETWEEN

Karen Bandeen-Roche Department of Biostatistics Johns Hopkins University

ABACUS Seminar Series November 28, 2007

Page 3

Objectives

•  What is a latent variable (LV)?
•  What are some common LV models?
•  What are major features of LV modeling?
   –  Hierarchical: structural and measurement components
   –  Fitting
   –  Evaluating fit
   –  Predictions
   –  Identifiability
•  Why should I consider using, or decide against using, LV models?

Page 4

Part I: Overview

Page 5

“LATENT”?

1.  Present or potential but not evident or active: latent talent.
2.  Pathology. In dormant or hidden stage: a latent infection.
3.  Biology. Undeveloped, but capable of normal growth under the proper conditions: a latent bud.
4.  Psychology. Present and accessible in the unconscious mind, but not consciously expressed.

The American Heritage Dictionary of the English Language, Fourth Edition, 2000

“existing in hidden or dormant form but usually capable of being brought to light”
Merriam-Webster’s Dictionary of Law, 1996

Page 6

“LATENT”

•  “…concepts in their purest form… unobserved or unmeasured … hypothetical”

Bollen KA, Structural Equations with Latent Variables, p. 11, 1989

•  “…in principle or practice, cannot be observed”

Bartholomew DJ, The Statistical Approach to Social Measurement, p. 12

•  “Underlying: not directly measurable. Existing in hidden form but usually capable of being measured indirectly by observables.”

Bandeen-Roche K, Synthesis, 2006

Page 7

“LATENT VARIABLES”?

•  Ordinary linear regression model: $Y_i = X_i^T \beta + \varepsilon_i$
   –  $Y_i$ = outcome (measured)
   –  $X_i$ = covariate vector (measured)
   –  $\varepsilon_i$ = residual (unobserved)

Page 8

Ordinary Linear Regression
Residual as Latent Variable

[Figure: scatterplot of Y against X, and a path diagram with nodes X, Y, and ε]

Boxes denote observables. Ovals denote “unobserved”. Straight arrows are causal. Curved arrows denote association.

$Y_i = X_i^T \beta + \varepsilon_i$

Page 9

Mixed effect / Multi-level models
Random effects as Latent Variables

$Y_{ij} = \beta_0 + \beta_1 x_i + \beta_2 t_{ij} + \beta_3 x_i t_{ij} + e_{ij}$

[Figure: Y plotted against time for the vital and non-vital groups; intercepts labeled β0 and β0 + β1, slopes labeled β2 and β2 + β3]

Page 10

Mixed effect / Multi-level models
Random effects as Latent Variables

•  $b_{0i}$ = random intercept; $b_{2i}$ = random slope (could define more)
•  Population heterogeneity captured by spread in intercepts, slopes

$Y_{ij} = \beta_0 + b_{0i} + \beta_1 x_i + \beta_2 t_{ij} + b_{2i} t_{ij} + \beta_3 x_i t_{ij} + e_{ij}$

[Figure: Y plotted against time for the vital and non-vital groups; intercepts labeled β0 and β0 + β1, slopes labeled β2 and β2 + β3; an individual line is annotated with intercept shift + b0i and slope shift − |b2i|]
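To make the role of the random effects concrete, here is a minimal simulation sketch of the random intercept and slope model above. All numeric values, the 0/1 coding of $x_i$, the time grid, and the function name `simulate_growth` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_growth(n=500, times=(0.0, 1.0, 2.0, 3.0), seed=0):
    """Simulate Y_ij = b0 + b0i + b1*x_i + b2*t_ij + b2i*t_ij + b3*x_i*t_ij + e_ij."""
    rng = np.random.default_rng(seed)
    beta0, beta1, beta2, beta3 = 10.0, 2.0, -0.5, -0.3  # assumed fixed effects
    sd_b0, sd_b2, sd_e = 1.0, 0.2, 0.5                  # assumed spreads
    t = np.asarray(times)
    x = rng.integers(0, 2, size=n)            # assumed 0/1 group indicator
    b0 = rng.normal(0.0, sd_b0, size=n)       # latent random intercepts
    b2 = rng.normal(0.0, sd_b2, size=n)       # latent random slopes
    e = rng.normal(0.0, sd_e, size=(n, t.size))
    intercept = beta0 + b0 + beta1 * x        # person-specific intercept
    slope = beta2 + b2 + beta3 * x            # person-specific slope
    y = intercept[:, None] + slope[:, None] * t[None, :] + e
    return y, x, b0, b2

y, x, b0, b2 = simulate_growth()
```

Only `y`, `x`, and the time grid would be observed in practice; `b0` and `b2` are returned here solely so the latent heterogeneity can be inspected.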

Page 11

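Mixed effect / Multi-level models
Random effects as Latent Variables

$Y_{ij} = \beta_0 + b_{0i} + \beta_1 x_i + \beta_2 t_{ij} + b_{2i} t_{ij} + \beta_3 x_i t_{ij} + e_{ij}$

[Figure: Y plotted against time for the vital and non-vital groups; intercepts labeled β0 and β0 + β1, slopes labeled β2 and β2 + β3; an individual line is annotated with intercept shift + b0i and slope shift − |b2i|. Alongside, a path diagram with nodes Y, X, ε, t, and b]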

Page 12

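Latent variable model

[Figure: path diagram. Latent variable ξ (Inflammation) with indicators X1, …, Xp and errors δ1, …, δp; latent variable η (Mobility) with indicators Y1, …, YM and errors ε1, …, εM; residual ζ1]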

Page 13

“LATENT VARIABLES”?
Linear structural equations model with latent variables (LISREL):

$Y_{ij} = (\lambda^y_j)^T \eta_i + \varepsilon_{ij}$
$X_{ij} = (\lambda^x_j)^T \xi_i + \delta_{ij}$
$\eta_i = B \eta_i + \Gamma \xi_i + \zeta_i$

•  $Y_{ij}$ = outcome (jth measurement per “person” i)
•  $X_{ij}$ = covariate vector (corresponds to jth measurement, person i)
•  $\lambda^y_j$ = outcome “loading” (relates outcome LV to Y measurement)
•  $\eta_i$ = latent outcome = random coefficient vector, person i
•  $\lambda^x_j$ = covariate “loading” (relates covariate LV to jth x measurement)
•  $\xi_i$ = latent covariate = random coefficient vector, person i
•  $\varepsilon_{ij}$ = observed response residual
•  $\delta_{ij}$ = observed covariate residual
•  $\zeta_i$ = latent response residual vector (specified distribution)

> My sense: It’s the unknown $\lambda_j$ that distinguishes the above as a “latent variable model” in most minds

Page 14

Latent Variables: What?
Integrands in a hierarchical model

•  Observed variables (i=1,…,n): $Y_i$ = M-variate; $x_i$ = P-variate
•  Focus: response (Y) distribution = $G_{Y|x}(y \mid x)$; x-dependence
•  Model:
   –  $Y_i$ generated from latent (underlying) $U_i$: $F_{Y|U,x}(y \mid U = u, x; \pi)$ (Measurement)
   –  Focus on distribution, regression re $U_i$: $F_{U|x}(u \mid x; \beta)$ (Structural)
•  Overall, hierarchical model:

$F_{Y|x}(y \mid x) = \int F_{Y|U,x}(y \mid U = u, x) \, dF_{U|x}(u \mid x)$
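As a worked special case, suppose both layers are linear and normal. This is an assumption chosen for illustration: the mean function $\mu(x; \beta)$ and the matrices $\Lambda$, $\Phi$, $\Theta$ are notation introduced here. The integral then has a closed form:

```latex
U \mid x \sim N\bigl(\mu(x;\beta),\ \Phi\bigr), \qquad
Y \mid U = u,\, x \sim N\bigl(\Lambda u,\ \Theta\bigr)
\quad\Longrightarrow\quad
Y \mid x \sim N\bigl(\Lambda\,\mu(x;\beta),\ \Lambda \Phi \Lambda^{T} + \Theta\bigr)
```

Outside the linear-normal case the integral over u generally has no closed form.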

Page 15

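Latent variable model

[Figure: path diagram. Latent variable ξ (Inflammation) with indicators X1, …, Xp and errors δ1, …, δp, labeled Measurement; latent variable η (Mobility) with indicators Y1, …, YM and errors ε1, …, εM, labeled Measurement; the part relating ξ and η, with residual ζ1, labeled Structural]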

Page 16

Well-used latent variable models

Latent variable scale | Observed variable scale: Continuous | Observed variable scale: Discrete
Continuous            | Factor analysis; LISREL             | Discrete FA; IRT (item response)
Discrete              | Latent profile; Growth mixture      | Latent class analysis, regression

General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS), gllamm (Stata)

Page 17

Why do people use latent variable models?

•  The complexity of my problem demands it
•  NIH wants me to be sophisticated
•  Reveal underlying truth (e.g. “discover” latent types)
•  Operationalize and test theory
•  Sensitivity analyses
•  Acknowledge, study issues with measurement; correct attenuation; etc.

Page 18

Latent Variable Models: Philosophy

•  Why?
   –  To operationalize / test theory
   –  To learn about measurement errors, differential reporting
   –  They summarize multiple measures parsimoniously
   –  To describe population heterogeneity
   –  Popperian learning
•  Why not?
   –  Their modeling assumptions may determine scientific conclusions
   –  Their interpretation may be ambiguous
      •  Nature of latent variables?
      •  Uniqueness (identifiability)
      •  What if very different models fit comparably? (estimability)
      •  Seeing is believing
•  Import: They are widely used

Page 19

Part II: Major elements of latent variable modeling

Page 20

1. Model choice

Page 21

Example
Pro-inflammation in Older Adults

•  Inflammation: central in cellular repair
•  Hypothesis: dysregulation = key in accelerated aging
   –  Muscle wasting (Ferrucci et al., JAGS 50:1947-54; Cappola et al., J Clin Endocrinol Metab 88:2019-25)
   –  Receptor inhibition: erythropoietin production / anemia (Ershler, JAGS 51:S18-21)

[Figure: pathway diagram. Stimulus (e.g. muscle damage); IL-1#, TNF-α, IL-6, CRP; paths labeled inhibition and up-regulation]

# Difficult to measure. IL-1RA = proxy

Page 22

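Example
Pro-inflammation in Older Adults

[Figure: path diagram with nodes Determinants, Inflam. regulation (latent, with residual ζ), and Adverse outcomes, labeled Structural; indicators Y1, …, Yp of Inflam. regulation with loadings λ1, …, λp and errors e1, …, ep, labeled Measurement]

Theory informs relations (arrows)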

Page 23

Pro-inflammation in Older Adults
InCHIANTI data (Ferrucci et al., JAGS, 48:1618-25)

•  LV method: factor analysis model
   –  Continuous indicators, latent variables
   –  Two distinct underlying variables
   –  Down-regulation IL-1RA path=0
   –  (Conditional independence)

[Figure: path diagram with two latent variables, Inflammation 1 (Up-reg.) and Inflammation 2 (Down-reg.), and indicators IL-6, TNFα, CRP, IL-1RA, IL-18]

Page 24

“LATENT VARIABLES”?
Linear structural equations model with latent variables (LISREL):

$Y_{ij} = (\lambda^y_j)^T \eta_i + \varepsilon_{ij}$
$X_{ij} = (\lambda^x_j)^T \xi_i + \delta_{ij}$
$\eta_i = B \eta_i + \Gamma \xi_i + \zeta_i$

•  $Y_{ij}$ = outcome (jth measurement per “person” i)
•  $X_{ij}$ = covariate vector (corresponds to jth measurement, person i)
•  $\lambda^y_j$ = outcome “loading” (relates outcome LV to Y measurement)
•  $\eta_i$ = latent outcome = random coefficient vector, person i
•  $\lambda^x_j$ = covariate “loading” (relates covariate LV to jth x measurement)
•  $\xi_i$ = latent covariate = random coefficient vector, person i
•  $\varepsilon_{ij}$ = observed response residual
•  $\delta_{ij}$ = observed covariate residual
•  $\zeta_i$ = latent response residual vector (specified distribution)

Page 25

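Latent variable models
Factor Analysis Measurement Model

$X = \Lambda_x \xi + \delta$

$\Phi = \mathrm{Var}(\xi)$; $\Theta_\delta = \mathrm{Var}(\delta)$

$x_{i1} = \lambda^x_{11} \xi_{i1} + \dots + \lambda^x_{1p} \xi_{ip} + \delta_{i1}$
$x_{i2} = \lambda^x_{21} \xi_{i1} + \dots + \lambda^x_{2p} \xi_{ip} + \delta_{i2}$
…
$x_{iM} = \lambda^x_{M1} \xi_{i1} + \dots + \lambda^x_{Mp} \xi_{ip} + \delta_{iM}$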

Page 26

Latent variable models
Factor Analysis Measurement Model

$X = \Lambda_x \xi + \delta$; $\Phi = \mathrm{Var}(\xi)$; $\Theta_\delta = \mathrm{Var}(\delta)$

•  Assumptions
   –  Most frequently: (ξ, δ) ~ multivariate normal
   –  ξ ⊥ δ (ξ independent of δ)
   –  Constraints on Φ, $\Theta_\delta$ (“theory”)
      •  Ex: $\Theta_\delta$ diagonal: indicators uncorrelated given LVs, i.e. factor model; conditional independence
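The assumptions above imply $\mathrm{Var}(X) = \Lambda_x \Phi \Lambda_x^T + \Theta_\delta$. A minimal simulation sketch can check this numerically. The loading, factor covariance, and residual variance values below are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def implied_cov(Lam, Phi, Theta):
    """Model-implied covariance of X under X = Lam @ xi + delta, xi independent of delta."""
    return Lam @ Phi @ Lam.T + Theta

def simulate_factor_data(n, Lam, Phi, Theta, seed=0):
    """Draw n observations with (xi, delta) multivariate normal and mean zero."""
    rng = np.random.default_rng(seed)
    xi = rng.multivariate_normal(np.zeros(Phi.shape[0]), Phi, size=n)
    delta = rng.multivariate_normal(np.zeros(Theta.shape[0]), Theta, size=n)
    return xi @ Lam.T + delta

# Illustrative (assumed) parameter values: 5 indicators, 2 factors
Lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.6, 0.3],
                [0.0, 0.9],
                [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])
Theta = np.diag([0.4, 0.5, 0.5, 0.3, 0.7])  # diagonal: conditional independence

Sigma = implied_cov(Lam, Phi, Theta)
X = simulate_factor_data(50_000, Lam, Phi, Theta)
S = np.cov(X, rowvar=False)                 # sample covariance matrix
max_abs_diff = float(np.max(np.abs(S - Sigma)))
```

With this many draws, `max_abs_diff` should be small, reflecting sampling error only.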

Page 27

2. Fitting

Page 28

Estimation
Overview

•  Most common: Likelihood-based approaches
   –  Primary challenge: the integral
      $F_{Y|X}(y \mid x) = \int F_{Y|u,x}(y \mid u, x) \, dF_{u|x}(u \mid x)$
      •  Approximation (Laplace)
      •  Numerical integration
      •  Stochastic integration
   –  Gradient methods
   –  E-M algorithm
•  Bayesian approaches (MCMC)
•  Least squares or analogs
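As a hedged sketch of the numerical integration option, the block below approximates a marginal density by Gauss-Hermite quadrature over a standard normal latent variable. The toy model (Y | U = u ~ N(u, 1), U ~ N(0, 1)) is an assumption picked because its marginal, N(0, 2), is known in closed form and can serve as a check. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def marginal_density(y, cond_density, n_nodes=40):
    """Approximate f(y) = integral of f(y | u) over U ~ N(0, 1)."""
    # hermgauss gives nodes t_k and weights w_k for integrals against exp(-t^2);
    # substituting u = sqrt(2) * t turns that weight into the N(0, 1) density.
    nodes, weights = hermgauss(n_nodes)
    u = np.sqrt(2.0) * nodes
    return float(np.sum(weights * cond_density(y, u)) / np.sqrt(np.pi))

# Toy check: Y | U = u ~ N(u, 1) and U ~ N(0, 1) imply Y ~ N(0, 2)
approx = marginal_density(0.7, lambda y, u: normal_pdf(y, u, 1.0))
exact = normal_pdf(0.7, 0.0, 2.0)
```

For non-normal measurement models (for example binary items) only `cond_density` would change; the number of nodes needed depends on how peaked the integrand is.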

Page 29

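ML Estimation
Factor model

•  Likelihood has closed form (MVN)

$f(x \mid \theta) = \prod_{i=1}^{n} (2\pi)^{-M/2} |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} x_i^T \Sigma^{-1} x_i\right)$

$X = \Lambda_x \xi + \delta$

$\mathrm{Var}(X) = \Lambda_x \Phi \Lambda_x^T + \Theta_\delta = \Sigma$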
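A minimal sketch of the closed-form log-likelihood, assuming mean-zero multivariate normal data with one row per person. The function name `factor_loglik` is hypothetical; in practice this would be handed to a numerical optimizer subject to identifying constraints.

```python
import numpy as np

def factor_loglik(X, Lam, Phi, Theta):
    """Gaussian log-likelihood with Sigma = Lam Phi Lam' + Theta, mean zero."""
    n, M = X.shape
    Sigma = Lam @ Phi @ Lam.T + Theta
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("implied covariance is not positive definite")
    # Sum over persons of x_i' Sigma^{-1} x_i, computed without an explicit inverse
    quad = float(np.sum(X * np.linalg.solve(Sigma, X.T).T))
    return -0.5 * (n * M * np.log(2.0 * np.pi) + n * logdet + quad)
```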

Page 30

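Pro-inflammation in Older Adults
(Bandeen-Roche et al., Rejuv Res)

•  LV method: factor analysis model

[Figure: fitted path diagram with two latent variables, Inflammation 1 (Up-reg.) and Inflammation 2 (Down-reg.), and indicators IL-6, TNFα, CRP, IL-1RA, IL-18; estimates shown on the paths: .36, .59, .45, .31, .31, -.59, -.40, .20]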

Page 31

3. Evaluating fit

Page 32

Methods
Global measures

•  Goodness of fit testing
   –  Hypothesis: H0: $G_{Y|X}(y \mid x) = F_{Y|X}(y \mid x; \pi, \beta)$ for some $(\pi, \beta) \in \Theta$
   –  Method: Deviance goodness of fit testing, analogs
   –  Usual issues for quality of asymptotic distribution approximation
   –  Inflammatory analysis: Deviance goodness of fit p-value > 0.5
•  Global fit indices: “Hundreds” of them

Page 33

Methods
Residual checking

•  Per-item: Observed – expected
•  Residuals with respect to association structure
   –  Continuous Y: Covariance or correlation matrix residuals S – Σ
   –  Categorical Y:
      •  Odds ratio matrix residuals: Q – ψ (implied); Q has elements $[ad/bc]_{ij}$ from cross-tabulation of items i & j
      •  O-E cell counts for the full cross-tabulation of items ($I_1 \times I_2 \times \dots \times I_M$ cells, where $I_j$ denotes the number of categories for item j)
•  All cases: normalized residuals most useful
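The two association residuals can be sketched as follows. This is a minimal illustration with hypothetical function names; `Sigma_hat` stands for whatever model-implied covariance the fitted model yields, and the odds ratio helper assumes binary items coded 0/1 with no empty cells in the 2x2 table.

```python
import numpy as np

def covariance_residuals(X, Sigma_hat):
    """Sample covariance S minus the model-implied covariance."""
    S = np.cov(X, rowvar=False)
    return S - Sigma_hat

def pairwise_odds_ratio(item_i, item_j):
    """Odds ratio ad/bc from the 2x2 cross-tabulation of two 0/1 items."""
    item_i = np.asarray(item_i)
    item_j = np.asarray(item_j)
    a = np.sum((item_i == 1) & (item_j == 1))
    b = np.sum((item_i == 1) & (item_j == 0))
    c = np.sum((item_i == 0) & (item_j == 1))
    d = np.sum((item_i == 0) & (item_j == 0))
    # Assumes b * c > 0; sparse tables need a continuity correction
    return float(a * d) / float(b * c)
```

The observed odds ratios would then be compared with those implied by the fitted model, pair by pair.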

Page 34

Example
Residual checking

•  NFκB-gated systemic inflammation

Page 35

Methods Other

•  Posterior predictive checking Gelman et al., Statistica Sinica, 1996

•  Pseudo-value analysis: More to come

Page 36

4. Prediction

Page 37

Latent variable scoring
Overview

•  Task: Estimate persons’ underlying status
   –  “fill in” values for the $U_i$
•  Fundamental tool: Posterior distribution

Page 38

Latent variable scoring
Posterior mean estimation

•  Posterior mean = most common method
   –  Typically: Empirical Bayes (filling in estimates for parameters)
   –  Minimizes expected posterior quadratic loss
•  Linear case (LISREL): Yields Best Linear Unbiased Predictor (BLUP)

Page 39

Latent variable scoring
LISREL (factor) measurement model

•  Posterior mean is closed form, linear:
   $\hat{\xi} = \hat{\Phi} \hat{\Lambda}_X^T \hat{\Sigma}^{-1} x$, $\Sigma = \mathrm{Var}(X)$
•  “Regression method”
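A minimal sketch of regression-method scoring, $\hat{\xi} = \Phi \Lambda_X^T \Sigma^{-1} x$, applied to every row of a data matrix. It assumes mean-zero indicators and takes the parameter matrices as given (in practice, estimates would be plugged in); the function name is hypothetical.

```python
import numpy as np

def regression_scores(X, Lam, Phi, Theta):
    """Posterior-mean (regression method) factor scores, one row per person."""
    Sigma = Lam @ Phi @ Lam.T + Theta        # Var(X) implied by the model
    W = Phi @ Lam.T @ np.linalg.inv(Sigma)   # p x M weight matrix
    return X @ W.T
```

For a single indicator with loading 1, factor variance 1, and residual variance 1, the weight is 0.5, so an observed value of 2 is scored as 1: the score is shrunk toward the prior mean of zero.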

Page 40

Latent variable scoring
LISREL (factor) measurement model

•  Alternative method: “Bartlett” scores
   –  Paradigm: treat $\xi_i$ as fixed parameters per i; estimate these via weighted least squares:
      $X_i = \hat{\Lambda}_X \xi_i + \delta_i$, $\delta_i \sim N(0, \hat{\Sigma})$
•  Which is better?
   –  Depends on analytic purpose
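A hedged sketch of Bartlett scoring as weighted least squares. The helper takes the weighting covariance as an argument; using the residual covariance Θ is the common textbook form, and weighting by Σ = Var(X) gives the same weights algebraically, which the example checks numerically. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def bartlett_weights(Lam, W_cov):
    """WLS weights (Lam' W^-1 Lam)^-1 Lam' W^-1 for weighting covariance W_cov."""
    Wi = np.linalg.inv(W_cov)
    return np.linalg.inv(Lam.T @ Wi @ Lam) @ Lam.T @ Wi

def bartlett_scores(X, Lam, W_cov):
    """Bartlett factor scores, one row per person."""
    return X @ bartlett_weights(Lam, W_cov).T

# Illustrative (assumed) values: 3 indicators, 1 factor
Lam = np.array([[1.0], [0.8], [0.6]])
Phi = np.array([[1.0]])
Theta = np.diag([0.5, 0.5, 0.5])
Sigma = Lam @ Phi @ Lam.T + Theta

w_theta = bartlett_weights(Lam, Theta)
w_sigma = bartlett_weights(Lam, Sigma)
same_weights = bool(np.allclose(w_theta, w_sigma))
```

Unlike the regression method, these weights satisfy `w_theta @ Lam == I`, so the scores are not shrunk toward zero.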

Page 41

Latent variable scoring
Frequent purpose: “multi-stage” regression

•  Step 1: Fit full latent variable measurement model(s) for (Y, X): $\Lambda_Y$, $\Lambda_X$
•  Step 2: Obtain predictions $O_i$ given $Y_i$, $\Lambda_Y$ and/or $X_i$, $\Lambda_X$
•  Step 3: Obtain B via regression of $O_i$ on $X_i$ or $Y_i$ on $O_i$, as case may be

Page 42

Latent variable scoring
Frequent purpose: “multi-stage” regression

•  Result: In the fully linear model, provided that estimators in Step 1 are consistent:
   –  (a) When the covariate is being predicted, employing the regression method in Steps 2-3 consistently estimates B
   –  (b) When the outcome is being predicted, employing the least squares (Bartlett) method in Steps 2-3 consistently estimates B
•  Brief rationale for (a): method is analogous to regression calibration with replicates (Carroll & Stefanski, JASA, 1990)
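Result (a) can be illustrated with a small simulation sketch. To keep it short, the true measurement parameters stand in for consistent Step 1 estimates, the outcome is observed directly, and there is a single latent covariate. All numeric values and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 200_000, 0.5
lam = np.array([1.0, 0.8, 0.6])      # loadings of 3 indicators on xi
phi = 1.0                            # Var(xi)
theta = np.array([0.5, 0.5, 0.5])    # residual variances (diagonal Theta)

xi = rng.normal(0.0, np.sqrt(phi), size=n)                      # latent covariate
X = xi[:, None] * lam + rng.normal(size=(n, 3)) * np.sqrt(theta)
y = B * xi + rng.normal(0.0, 1.0, size=n)                       # observed outcome

# Step 2: score the latent covariate two ways
Sigma = phi * np.outer(lam, lam) + np.diag(theta)
w_reg = phi * np.linalg.solve(Sigma, lam)             # regression method weights
w_bart = (lam / theta) / np.sum(lam ** 2 / theta)     # Bartlett weights
score_reg = X @ w_reg
score_bart = X @ w_bart

def ols_slope(x, y):
    """Slope from simple least squares regression of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

# Step 3: regress the outcome on each predicted covariate
slope_reg = ols_slope(score_reg, y)    # close to B
slope_bart = ols_slope(score_bart, y)  # attenuated toward zero
```

Under these values the Bartlett scores have reliability 1 / 1.25 = 0.8, so the second slope sits near 0.4 rather than 0.5.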

Page 43

5. Identifiability

Page 44

One last issue
Identifiability

•  Models can be too big / complex
•  A model is non-identifiable if distinct parameterizations lead to identical data distributions
   –  i.e. analysis not grounded in data
•  Weak identifiability is common too:
   –  Analysis only indirectly grounded in data (via the model)
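A standard illustration of non-identifiability, not taken from the slides, is rotation in an unconstrained factor model with Φ = I: rotating the loadings changes the parameters but leaves the implied covariance, and hence the normal data distribution, unchanged. The values below are made up.

```python
import numpy as np

# Illustrative (assumed) two-factor model with uncorrelated, unit-variance factors
Lam = np.array([[0.8, 0.1],
                [0.7, 0.2],
                [0.2, 0.9],
                [0.1, 0.6]])
Phi = np.eye(2)
Theta = np.diag([0.3, 0.4, 0.2, 0.5])

angle = 0.7  # any rotation angle works
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
Lam_rot = Lam @ R                     # a distinct parameterization

Sigma_1 = Lam @ Phi @ Lam.T + Theta
Sigma_2 = Lam_rot @ Phi @ Lam_rot.T + Theta

same_distribution = bool(np.allclose(Sigma_1, Sigma_2))
different_parameters = not np.allclose(Lam, Lam_rot)
```

Constraints motivated by theory, such as fixing selected loadings to zero, are what remove this indeterminacy.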

Page 45

Identifiability

[Figure: diagram relating data (ground), model, and analysis; case shown: strong identifiability]

Page 46

Identifiability

[Figure: diagram relating data (ground), model, and analysis; case shown: weak identifiability]

Page 47

Identifiability

[Figure: diagram relating data (ground), model, and analysis; case shown: non-identifiability]

Page 48

Objectives

•  What is a latent variable (LV)?
•  What are some common LV models?
•  What are major features of LV modeling?
   –  Hierarchical: structural and measurement components
   –  Fitting
   –  Evaluating fit
   –  Predictions
   –  Identifiability
•  Why should I consider using, or decide against using, LV models?