
Page 1: A measure to evaluate latent variable model fit by sensitivity analysis

A measure to evaluate latent variable model fit by sensitivity analysis

Daniel Oberski

Department of methodology and statistics

Dept of Statistics, Leiden University

Latent variable model fit by sensitivity analysis Daniel Oberski

Page 2: A measure to evaluate latent variable model fit by sensitivity analysis

Latent variable models

What do they assume and what are they good for?

Page 3: A measure to evaluate latent variable model fit by sensitivity analysis

[Path diagram: latent variable ξ with observed indicators y1, y2, ..., yJ]

$$p(\mathbf{y}) = \sum_{\xi} p(\xi) \prod_{j=1}^{J} p(y_j \mid \xi)$$
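The conditional independence formula on this slide can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name `pattern_probability`, the two-class setup, and all probability values are hypothetical and are not taken from the talk.

```python
from itertools import product

import numpy as np


def pattern_probability(y, class_probs, item_probs):
    """Marginal probability p(y) of one binary response pattern.

    y           : length-J sequence of 0/1 responses
    class_probs : length-K array, p(xi) for each latent class
    item_probs  : K x J array, p(y_j = 1 | xi) for each class and item
    """
    y = np.asarray(y)
    class_probs = np.asarray(class_probs, dtype=float)
    item_probs = np.asarray(item_probs, dtype=float)
    # p(y_j | xi): use p(y_j = 1 | xi) where y_j = 1, else its complement
    cond = np.where(y == 1, item_probs, 1.0 - item_probs)
    # Conditional independence: multiply over items within each class,
    # then mix over classes with weights p(xi)
    return float(np.sum(class_probs * cond.prod(axis=1)))


# Hypothetical values (assumptions): two classes, four binary indicators
class_probs = [0.1, 0.9]
item_probs = [[0.90, 0.90, 0.70, 0.60],
              [0.02, 0.02, 0.01, 0.01]]

# The probabilities of all 2^4 response patterns should sum to one
all_patterns = list(product([0, 1], repeat=4))
total = sum(pattern_probability(y, class_probs, item_probs)
            for y in all_patterns)
```

Summing over every possible pattern is a quick sanity check that the mixture defines a proper distribution over the observed responses.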

Page 4: A measure to evaluate latent variable model fit by sensitivity analysis

[Path diagram: latent variable ξ with observed indicators y1, y2, ..., yJ]

$$p(\mathbf{y}) = \sum_{\xi} p(\xi)\, p(y_1, y_2 \mid \xi) \prod_{j=3}^{J} p(y_j \mid \xi)$$

Page 5: A measure to evaluate latent variable model fit by sensitivity analysis

Example

Goal: estimate false positives and false negatives in four diagnostic tests for C. trachomatis infection:

y1 Ligase chain reaction (LCR) test (Yes/No);
y2 Polymerase chain reaction (PCR) test (Yes/No);
y3 DNA probe test (DNAP) (Yes/No);
y4 Culture (CULT) (Yes/No).

Tool: 2-latent class model (diseased or non-diseased).

(Original data from Dendukuri et al. 2009)

Page 6: A measure to evaluate latent variable model fit by sensitivity analysis

Assume: [path diagram: latent variable ξ with indicators y1, y2, ..., yJ]

But really: [path diagram: latent variable ξ with indicators y1, y2, ..., yJ]

What difference does it make for the goal: false positives and false negatives? (simulation by Van Smeden et al., submitted)

Page 7: A measure to evaluate latent variable model fit by sensitivity analysis

[Path diagram: covariate x, latent variable ξ, and observed indicators y1, y2, ..., yJ]

$$p(\mathbf{y}) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi)$$

Page 8: A measure to evaluate latent variable model fit by sensitivity analysis

[Path diagram: covariate x, latent variable ξ, and observed indicators y1, y2, ..., yJ]

$$p(\mathbf{y}) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi, x)$$

Page 9: A measure to evaluate latent variable model fit by sensitivity analysis

Example

Goal: Estimate gender differences in "valuing Stimulation":

(1) Very much like me; (2) Like me; (3) Somewhat like me; (4) A little like me; (5) Not like me; (6) Not like me at all.

impdiff  (S)he looks for adventures and likes to take risks. (S)he wants to have an exciting life.

impadv   (S)he likes surprises and is always looking for new things to do. (S)he thinks it is important to do lots of different things in life.

Tool: Structural Equation Model for European Social Survey data (n = 18519 men and 16740 women).
(Original study by Schwarz et al. 2005)

Page 10: A measure to evaluate latent variable model fit by sensitivity analysis

Assume: [path diagram: covariate x, latent variable ξ, indicators y1, y2, ..., yJ]

But really (?): [path diagram: covariate x, latent variable ξ, indicators y1, y2, ..., yJ]

What difference does it make for the goal: true gender differences in values? (re-analysis of data by Oberski 2014)

[Figure: latent mean difference estimate ± 2 s.e. (vertical axis, −0.2 to 0.2, with regions labeled "Men value more" and "Women value more") for each "human value" factor (AC, PO, ST, SD, HE, CO, TR, SE, UN, BE), under two models: scalar invariance and free intercept 'Adventure'.]

Page 11: A measure to evaluate latent variable model fit by sensitivity analysis

PROBLEM

The original authors found that the conditional independence model fit the data "approximately" (p. 1013)...

"Chi-square deteriorated significantly, Δχ²(19) = 3313, p < .001, but CFI did not change. Change in chi-square is highly sensitive with large sample sizes and complex models. The other indices suggested that scalar invariance might be accepted (CFI = .88, RMSEA = .04, CI = .039–.040, PCLOSE = 1.0)."

... but unfortunately this "acceptable" misspecification could reverse their conclusions!

Page 12: A measure to evaluate latent variable model fit by sensitivity analysis

Numbers that indicate how well the model fits the data
• Likelihood Ratio vs. saturated
• Information-based criteria: AIC, BIC, CAIC, ...
• Bivariate residuals (Maydeu & Joe 2005; Oberski, Van Kollenburg & Vermunt 2013)
• Score/Lagrange multiplier tests, "modification index", "expected parameter change" (EPC) (Saris, Satorra & Sörbom 1989; Oberski & Vermunt 2013; Oberski & Vermunt accepted)

"Fit indices":
• RMSEA: $\sqrt{\dfrac{(\chi^2/df) - 1}{N - 1}}$
• CFI: $\dfrac{(\chi^2_{null} - df_{null}) - (\chi^2 - df)}{\chi^2_{null} - df_{null}}$
• Lots of others: TLI, NFI, NNFI, RFI, IFI, RNI, RMR, SRMR1-3, GFI, AGFI, MFI, ECVI, ...
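The two fit indices written out on this slide are simple functions of the chi-square statistics. The sketch below follows the slide's formulas; the function names and the input values are hypothetical, and truncating RMSEA at zero when χ² < df is a common convention assumed here rather than something stated on the slide.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt(((chi2 / df) - 1) / (n - 1)), truncated at zero (assumed)."""
    return math.sqrt(max(chi2 / df - 1.0, 0.0) / (n - 1))


def cfi(chi2, df, chi2_null, df_null):
    """CFI = [(chi2_null - df_null) - (chi2 - df)] / (chi2_null - df_null)."""
    return ((chi2_null - df_null) - (chi2 - df)) / (chi2_null - df_null)


# Hypothetical inputs (assumptions, not values from the talk)
example_rmsea = rmsea(chi2=150.0, df=50, n=1001)
example_cfi = cfi(chi2=150.0, df=50, chi2_null=1050.0, df_null=50)
```

Both indices summarize distance between model and data relative to model complexity; neither refers to the parameter the analysis is meant to estimate.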

Page 13: A measure to evaluate latent variable model fit by sensitivity analysis

What is the problem?
• We do latent variable modeling with a goal in mind.
• But the latent variable model might be misspecified.
• The appropriate question: "will that affect my goal?"
• The actual question: "do the data fit the model in the population" (LR) or "are the model and the data far apart relative to model complexity" (RMSEA etc.)

What is the solution?

Evaluate directly what effect possible misspecifications have on the goal of the analysis.

Page 14: A measure to evaluate latent variable model fit by sensitivity analysis

How to evaluate directly what effect possible misspecifications have on the goal of the analysis.

Page 15: A measure to evaluate latent variable model fit by sensitivity analysis

Two ideas to evaluate the effect of misspecifications

1 Try out all possible models with misspecifications, calculate the estimates of interest under these models and evaluate whether these are substantively different.
  Advantage: Does the job.
  Disadvantage: There may be too many alternative models. Also: are applied researchers really going to do this?

2 Use EPC-interest: expected change in free parameters.
  Advantage: Does the job without the need to estimate any alternative models.
  Disadvantage: Is an approximation (though a reasonable one).

Page 16: A measure to evaluate latent variable model fit by sensitivity analysis

EPC-interest applied to Stimulation example

• After fitting the full scalar invariance model,
• Effect size estimate of sex difference in Stimulation is +0.214 (s.e. 0.0139).
• But EPC-interest of equal "Adventure" item intercept is -0.243.
• So EPC-interest suggests conclusion can be reversed by freeing a misspecified scalar invariance restriction
• Actual change when freeing this intercept is very close to EPC-interest: -0.235.
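As a worked check of the numbers on this slide (arithmetic only; the symbol for the scalar-invariance estimate is introduced here for illustration), adding the EPC-interest to the estimate gives the value expected after freeing the intercept, and adding the actual change gives the re-estimated value:

```latex
\begin{align*}
\hat{\alpha}_{\text{scalar}} + \text{EPC-interest} &= 0.214 + (-0.243) = -0.029 \\
\hat{\alpha}_{\text{scalar}} + \text{actual change} &= 0.214 + (-0.235) = -0.021
\end{align*}
```

In both cases the point estimate of the sex difference changes sign, which is the sense in which the conclusion can be reversed.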

Page 17: A measure to evaluate latent variable model fit by sensitivity analysis

EPC-interest

How does it work?

Page 18: A measure to evaluate latent variable model fit by sensitivity analysis

• Let's say there is a restricted model whose purpose is to estimate its parameters, θ, or some linear function of them such as a subselection, Pθ.

• We could parameterize these restrictions as ψ = 0. For example: ψ could be the direct effect of gender on "Adventure", or the loglinear dependence between DNA tests.

• The maximum likelihood estimates are then
$$\hat{\theta} = \arg\max_{\theta} L(\theta, \psi = 0)$$
Question: How much would θ̂ change if we freed ψ?

Page 19: A measure to evaluate latent variable model fit by sensitivity analysis

How much would θ change if we freed ψ?

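The trick is to consider the estimate of θ we would get with ψ freed rather than fixed at 0; that is, $\tilde{\theta} = \arg\max L(\theta, \psi)$, as opposed to the restricted estimate $\hat{\theta}$ obtained under ψ = 0.

As it turns out, we don't actually need $\tilde{\theta}$, since

$$\tilde{\theta} - \hat{\theta} = H_{\theta\theta}^{-1} H_{\theta\psi} D^{-1} \left[ \left. \frac{\partial L(\theta, \psi)}{\partial \psi} \right|_{\theta = \hat{\theta}} \right] + O(\delta'\delta),$$

where H is a Hessian, $D = H_{\psi\psi} - H_{\theta\psi}' H_{\theta\theta}^{-1} H_{\theta\psi}$, and δ is the "overall wrongness" of the model, $(\psi', \tilde{\theta}' - \hat{\theta}')'$.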

Page 20: A measure to evaluate latent variable model fit by sensitivity analysis

How much would θ change if we freed ψ?

Dropping the approximation term (assuming the model parameters are not "too far" from the truth) we get the approximation

$$\text{EPC-interest} = -P H_{\theta\theta}^{-1} H_{\theta\psi}\, \text{EPC-self} \approx -P H_{\theta\theta}^{-1} H_{\theta\psi} \left( \tilde{\psi} - \hat{\psi} \right)$$

For those of you familiar with Structural Equation Modeling (or attending my 2013 MBC2 talk), "EPC-self" is the usual "expected parameter change" in the fixed parameter vector, i.e. the size of the misspecification.
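The approximation can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes a toy quadratic log-likelihood (so the remainder term vanishes and the result is exact), made-up values for the Hessian `H` and linear term `b`, and a hypothetical helper name `epc_interest`. EPC-self is taken to be −D⁻¹ times the score for ψ, which is what the expansion implies when ψ is fixed at 0.

```python
import numpy as np


def epc_interest(H, score_psi, n_theta, P=None):
    """Return (EPC-self, EPC-interest) from a partitioned Hessian.

    H         : Hessian of the log-likelihood at the restricted estimate,
                ordered as (theta, psi)
    score_psi : gradient of the log-likelihood w.r.t. psi at that estimate
    n_theta   : number of free parameters theta
    P         : optional selection matrix for the parameters of interest
    """
    H_tt = H[:n_theta, :n_theta]
    H_tp = H[:n_theta, n_theta:]
    H_pp = H[n_theta:, n_theta:]
    # D = H_psipsi - H_thetapsi' H_thetatheta^{-1} H_thetapsi
    D = H_pp - H_tp.T @ np.linalg.solve(H_tt, H_tp)
    epc_self = -np.linalg.solve(D, score_psi)
    # EPC-interest = -P H_thetatheta^{-1} H_thetapsi EPC-self
    epc_int = -np.linalg.solve(H_tt, H_tp @ epc_self)
    if P is not None:
        epc_int = P @ epc_int
    return epc_self, epc_int


# Toy quadratic log-likelihood L(phi) = b'phi + 0.5 phi'H phi, phi = (theta, psi).
# H and b are made-up numbers; H is negative definite.
H = np.array([[-4.0, 1.0, 0.5],
              [1.0, -3.0, 1.0],
              [0.5, 1.0, -2.0]])
b = np.array([1.0, -0.5, 0.8])
n_theta = 2

# Restricted fit (psi = 0): maximize over theta only
theta_hat = -np.linalg.solve(H[:n_theta, :n_theta], b[:n_theta])
# Score for psi evaluated at the restricted estimate
score_psi = b[n_theta:] + H[n_theta:, :n_theta] @ theta_hat

epc_self, epc_int = epc_interest(H, score_psi, n_theta)

# Unrestricted fit, computed only to compare against the approximation
phi_tilde = -np.linalg.solve(H, b)
actual_change = phi_tilde[:n_theta] - theta_hat
```

In this toy case `epc_int` equals `actual_change` without ever using the unrestricted fit; for a real likelihood the two would differ by the dropped remainder term.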

Page 21: A measure to evaluate latent variable model fit by sensitivity analysis

Monte Carlo simulation: EPC-interest is a good approximation to the actual change in parameters of interest when freeing equality restriction

Average over 200 replications

∆ν1   ng    EPC-self   ∆α      ∆α bias   EPC-interest   EPC-interest bias
0.1   50    0.064      0.240   -0.040    -0.034          0.005
0.3   50    0.213      0.313   -0.113    -0.113         -0.001
0.8   50    0.657      0.505   -0.305    -0.401         -0.096
0.1   100   0.058      0.231   -0.031    -0.031          0.000
0.3   100   0.203      0.323   -0.123    -0.109          0.014
0.8   100   0.619      0.492   -0.292    -0.370         -0.077
0.1   500   0.063      0.233   -0.033    -0.033          0.000
0.3   500   0.208      0.307   -0.107    -0.112         -0.005
0.8   500   0.598      0.501   -0.301    -0.349         -0.048

Page 22: A measure to evaluate latent variable model fit by sensitivity analysis

Another example showcasing EPC-interest


Page 23: A measure to evaluate latent variable model fit by sensitivity analysis

Ranking data in 48 WVS countries

Option #  M/P  Value wording

Set A
1.  M  A high level of economic growth
2.  M  Making sure this country has strong defense forces
3.  P  Seeing that people have more say about how things are done at their jobs and in their communities
4.  P  Trying to make our cities and countryside more beautiful

Set B
1.  M  Maintaining order in the nation
2.  P  Giving people more say in important government decisions
3.  M  Fighting rising prices
4.  P  Protecting freedom of speech

Set C
1.  M  A stable economy
2.  P  Progress toward a less impersonal and more humane society
3.  P  Progress toward a society in which ideas count more than money
4.  M  The fight against crime

Page 24: A measure to evaluate latent variable model fit by sensitivity analysis

Figure: Graphical representation of the multilevel latent class regression model for (post)materialism measured by three partial ranking tasks. Observed variables are shown in rectangles while unobserved ("latent") variables are shown in ellipses.

Page 25: A measure to evaluate latent variable model fit by sensitivity analysis

Latent class ranking model with 4 choices

Each ranking set, for example, set A:

$$P(A_{1ic} = a_1, A_{2ic} = a_2 \mid X_{ic} = x) = \frac{\omega_{a_1 x}}{\sum_k \omega_{kx}} \cdot \frac{\omega_{a_2 x}}{\sum_{k \neq a_1} \omega_{kx}},$$

where ω_kx is the "utility" of object k for respondents in class x.

Multilevel structure to account for the countries using group class variable G:

$$P(X_{ic} = x \mid Z_{1ic} = z_1, Z_{2ic} = z_2, G_c = g) = \frac{\exp(\alpha_x + \gamma_{1x} z_1 + \gamma_{2x} z_2 + \beta_{gx})}{\sum_t \exp(\alpha_t + \gamma_{1t} z_1 + \gamma_{2t} z_2 + \beta_{gt})}$$
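The partial ranking probability for one set can be sketched as follows. This is an illustration under assumptions: the function name `partial_ranking_probability` and the utility values are hypothetical, objects are indexed from 0, and utilities are taken to be positive numbers for a single latent class.

```python
from itertools import permutations

import numpy as np


def partial_ranking_probability(first, second, utilities):
    """P(first choice, second choice) within one class.

    The first choice is made among all objects; the second choice is made
    among the objects that remain after removing the first choice.
    """
    w = np.asarray(utilities, dtype=float)
    p_first = w[first] / w.sum()
    p_second = w[second] / (w.sum() - w[first])
    return float(p_first * p_second)


# Hypothetical utilities (assumptions) for the four options of one set
utilities = [4.0, 2.0, 1.0, 1.0]

# Option 0 ranked first, option 1 ranked second: (4/8) * (2/4)
p_01 = partial_ranking_probability(0, 1, utilities)

# Probabilities over all ordered pairs of distinct options sum to one
total = sum(partial_ranking_probability(a, b, utilities)
            for a, b in permutations(range(4), 2))
```

Because the second factor renormalizes over the remaining options, the probabilities of the 12 possible (first, second) pairs form a proper distribution.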

Page 26: A measure to evaluate latent variable model fit by sensitivity analysis

Multilevel latent class model w/ covariates for rankings

$$L(\theta) = P(A_1, A_2, B_1, B_2, C_1, C_2 \mid Z_1, Z_2) = \prod_{c=1}^{C} \sum_{G} P(G_c) \prod_{i=1}^{n_c} \sum_{X} P(X_{ic} \mid Z_{1ic}, Z_{2ic}, G_c) \times P(A_{1ic}, A_{2ic} \mid X_{ic})\, P(B_{1ic}, B_{2ic} \mid X_{ic})\, P(C_{1ic}, C_{2ic} \mid X_{ic}),$$

Goal: estimate γ (especially its sign).
Possible problem: Violations of scalar and metric measurement invariance (DIF), parameterized respectively as τ* and λ*.
Solution: See if these matter for the sign of γ.

Page 27: A measure to evaluate latent variable model fit by sensitivity analysis

Table: Full invariance multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of six sets of possible misspecifications (columns 5–10).

                                      EPC-interest for...
                                      τ*_jkg, ranking task       λ*_jkxg, ranking task
Class    Covariate   Est.     s.e.      1        2        3        1        2        3
Class 1  GDP         -0.035   (0.007)   -0.013    0.021   -0.002    0.073    0.252    0.005
Class 2  GDP         -0.198   (0.012)   -0.018   -0.035    0.015   -0.163   -0.058    0.002
Class 1  Women        0.013   (0.001)   -0.006    0.002    0.000   -0.003    0.029    0.002
Class 2  Women       -0.037   (0.001)    0.007   -0.003    0.002   -0.006   -0.013    0.002

Page 28: A measure to evaluate latent variable model fit by sensitivity analysis

Table: Partially invariant multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of four sets of remaining possible misspecifications (columns 5–7 and 10).

                                      EPC-interest for non-invariance of...
                                      τ*_kg, ranking task        λ*_kxg, ranking task
Class    Covariate   Est.     s.e.      1        2        3        1        2        3
Class 1  GDP         -0.127   (0.008)   -0.015   -0.003    0.002                      0.097
Class 2  GDP          0.057   (0.011)   -0.043   -0.013    0.002                      0.161
Class 1  Women        0.008   (0.001)   -0.002    0.000    0.002                      0.001
Class 2  Women        0.020   (0.001)   -0.007   -0.001    0.002                      0.007

Page 29: A measure to evaluate latent variable model fit by sensitivity analysis

[Figure: two panels, "% Women in parliament" and "GDP per capita". Horizontal axis: covariate level (minimum, maximum); vertical axis: probability of class (ticks at 0.2, 0.4, 0.6); series for the Materialist, Postmaterialist, and Mixed classes.]

Figure: Estimated probability of choosing each class as a function of the covariates of interest under the final model.

Page 30: A measure to evaluate latent variable model fit by sensitivity analysis

[Figure: class posterior (vertical axis, 0.0 to 0.8) plotted against % women in parliament (0 to 40) and against Ln(GDP per capita) (7 to 11), with one panel per class: Class 1 ("Materialist"), Class 2 ("Postmaterialist"), Class 3 ("Mixed"). Points are labeled with three-letter country codes (ARM, AUS, AZE, ..., ZWE).]

Page 31: A measure to evaluate latent variable model fit by sensitivity analysis

What has been gained by using EPC-interest:

I am fairly confident here that there truly is "approximate measurement invariance", in the sense that any violations of measurement invariance do not bias the primary conclusions.

I think attaining this goal is the main purpose of model fit evaluation.

Page 32: A measure to evaluate latent variable model fit by sensitivity analysis

Conclusion


Page 33: A measure to evaluate latent variable model fit by sensitivity analysis

Conclusion

• Latent variable modeling is often performed for a purpose;
• Model fit evaluation should then be done for the reason that violations of assumptions can disturb this purpose.

• Introduced the EPC-interest to look into this;
• Evaluates the change in the parameter(s) of interest that would result if a restriction is freed that parameterizes a potential violation of assumptions.

Page 34: A measure to evaluate latent variable model fit by sensitivity analysis

Implemented in SEM software lavaan for R:

Oberski (2014). Evaluating Sensitivity of Parameters of Interest to Measurement Invariance in Latent Variable Models. Political Analysis, 22 (1).

Implemented in LCA software Latent Gold:

Oberski, Vermunt & Moors (submitted). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Under review.

Oberski & Vermunt (2014). A model-based approach to goodness-of-fit evaluation in item response theory. Measurement, 11, 117–122.

Nagelkerke, Oberski, & Vermunt (accepted). "Goodness-of-fit of Multilevel Latent Class Models for Categorical Data". Sociological Methodology.

Oberski & Vermunt (conditionally accepted). "The Expected Parameter Change (EPC) for Local Dependence Assessment in Binary Data Latent Class Models". Psychometrika.

Page 35: A measure to evaluate latent variable model fit by sensitivity analysis

Thank you for your attention!

Daniel [email protected]
http://daob.nl/publications for full texts & code

Page 36: A measure to evaluate latent variable model fit by sensitivity analysis

SEM regression coefficient example

European Sociological Review 2008, 24(5), 583–599

Page 37: A measure to evaluate latent variable model fit by sensitivity analysis

SEM regression coefficient example

[Figure: regression coefficients (horizontal axis, −1.0 to 1.0) by country, in two panels, "Conservation" and "Self-transcendence". Countries shown: Sweden, Denmark, Austria, Switzerland, Netherlands, Germany, Ireland, Spain, Norway, Hungary, Finland, Portugal, France, Belgium, Slovenia, United Kingdom, Greece, Czech Republic, Poland.]

Page 38: A measure to evaluate latent variable model fit by sensitivity analysis

SEM regression coefficient example

EPC-interest statistics of at least 0.1 in absolute value with respect to the latent variable regression coefficients.

Metric invariance (loading) restriction "Conditions → Work skills" in...

                                    Slovenia   France   Hungary   Ireland
EPC-interest w.r.t. Conditions →
  Self-transcendence                -0.073     -0.092   -0.067     0.073
  Conservation                       0.144      0.139    0.123    -0.113
SEPC-self                            0.610      0.692    0.759    -0.514

Page 39: A measure to evaluate latent variable model fit by sensitivity analysis

SEM regression coefficient example

What has been gained by using EPC-interest

• Full metric invariance model: "close fit";
• EPC-interest still detects threats to cross-country comparisons of regression coefficients;
• MI and EPC-self do not detect these particular misspecifications;
• MI and EPC-self detect other misspecifications;
• Looking at EPC-interest reveals that these do not affect the cross-country comparisons of regression coefficients.