Diagrams Representing Relationships between …

Path Diagrams, EFA, CFA, and Lavaan

(You should have read the EFA chapter in T&F and should read the CFA chapter.)

Observed Variable

A variable whose values are observable.

Examples: IQ Test scores (Scores are directly observable), GREV, GREQ, GREA, UGPA, Minnesota Job Satisfaction Scale, Affective Commitment Scale, Gender, Questionnaire items.

Latent Variable

A variable, i.e., characteristic, presumed to exist, but whose values are NOT observable. A Factor in Factor Analysis literature. A characteristic of people that is not directly observable.

Intelligence, Depression, Job Satisfaction, Affective Commitment, Tendency to display affective state

No direct observation of values of latent variables is possible. Brain states? Brain chemistry?

Indicator

An observed variable whose values are assumed to be related to the values of a latent variable.

Reflective Indicator

An observed variable whose values are partially determined by, i.e., are influenced by or reflect, the values of a latent variable. For example, responses to Conscientiousness items are assumed to reflect a person’s Conscientiousness.

Formative Indicator

An observed variable whose values partially determine, i.e., cause or form, the values of a latent variable. For example, socioeconomic status is often considered to be a latent variable which is “formed” by 1) family income 2) education, 3) neighborhood etc. Such indicators are not frequently studied in psychology.

Exogenous Variable (Ex = Out)

A variable whose values originate from / are caused by influences outside the model, i.e., are not explained within the theory with which we’re working. That is, a variable whose variation we don’t attempt to explain or predict by whatever theory we’re working with. Causes of exogenous variables originate outside the model. Exogenous variables can be observed or latent.

Endogenous Variable (En = In)

A variable whose values are explained within the theory with which we’re working. We account for all variation in the values of endogenous variables using the constructs of whatever theory we’re working with. Causes of endogenous variables originate within the model. Dependent variables are endogenous.

CFA, Amos - 1 Printed on 5/19/2023


Basic EFA, CFA, SEM Path Analytic Notation

Observed variables are symbolized by squares or rectangles.

Latent Variables are symbolized by Circles or ellipses.

Correlations or covariances between variables are represented by double-headed arrows.

"Causal" or "Predictive" or “Regression” relationships between variables are represented by single-headed arrows.


[Figure: path-analytic symbols. Observed variables (rectangles) are shown with columns of individual scores; latent variables (ellipses) are shown with dimmed scores; "Cor / Cov" double-headed arrows connect pairs of observed or latent variables; "Causal" single-headed arrows run between observed and latent variables.]

Values of individuals on latent variables are not observable, hence the dimmed text in the figure.


Exogenous Observed Variables

Exogenous variables connect to other variables in the model through either a “causal” arrow or a correlation.

Exogenous Latent Variables

Exogenous latent variables also connect to other variables in the model through either a “causal” arrow or a correlation.

Endogenous Observed Variables - Endogenous Latent Variable

Endogenous variables connect to other variables in the model by being on the “receiving” end of one or more “causal” arrows. Specifically, endogenous variables are typically represented as being “caused” by 1) other variables in the theory and 2) random error. Thus, 100% of the variation in every endogenous variable is accounted for by either other variables in the model or random error. This means that random error is an exogenous latent variable in SEM diagrams. Random error is a catch-all concept representing all “other” things that are affecting the endogenous variable.

Summary statistics associated with symbols

Our SEM program, Amos, prints means and variances above and to the right. Typically the mean and variance of latent variables are fixed at 0 and 1 respectively, although there are exceptions to this in advanced applications.


[Figure: examples of exogenous and endogenous variables. Exogenous observed and latent variables send "causal" or "correlation" arrows to other variables; each endogenous observed or latent variable receives one or more "causal" arrows plus an arrow from a Random error latent variable. Means and variances print above and to the right of each symbol; "causal" paths are labeled B or β, and "correlation" arrows are labeled r or a covariance.]


Path Diagrams of Analyses We’ve Done Previously

Following is how some of the analyses we’ve performed previously would be represented using path diagrams.

1. Simple correlation between two observed variables.

2. Simple correlations between three observed variables.

3. Simple regression of an observed dependent variable onto one observed independent variable.

4. Multiple Regression of an observed dependent variable onto three observed independent variables.
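The four analyses above map directly onto base-R calls. This is a minimal sketch with simulated stand-ins for the GRE and P511 variables (all data and coefficients below are invented for illustration); the residuals of the regression play the role of the catch-all error "e" in the path diagrams.

```r
# Simulated stand-ins for the variables in analyses 1-4.
set.seed(1)
n     <- 100
grev  <- rnorm(n)                                  # GRE-V
greq  <- 0.5 * grev + rnorm(n)                     # GRE-Q
grea  <- 0.4 * grev + rnorm(n)                     # GRE-A
ugpa  <- rnorm(n)                                  # UGPA
p511g <- 0.4 * grev + 0.3 * greq + 0.2 * ugpa + rnorm(n)  # P511 grade

r_vq <- cor(grev, greq)                 # 1. simple correlation (double-headed arrow)
r3   <- cor(cbind(grev, greq, grea))    # 2. three pairwise correlations
m1   <- lm(p511g ~ greq)                # 3. simple regression (one "causal" arrow)
m2   <- lm(p511g ~ grev + greq + ugpa)  # 4. multiple regression
head(residuals(m2))                     # the catch-all error "e" in the diagram
```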


[Figures: (1) GRE-V and GRE-Q connected by a double-headed arrow labeled rVQ; (2) GRE-V, GRE-Q, and GRE-A connected pairwise by double-headed arrows labeled rVQ, rVA, and rQA; (3) GRE-Q with a single-headed arrow labeled BQ or βQ to P511G, which also receives an arrow from error e; (4) GRE-V, GRE-Q, and UGPA each with a single-headed arrow (BV or βV, BQ or βQ, BU or βU) to P511G, which also receives an arrow from error e.]

Note that the endogenous variable, P511G, is caused in part by catch-all influences that are unobserved.


Path Analyses of ANOVAs

Since ANOVA is simply regression analysis, the representation of ANOVA in SEM is merely as a regression analysis. The key is to represent the differences between groups with group coding variables, just as we did in 513 and in the beginning of 595 . . .

1) Independent Groups t-test

The two groups are represented by a single, dichotomous observed group-coding variable. It is the independent variable in the regression analysis.

2) One Way ANOVA

The K groups are represented by K-1 group-coding variables created using one of the coding schemes (although I recommend contrast coding). They are the independent variables in the regression analysis. If contrast codes are used, the correlations between all the group coding variables are 0, so no arrows between them need be shown.

3) Factorial ANOVA (Not covered in 2018)

Each factor is represented by G-1 group-coding variables created using one of the coding schemes. The interaction(s) is/are represented by products of the group-coding variables representing the factors. Again, no correlations between coding variables need be shown if contrast codes are used.
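The steps above can be sketched in a few lines of base R. This is a minimal illustration with simulated data and illustrative contrast codes (none of these variable names come from the course data); it shows that, with equal group sizes, the two contrast-code variables for K = 3 groups are exactly uncorrelated, so the one-way ANOVA is just a regression on them.

```r
# One-way ANOVA as regression with contrast codes (simulated data;
# the codes c1, c2 are illustrative choices).
set.seed(2)
g  <- rep(1:3, each = 20)            # 3 equal-sized groups
y  <- c(0, 0.5, 1)[g] + rnorm(60)    # DV with group mean differences
c1 <- c(1, -1,  0)[g]                # code 1: group 1 vs group 2
c2 <- c(1,  1, -2)[g]                # code 2: groups 1 & 2 vs group 3
cor(c1, c2)                          # 0 with equal n: codes uncorrelated
fit <- lm(y ~ c1 + c2)               # regression equivalent of the ANOVA
anova(fit)
```

With unequal group sizes the codes are no longer exactly orthogonal, which is why the note below about equal sample sizes matters.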


[Figures: (1) a dichotomous group-coding variable with a "causal" arrow to the Dependent Variable, which also receives error e; (2) the 1st through (K-1)th group-coding variables, each with a "causal" arrow to the Dependent Variable plus error e; (3) the group-coding variables for the 1st and 2nd factors and their interaction products, each with a "causal" arrow to the Dependent Variable plus error e.]

Note: If contrast codes are used, the group-coding variables are uncorrelated (assuming equal sample sizes), so no arrows between the independent variables are required.


Path Diagrams Representing Exploratory Factor Analysis

1) Exploratory Factor Analysis solution with one factor.

The factor is represented by a latent variable with three or more observed indicators. (Three is the generally recommended minimum no. of indicators for a factor.)

Note that factors are exogenous. Indicators are endogenous.

It’s a rule that all of the variance of endogenous variables must be accounted for in the diagram.

Thus, each indicator must have an error (sometimes called a residual) latent variable to account for the variance in it not accounted for by the factor.


[Figure: factor F with "causal" arrows to Obs 1, Obs 2, and Obs 3; each indicator also receives an arrow from its own error term, e1, e2, or e3.]


2) Exploratory Factor Analysis model with two orthogonal factors.

For exploratory factor analysis, each variable is required to load on all factors. Of course, the hope is that the loadings will be substantial on only some of the factors and will be close to 0 on the others, but the loadings on all factors are estimated, even if they’re close to 0.

Primary vs Secondary Loadings

The loadings of items are conceptualized in two groups. First, are the primary loadings - those connecting a factor to the indicators we believe it should be connected to.

The second group is secondary or cross loadings. These are the loadings that connect a factor to all the “other” variables – the variables that should not be connected to the factor even though they’re required to be estimated.

Let’s assume that Obs 1, 2, and 3 are thought to be primary indicators of F1 and 4,5,6 are thought to be the primary indicators of F2.

In the figure below, the primary loadings of each factor are in black, with the cross loadings in red.

I should point out, however, that there is no mathematical difference between a primary loading and a secondary loading. They’re all treated identically in the estimation process.

The lack of an arrow between the two factors means that they will be estimated as being orthogonal, i.e., independent.

Orthogonal factors represent uncorrelated aspects of behavior.

Note what is assumed in the above figure: There are two independent characteristics of people – F1 and F2. Each one influences responses to all six items, although it is hoped that F1 influences primarily the first 3 items and that F2 influences primarily the last 3 items.

Theoretical importance of separation of loadings

If Obs 1 through Obs 3 are one class of behavior and Obs 4 through Obs 6 are a second class, and if the loadings “fit” the expected pattern – loadings of 1, 2, and 3 on F1 are large, loadings of 4, 5, and 6 on F2 are large, and all cross loadings are small – this would be evidence for the existence of two independent dispositions: that represented by F1 and that represented by F2.

Practical importance of separation of loadings

We can use the mean of Obs1, Obs2, and Obs3 to estimate F1. We can use the mean of Obs4, Obs5, and Obs6 to estimate F2.
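That practical point can be sketched with simulated data. This is a hedged illustration (the factors, loadings, and variable names below are invented for the sketch, not taken from the course data): unit-weighted means of each indicator cluster track their own factor closely.

```r
# Unit-weighted scale scores (means of indicator clusters) as
# stand-ins for the factors. All names and loadings are illustrative.
set.seed(3)
F1 <- rnorm(100); F2 <- rnorm(100)              # independent factors
obs <- data.frame(
  Obs1 = 0.8*F1 + rnorm(100, sd = .6), Obs2 = 0.7*F1 + rnorm(100, sd = .6),
  Obs3 = 0.6*F1 + rnorm(100, sd = .6), Obs4 = 0.8*F2 + rnorm(100, sd = .6),
  Obs5 = 0.7*F2 + rnorm(100, sd = .6), Obs6 = 0.6*F2 + rnorm(100, sd = .6))
F1hat <- rowMeans(obs[, c("Obs1", "Obs2", "Obs3")])   # estimate of F1
F2hat <- rowMeans(obs[, c("Obs4", "Obs5", "Obs6")])   # estimate of F2
cor(F1hat, F1)   # each mean correlates strongly with its own factor
cor(F2hat, F2)
```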


[Figure: orthogonal two-factor EFA. F1 and F2 each send loadings to all six indicators, Obs 1 through Obs 6 (primary loadings in black, cross loadings in red); each indicator has its own error term, e1 through e6; there is no arrow between F1 and F2.]


3) Exploratory Factor Analysis solution with two oblique factors.

Each factor is represented by a latent variable with three or more indicators. The obliqueness of the factors is represented by the fact that there IS an arrow connecting the factors.

Again, in exploratory factor analysis, all indicators load on all factors, even if the loadings are close to zero.

This solution is potentially as important as the orthogonal solution, although in general, I think that researchers are more interested in independent dispositions than they are in correlated dispositions.

But discovering why two dispositions are separate but still correlated is an important and potentially rewarding task.


[Figure: oblique two-factor EFA. F1 and F2 each load on all six indicators, Obs 1 through Obs 6, each indicator with its own error term; a double-headed arrow connects F1 and F2.]


Diagram of EFA model of NEO-FFI Big Five 60 item questionnaire.

(From Biderman, M. (2014). Against all odds: Bifactors in EFAs of Big Five Data. Part of symposium: S. McAbee & M. Biderman, Chairs. Theoretical and Practical Advances in Latent Variable Models of Personality. Conducted at the 29th annual conference of The Society for Industrial and Organizational Psychology; Honolulu, Hawaii, 2014.)

Crossloadings are in red.



It took me about 2 hours to draw this figure.

A major problem with EFA – the existence of so many cross loadings!!!


Path Diagrams vs the Table of Loadings

(Cross loadings are in red in both representations.)

Whew – there are tons of cross loadings, most but not all of them near 0. On the one hand, many researchers believe they can be ignored and set equal to zero. This kind of thinking leads to Confirmatory Factor Analysis models, discussed below.

Others feel that even though they’re small, there are so many of them that their combined effect cannot be ignored.


Pattern Matrixa

       Factor
       1      2      3      4      5
ne1  -.025  -.001  -.056  -.067   .678
ne2   .086   .002   .021  -.026   .268
ne3   .126  -.023   .133   .229   .260
ne4   .054   .014   .112   .184   .492
ne5  -.115  -.045  -.069  -.269   .624
ne6   .042   .206  -.200   .085   .563
ne7   .194  -.168   .046  -.203   .166
ne8   .342  -.127   .087   .299   .432
ne9   .270   .046   .092   .419   .223
ne10 -.017  -.138  -.105   .135   .327
ne11  .209  -.270   .009   .102   .278
ne12  .140  -.170  -.103  -.070   .483
na1  -.100  -.193   .081   .518   .249
na2   .347  -.153  -.076   .421  -.139
na3   .091  -.123  -.050   .560   .125
na4  -.139   .034  -.011   .504  -.099
na5   .298   .073   .033   .335   .188
na6   .335   .157  -.070   .353   .031
na7   .019  -.177   .067   .231   .346
na8   .053   .029  -.114   .543   .292
na9   .163   .115  -.120   .319  -.211
na10 -.057  -.123   .167   .594   .271
na11  .028   .012   .055   .471  -.227
na12  .027  -.210  -.075   .534  -.022
nc1   .092  -.429  -.090   .183   .008
nc2   .019  -.580  -.049   .055  -.086
nc3  -.030  -.376  -.037   .011  -.140
nc4  -.093  -.406   .052   .156   .028
nc5   .026  -.716  -.025  -.156   .057
nc6   .146  -.476   .100   .241  -.052
nc7  -.154  -.694  -.070  -.121   .109
nc8   .092  -.528   .019  -.017   .110
nc9   .040  -.573   .044  -.050   .094
nc10  .021  -.720   .016  -.103   .044
nc11  .035  -.551  -.067   .148   .005
nc12  .065  -.628   .035  -.011   .018
ns1   .501   .072   .145  -.231  -.031
ns2   .544  -.119   .132  -.044   .027
ns3   .653   .037  -.026  -.069  -.118
ns4   .664   .025  -.141   .006   .189
ns5   .660   .082  -.031   .200   .047
ns6   .677  -.053  -.104   .011   .046
ns7   .658   .035   .027  -.069   .011
ns8   .552   .068  -.078   .265   .012
ns9   .724  -.093   .134  -.021   .027
ns10  .662  -.041  -.068  -.009   .220
ns11  .563  -.266   .035   .030  -.132
ns12  .611  -.082  -.177   .075  -.003
no1  -.146   .334   .267  -.003  -.103
no2   .030   .366   .136   .061   .087
no3  -.014   .044   .661  -.040   .046
no4   .116   .037   .142  -.119  -.008
no5  -.010  -.090   .731   .102  -.166
no6   .029   .168   .217   .124   .207
no7  -.064   .025   .207   .082  -.082
no8   .024   .233   .240  -.101  -.197
no9  -.008  -.012   .822   .049  -.123
no10  .052   .135   .536   .026  -.033
no11 -.026  -.111   .560  -.041   .110
no12  .021   .131   .615  -.174   .090

Extraction Method: Maximum Likelihood.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 14 iterations.


Confirmatory vs Exploratory Factor Analysis

In Exploratory Factor Analysis the loading of every item on every factor is estimated. We assume that every characteristic (factor) has at least a little influence on the response to every item. Of course, we may hope that just a few of those loadings will be large and all the others will be small. An EFA two-orthogonal-factor model is represented by the following diagram.

As mentioned above there are arrows (loadings) connecting each variable to each factor. We have no hypotheses about the loading values – we’re exploring – so we estimate all loadings and let them lead us.

Generally, EFA programs do not allow you to specify or fix loadings to pre-determined values. That’s a problem that needs to be fixed in EFA software. (I want us to have our cake and eat it too.)

In contrast to the exploration implicit in EFA, a factor analysis in which some loadings are fixed at specific values is called a Confirmatory Factor Analysis. The analysis confirms one or more hypotheses about loadings, hypotheses represented by our fixing them at specific (usually 0) values.

Below is shown the path diagram of a confirmatory analysis model – one in which the primary loadings are represented but the secondary, aka, cross loadings are set to 0. In the diagram, a loading or correlation that has been set to 0 is simply left out of the diagram.

Hmm. The path diagram is certainly simpler than that of an EFA.

It turns out that the mathematics of CFA is simpler also, resulting in CFAs being used for most studies of relationships among latent variables – structural equation models.


[Figure: EFA of two orthogonal factors influencing 6 items. F1 and F2 each load on all of Obs 1 through Obs 6, each indicator with its own error term, e1 through e6.]

[Figure: CFA of two orthogonal factors influencing 6 items. Obs 1 through Obs 3 load only on F1; Obs 4 through Obs 6 load only on F2; each indicator keeps its own error term, and there is no arrow between F1 and F2.]
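Such a CFA can also be written directly in lavaan's model syntax (introduced later in this handout). A minimal sketch, with Obs1-Obs6 and mydata as placeholder names: loadings you leave out of the model are thereby fixed at 0, and the `F1 ~~ 0*F2` line fixes the factor covariance at 0 to make the factors orthogonal.

```r
# lavaan model syntax for the CFA diagram: cross loadings are simply
# omitted, which fixes them at 0 (Obs1-Obs6 are placeholder names).
cfa.model <- '
  F1 =~ Obs1 + Obs2 + Obs3
  F2 =~ Obs4 + Obs5 + Obs6
  F1 ~~ 0*F2        # orthogonal factors: covariance fixed at 0
'
# fit <- lavaan::cfa(cfa.model, data = mydata)   # mydata: your data frame
# (equivalently, drop the F1 ~~ 0*F2 line and pass orthogonal = TRUE)
```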


EFA vs. CFA of Big Five scale testlets Start 11/6

The data are from Wrensen & Biderman (2005).

The 50 Big Five items were used to compute 3 testlets (parcels) per dimension.

More on why testlets were used later.

A testlet is a mini-scale – the mean of 2-5 items. Here, each was the mean of 3 items. More on testlets in a few pages.

Data were collected in two conditions – honest response and fake good. The honest response data will be analyzed here.



First, an Exploratory Factor Analysis using SPSS

Analyze -> Data Reduction -> Factor ...


I asked SPSS to print the correlation matrix. It’s small enough to display on one page.

If you’re analyzing items, as opposed to testlets, the correlation matrix may be too large to display usefully.


Note that if you have a strong theory and other evidence that there are only a specific number of factors, you can tell SPSS to extract only that number.

You might be thinking, “Why does extracting the appropriate number of factors matter?”

The issue is that the WHOLE correlation matrix is reproduced by the number of factors you extract. If you extract too few, you’ll run the risk of not getting a good fit.

If you extract too many, you may run the risk of having ridiculous factors that don’t have any general meaning – factors reflecting wording idiosyncrasies or phrasing similarities between items which are not of general interest.


I requested an oblique (correlated factors) solution, to see how much the factors are correlated.

Choose Maximum Likelihood (ML) as the extraction method.

ML, I believe, has become the preferred extraction method for EFA. Alas, sometimes it fails to converge.

If ML fails, choose Principal Axes (PA2).
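The same ML extraction is available in base R via factanal(), which can serve as a cross-check on the SPSS route. A sketch with simulated two-factor data (variable names invented here); "promax" gives an oblique rotation comparable in spirit to SPSS's Oblimin:

```r
# Maximum-likelihood EFA in base R with factanal() (simulated data).
set.seed(4)
f1 <- rnorm(200); f2 <- rnorm(200)
d <- data.frame(
  v1 = 0.8*f1 + rnorm(200, sd = .6), v2 = 0.7*f1 + rnorm(200, sd = .6),
  v3 = 0.6*f1 + rnorm(200, sd = .6), v4 = 0.8*f2 + rnorm(200, sd = .6),
  v5 = 0.7*f2 + rnorm(200, sd = .6), v6 = 0.6*f2 + rnorm(200, sd = .6))
efa <- factanal(d, factors = 2, rotation = "promax")
efa$loadings   # pattern matrix; expect a clear 2-cluster structure
efa$PVAL       # ML goodness-of-fit test, as in the SPSS output
```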


The correlation matrix of indicators, asked for in the Descriptives dialog box above.

- - - - - - - - - - - -  F A C T O R   A N A L Y S I S  - - - - - - - - -

Factor Analysis

[DataSet1] G:\MdbR\Wrensen\WrensenDataFiles\WrensenMVsImputed070114.sav

Correlation Matrix

hetl1 hetl2 hetl3 hatl1 hatl2 hatl3 hctl1 hctl2 hctl3 hstl1 hstl2 hstl3 hotl1 hotl2 hotl3

Correlation

hetl1 1.000 .729 .313 .020 .066 -.057 .002 -.039 -.067 -.085 .001 .045 .139 .127 .153

hetl2 .729 1.000 .306 .049 .165 .036 .000 -.027 -.007 -.132 .022 .015 .152 .163 .244

hetl3 .313 .306 1.000 .069 -.061 .044 -.080 -.067 .045 .078 .169 .150 .117 .061 .127

hatl1 .020 .049 .069 1.000 .463 .120 .137 .090 .150 .027 -.063 -.023 .096 .037 .202

hatl2 .066 .165 -.061 .463 1.000 .235 .168 .064 .100 -.098 -.053 -.103 .055 -.003 .220

hatl3 -.057 .036 .044 .120 .235 1.000 .041 -.013 -.035 -.119 -.050 -.031 -.113 -.056 .066

hctl1 .002 .000 -.080 .137 .168 .041 1.000 .734 .610 .163 .155 .136 .080 .071 .099

hctl2 -.039 -.027 -.067 .090 .064 -.013 .734 1.000 .622 .099 .075 .056 -.077 -.031 .082

hctl3 -.067 -.007 .045 .150 .100 -.035 .610 .622 1.000 .313 .272 .177 .172 .178 .303

hstl1 -.085 -.132 .078 .027 -.098 -.119 .163 .099 .313 1.000 .688 .762 .263 .271 .190

hstl2 .001 .022 .169 -.063 -.053 -.050 .155 .075 .272 .688 1.000 .649 .248 .275 .070

hstl3 .045 .015 .150 -.023 -.103 -.031 .136 .056 .177 .762 .649 1.000 .174 .169 .111

hotl1 .139 .152 .117 .096 .055 -.113 .080 -.077 .172 .263 .248 .174 1.000 .648 .499

hotl2 .127 .163 .061 .037 -.003 -.056 .071 -.031 .178 .271 .275 .169 .648 1.000 .466

hotl3 .153 .244 .127 .202 .220 .066 .099 .082 .303 .190 .070 .111 .499 .466 1.000


Note the clusters.


Communalitiesa

Initial Extraction

hetl1 .562 .655

hetl2 .588 .824

hetl3 .199 .161

hatl1 .263 .282

hatl2 .335 .792

hatl3 .125 .086

hctl1 .619 .677

hctl2 .628 .835

hctl3 .558 .588

hstl1 .711 .841

hstl2 .572 .586

hstl3 .641 .737

hotl1 .512 .684

hotl2 .480 .608

hotl3 .436 .437

Extraction Method: Maximum

Likelihood.

Total Variance Explained

Factor: Initial Eigenvalues (Total, % of Variance, Cumulative %); Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %); Rotation Sums of Squared Loadings (Total)

1 3.307 22.047 22.047 2.861 19.071 19.071 2.461

2 2.284 15.226 37.273 1.830 12.199 31.270 2.208

3 2.157 14.378 51.652 1.994 13.291 44.560 1.741

4 1.495 9.967 61.618 1.111 7.406 51.967 1.294

5 1.320 8.801 70.419 1.000 6.664 58.631 2.115

6 .895 5.969 76.388

7 .815 5.433 81.822

8 .564 3.763 85.585

9 .506 3.375 88.960

10 .392 2.616 91.576

11 .333 2.219 93.794

12 .277 1.845 95.640

13 .249 1.661 97.301

14 .221 1.472 98.773


If we were in the process of developing this questionnaire, we would look at the items making up hetl3 and hatl3. It appears that there are items in those testlets that don’t really belong with the other e and a items.

I requested all factors with eigenvalues >= 1. SPSS extracted 5.


15 .184 1.227 100.000

Extraction Method: Maximum Likelihood.

The scree plot is not terribly informative, so, as is the case in many instances, it must be taken with a grain of salt.



Goodness-of-fit Test

Chi-Square df Sig.

51.615 40 .103

Pattern Matrixa

Factor

1 2 3 4 5

hetl1 -.067 .027 .813 -.008 .002

hetl2 -.105 .045 .895 .087 .038

hetl3 .122 -.048 .360 -.050 .016

hatl1 .030 .051 -.034 .505 .107

hatl2 .007 -.001 .033 .885 .055

hatl3 -.017 -.029 .004 .282 -.082

hctl1 .021 .811 .012 .053 -.010

hctl2 -.067 .943 .023 -.075 -.139

hctl3 .119 .688 -.046 .009 .153

hstl1 .891 .029 -.127 .003 .066

hstl2 .743 .033 .030 .003 .040

hstl3 .880 -.007 .074 .007 -.100

hotl1 .013 -.065 -.013 -.033 .834

hotl2 .019 -.021 .009 -.096 .773

hotl3 -.015 .083 .094 .158 .590

Extraction Method: Maximum Likelihood.

Rotation Method: Oblimin with Kaiser Normalization.

a. Rotation converged in 5 iterations.

Factor Correlation Matrix

Factor 1 2 3 4 5

1 1.000 .179 .074 -.158 .314

2 .179 1.000 -.061 .157 .155

3 .074 -.061 1.000 .030 .218

4 -.158 .157 .030 1.000 .054

5 .314 .155 .218 .054 1.000

Extraction Method: Maximum Likelihood.

Rotation Method: Oblimin with Kaiser Normalization.


(The factors correspond, in order, to S, C, E, A, and O.)

Mean of factor correlations is .096.

If the factors were truly orthogonal and there were no other effects operating, the mean of factor correlations would be closer to 0. This is an issue we’ll return to.

The S~O correlation of .314 is one that some people would kill for in psychology. Here, it’s a problem correlation.

Path Diagram

[Figure: path diagram of the five-factor EFA – factors S, C, E, A, and O, with loadings to the fifteen testlet indicators E1-E3, A1-A3, C1-C3, S1-S3, and O1-O3.]

More on goodness-of-fit later. I will note that this is one of the few nonsignificant chi-squares I’ve seen.


Now a Confirmatory Factor Analysis using Rcmdr’s CFA capabilities

R -> Rcmdr -> Import data -> from SPSS data set -> WrensenMVsImputed070114.sav

library(foreign, pos=14)

> Wrensen1 <-
+   read.spss("G:/MDBR/Wrensen/WrensenDataFiles/WrensenMVsImputed070114.sav",
+   use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)

> colnames(Wrensen1) <- tolower(colnames(Wrensen1))

> library(sem, pos=15)

Statistics -> Dimensional Analysis -> Confirmatory Factor Analysis . . .

After defining all the factors, press the [OK] button.

Rcmdr CFA Results for Wrensen Testlet data

> local({
+   .model <- c('HA: hatl1, hatl2, hatl3', 'HC: hctl1, hctl2, hctl3',
+     'HE: hetl1, hetl2, hetl3', 'HO: hotl1, hotl2, hotl3',
+     'HS: hstl1, hstl2, hstl3')
+   .model <- cfa(file=textConnection(.model), reference.indicators=FALSE)
+   .Data <- Wrensen1[, c('hatl1', 'hatl2', 'hatl3', 'hctl1', 'hctl2',
+     'hctl3', 'hetl1', 'hetl2', 'hetl3', 'hotl1', 'hotl2', 'hotl3', 'hstl1',
+     'hstl2', 'hstl3')]
+   summary(sem(.model, data=.Data), robust=FALSE, fit.indices=c("AIC","BIC"))
+ })


1. CTRL-click on all indicators of a factor

2. Enter the name of the factor

3. Click on the [Define Factor] button.

Repeat for each factor.


Model Chisquare = 131.4027 Df = 80 Pr(>Chisq) = 0.0002588754 AIC = 211.4027 BIC = -277.5563

Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-1.832000 -0.295100  0.000005  0.123100  0.481300  3.340000

R-square for Endogenous Variables (Communalities)
 hatl1  hatl2  hatl3  hctl1  hctl2  hctl3  hetl1  hetl2  hetl3  hotl1  hotl2
0.2572 0.8363 0.0647 0.7404 0.7166 0.5302 0.6323 0.8418 0.1191 0.6714 0.6120
 hotl3  hstl1  hstl2  hstl3
0.3751 0.8460 0.5756 0.6852

Parameter Estimates (These are unstandardized estimates.)

Loadings       Estimate    Std Error   z value     Pr(>|z|)
lam[hatl1:HA] 0.37757991 0.08599843  4.3905440 1.130674e-05  hatl1 <--- HA
lam[hatl2:HA] 0.68773190 0.12914835  5.3251311 1.008803e-07  hatl2 <--- HA
lam[hatl3:HA] 0.09811539 0.03525781  2.7827984 5.389229e-03  hatl3 <--- HA
lam[hctl1:HC] 0.64310673 0.05105659 12.5959595 2.222410e-36  hctl1 <--- HC
lam[hctl2:HC] 0.75834598 0.06149906 12.3310166 6.166379e-35  hctl2 <--- HC
lam[hctl3:HC] 0.52342898 0.05125676 10.2119023 1.753919e-24  hctl3 <--- HC
lam[hetl1:HE] 0.76304945 0.08337527  9.1519874 5.589561e-20  hetl1 <--- HE
lam[hetl2:HE] 0.83073551 0.08114437 10.2377472 1.343277e-24  hetl2 <--- HE
lam[hetl3:HE] 0.18848974 0.04443744  4.2416877 2.218452e-05  hetl3 <--- HE
lam[hotl1:HO] 0.54162801 0.05005534 10.8205840 2.750194e-27  hotl1 <--- HO
lam[hotl2:HO] 0.55030935 0.05349786 10.2865670 8.101345e-25  hotl2 <--- HO
lam[hotl3:HO] 0.50066444 0.06369553  7.8602764 3.832868e-15  hotl3 <--- HO
lam[hstl1:HS] 0.89190917 0.06224449 14.3291276 1.439172e-46  hstl1 <--- HS
lam[hstl2:HS] 0.68736329 0.06257501 10.9846299 4.530920e-28  hstl2 <--- HS
lam[hstl3:HS] 0.71024970 0.05757189 12.3367446 5.743132e-35  hstl3 <--- HS

Factor correlations
C[HA,HC]  0.15890691 0.09208606  1.7256348 8.441313e-02  HC <--> HA
C[HA,HE]  0.15863217 0.09171458  1.7296288 8.369662e-02  HE <--> HA
C[HA,HO]  0.09419440 0.09448682  0.9969052 3.188105e-01  HO <--> HA
C[HA,HS] -0.11193323 0.08979620 -1.2465253 2.125716e-01  HS <--> HA
C[HC,HE] -0.02440816 0.08921141 -0.2735991 7.843927e-01  HE <--> HC
C[HC,HO]  0.09523802 0.09230993  1.0317202 3.022032e-01  HO <--> HC
C[HC,HS]  0.20944491 0.08477614  2.4705642 1.349001e-02  HS <--> HC
C[HE,HO]  0.24149625 0.08844528  2.7304594 6.324613e-03  HO <--> HE
C[HE,HS] -0.06947310 0.08724623 -0.7962878 4.258648e-01  HS <--> HE
C[HO,HS]  0.33711293 0.08269579  4.0765430 4.571021e-05  HS <--> HO

Residual variances
V[hatl1] 0.41171446 0.06784446 6.0685051 1.291063e-09  hatl1 <--> hatl1
V[hatl2] 0.09256757 0.16699242 0.5543220 5.793585e-01  hatl2 <--> hatl2
V[hatl3] 0.13919108 0.01578289 8.8191141 1.153689e-18  hatl3 <--> hatl3
V[hctl1] 0.14503198 0.03224827 4.4973571 6.880338e-06  hctl1 <--> hctl1
V[hctl2] 0.22744217 0.04634193 4.9079129 9.205070e-07  hctl2 <--> hctl2
V[hctl3] 0.24275534 0.03273110 7.4166569 1.201136e-13  hctl3 <--> hctl3
V[hetl1] 0.33854436 0.09322674 3.6314082 2.818790e-04  hetl1 <--> hetl1
V[hetl2] 0.12969459 0.10216307 1.2694860 2.042678e-01  hetl2 <--> hetl2
V[hetl3] 0.26267035 0.02958724 8.8778260 6.817981e-19  hetl3 <--> hetl3
V[hotl1] 0.14355413 0.03355314 4.2784106 1.882326e-05  hotl1 <--> hotl1
V[hotl2] 0.19200353 0.03731650 5.1452723 2.671330e-07  hotl2 <--> hotl2
V[hotl3] 0.41753079 0.05368607 7.7772644 7.410959e-15  hotl3 <--> hotl3
V[hstl1] 0.14479381 0.04603802 3.1450924 1.660344e-03  hstl1 <--> hstl1
V[hstl2] 0.34830762 0.04664075 7.4678820 8.149586e-14  hstl2 <--> hstl2
V[hstl3] 0.23180800 0.03768383 6.1513911 7.680625e-10  hstl3 <--> hstl3

Use the R Lavaan package to perform CFAs.


Note that the chi-square statistic is much larger than that for the EFA shown above. CFA chi-squares will always be larger than EFA chi-squares, because the CFA model is more restrictive: it fixes all cross-loadings at 0.

Differences between observed and predicted covariances.


Finally, a Confirmatory Factor Analysis using R’s lavaan package. Wait!! Why do we need another CFA procedure? Because lavaan is more versatile.

I imported the data to R using Rcmdr’s Data -> Import from SPSS file . . .

> Wrensen1 <- read.spss("G:/MDBR/Wrensen/WrensenDataFiles/WrensenMVsImputed070114.sav", use.value.labels=TRUE, max.value.labels=Inf, + to.data.frame=TRUE)

> colnames(Wrensen1) <- tolower(colnames(Wrensen1))
> library(lavaan)
> wrensen1.model.lavaan <- 'ha=~hatl1+hatl2+hatl3
+ hc=~hctl1+hctl2+hctl3
+ he=~hetl1+hetl2+hetl3
+ ho=~hotl1+hotl2+hotl3
+ hs=~hstl1+hstl2+hstl3'

> wrensen1.fit <- cfa(wrensen1.model.lavaan,data=Wrensen1)

> summary(wrensen1.fit,fit.measures=T,standardized=T)

lavaan (0.5-23.1097) converged normally after 51 iterations

Number of observations 166

  Estimator                                         ML
  Minimum Function Test Statistic              132.199    (RCMDR = 131.4027)
  Degrees of freedom                                80
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              997.670
  Degrees of freedom                               105
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.942    (Amos = .942)
  Tucker-Lewis Index (TLI)                       0.923

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2385.845
  Loglikelihood unrestricted model (H1)      -2319.746

  Number of free parameters                         40
  Akaike (AIC)                                4851.690
  Bayesian (BIC)                              4976.170
  Sample-size adjusted Bayesian (BIC)         4849.527

Root Mean Square Error of Approximation:

  RMSEA                                          0.063    (Amos = .062)
  90 Percent Confidence Interval             0.043  0.081
  P-value RMSEA <= 0.05                          0.135

Standardized Root Mean Square Residual:

SRMR 0.067
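As a check on the degrees of freedom reported above, the counts can be reproduced by hand (Python is used here only for the arithmetic). With 15 observed variables there are 15·16/2 = 120 observed variances and covariances; under lavaan's default identification the free parameters are 10 loadings (one loading per factor is fixed at 1), 15 residual variances, 5 factor variances, and 10 factor covariances.

```python
# Degrees of freedom for the 5-factor, 15-indicator CFA reported above.
p = 15                                # observed variables
moments = p * (p + 1) // 2            # unique variances + covariances = 120
free_loadings = 15 - 5                # one reference loading per factor is fixed at 1
free = free_loadings + 15 + 5 + 10    # + residual variances + factor variances
                                      #   + factor covariances = 40 free parameters
df = moments - free
print(moments, free, df)              # 120 40 80 -- matches the lavaan output
```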


Tells R I want to use lavaan

This is lavaan “syntax”, telling it which factors to estimate and what the indicators of the factors are. I put the syntax in wrensen1.model.lavaan. Press the [Return] key after defining each factor.

Invokes lavaan’s CFA command telling it that the syntax is in wrensen1.model.lavaan.

RCMDR’s CFA procedure will print many of these values. I just did not ask for them in the example shown on the previous page.


Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all   Amos
  ha =~
    hatl1             1.000                               0.376    0.507    .51
    hatl2             1.821    0.680    2.680    0.007    0.686    0.915    .91
    hatl3             0.260    0.092    2.837    0.005    0.098    0.254    .25
  hc =~
    hctl1             1.000                               0.641    0.860    .86
    hctl2             1.179    0.107   11.021    0.000    0.756    0.847    .85
    hctl3             0.814    0.083    9.860    0.000    0.522    0.728    .73
  he =~
    hetl1             1.000                               0.761    0.795    .80
    hetl2             1.089    0.173    6.279    0.000    0.828    0.917    .92
    hetl3             0.247    0.059    4.217    0.000    0.188    0.345    .35
  ho =~
    hotl1             1.000                               0.540    0.819    .82
    hotl2             1.016    0.125    8.128    0.000    0.549    0.782    .78
    hotl3             0.924    0.130    7.098    0.000    0.499    0.612    .61
  hs =~
    hstl1             1.000                               0.889    0.920    .92
    hstl2             0.771    0.068   11.328    0.000    0.685    0.759    .76
    hstl3             0.796    0.063   12.563    0.000    0.708    0.828    .83

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all   Amos
  ha ~~
    hc                0.038    0.026    1.486    0.137    0.159    0.159    .16
    he                0.045    0.031    1.471    0.141    0.159    0.159    .16
    ho                0.019    0.020    0.945    0.345    0.094    0.094    .09
    hs               -0.037    0.033   -1.149    0.250   -0.112   -0.112   -.11
  hc ~~
    he               -0.012    0.043   -0.274    0.784   -0.024   -0.024   -.02
    ho                0.033    0.032    1.021    0.307    0.095    0.095    .10
    hs                0.119    0.051    2.326    0.020    0.209    0.209    .21
  he ~~
    ho                0.099    0.041    2.432    0.015    0.241    0.241    .24
    hs               -0.047    0.059   -0.790    0.429   -0.069   -0.069   -.07
  ho ~~
    hs                0.162    0.047    3.473    0.001    0.337    0.337    .34


These are lavaan’s standardized loadings

These are lavaan’s factor correlations


Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .hatl1            0.409    0.067    6.087    0.000    0.409    0.743
   .hatl2            0.092    0.165    0.556    0.578    0.092    0.164
   .hatl3            0.138    0.016    8.846    0.000    0.138    0.935
   .hctl1            0.144    0.032    4.511    0.000    0.144    0.260
   .hctl2            0.226    0.046    4.923    0.000    0.226    0.283
   .hctl3            0.241    0.032    7.439    0.000    0.241    0.470
   .hetl1            0.337    0.092    3.642    0.000    0.337    0.368
   .hetl2            0.129    0.101    1.273    0.203    0.129    0.158
   .hetl3            0.261    0.029    8.905    0.000    0.261    0.881
   .hotl1            0.143    0.033    4.291    0.000    0.143    0.329
   .hotl2            0.191    0.037    5.161    0.000    0.191    0.388
   .hotl3            0.415    0.053    7.801    0.000    0.415    0.625
   .hstl1            0.144    0.046    3.155    0.002    0.144    0.154
   .hstl2            0.346    0.046    7.490    0.000    0.346    0.424
   .hstl3            0.230    0.037    6.170    0.000    0.230    0.315
    ha               0.142    0.064    2.202    0.028    1.000    1.000
    hc               0.411    0.065    6.317    0.000    1.000    1.000
    he               0.579    0.126    4.590    0.000    1.000    1.000
    ho               0.292    0.054    5.427    0.000    1.000    1.000
    hs               0.791    0.110    7.186    0.000    1.000    1.000

(These are variances of the residuals – the catch-all terms – followed by variances of the factors. Lavaan uses the identification method that estimates factor variances and residual variances.)



Comparison of EFA and CFA Standardized Loadings

             SPSS    Amos   lavaan
Fac  Item     EFA     CFA      CFA
E    hetl1   .813    .80     .795
E    hetl2   .895    .92     .917
E    hetl3   .360    .35     .345
A    hatl1   .505    .51     .507
A    hatl2   .885    .91     .915
A    hatl3   .282    .25     .254
C    hctl1   .811    .86     .860
C    hctl2   .943    .85     .847
C    hctl3   .688    .73     .728
S    hstl1   .891    .92     .920
S    hstl2   .743    .76     .759
S    hstl3   .880    .83     .828
O    hotl1   .834    .82     .819
O    hotl2   .773    .78     .782
O    hotl3   .590    .61     .612

Clearly the SPSS EFA and Amos/Lavaan CFA loadings are nearly identical.

We would hope that the EFA and CFA loadings would be generally similar.

However, their similarity will depend on the extent to which the questionnaire creator has worked to make sure that each indicator is influenced by one and only one factor.

If there are many large EFA cross-loadings, then the CFA loading estimates may differ from the EFA loadings, because the CFA model fixes all cross-loadings at 0.

You can specify that selected cross-loadings are not zero in a CFA, but you should do that for only a few.


r = .983 (correlation between the EFA loadings and the CFA loadings above)

EFA loadings are typically standardized.

The Amos and Lavaan loadings shown here are the standardized loadings.


Comparison of Amos and Rcmdr and Lavaan Unstandardized Loadings

             Amos   Rcmdr   lavaan
Fac  Item     CFA     CFA      CFA
E    hetl1   .76     .76     1.00
E    hetl2   .83     .83     1.09
E    hetl3   .19     .19      .25
A    hatl1   .38     .38     1.00
A    hatl2   .69     .69     1.82
A    hatl3   .10     .10      .26
C    hctl1   .64     .64     1.00
C    hctl2   .76     .76     1.18
C    hctl3   .52     .52      .81
S    hstl1   .89     .89     1.00
S    hstl2   .69     .69      .77
S    hstl3   .71     .71      .80
O    hotl1   .54     .54     1.00
O    hotl2   .55     .55     1.02
O    hotl3   .50     .50      .92

What!! The Lavaan unstandardized loadings are different because the programmers of Lavaan used a different identification choice. More on this below.

In Amos and Rcmdr CFA: Factor variances are fixed at 1 and ALL loadings are estimated.

In Lavaan’s default: Factor variances are estimated and 1 loading for each factor is fixed at 1.

Both methods yield the same standardized loadings. Since we’re usually most interested in the standardized loadings, the differences between unstandardized loadings are of no consequence.
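The equivalence is easy to verify from the output above: multiplying a lavaan unstandardized loading by the square root of its estimated factor variance rescales the factor to variance 1, which reproduces the Amos/Rcmdr unstandardized loading. A quick arithmetic check (values copied from the lavaan output and the comparison table):

```python
import math

# (item, lavaan unstandardized loading, estimated factor variance,
#  Amos/Rcmdr loading estimated with the factor variance fixed at 1)
cases = [
    ("hetl2", 1.089, 0.579, 0.83),
    ("hatl2", 1.821, 0.142, 0.69),
    ("hstl2", 0.771, 0.791, 0.69),
]
for item, loading, fvar, amos in cases:
    rescaled = loading * math.sqrt(fvar)   # rescale the factor to variance 1
    print(item, round(rescaled, 2), amos)  # the two values agree
```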



Comparison of EFA and CFA factor correlations . . .

                   Amos   Rcmdr   lavaan
Factors    EFA      CFA     CFA      CFA
E~A       .030     .16     .158     .159
E~C      -.061    -.02    -.024    -.024
E~S       .074    -.07    -.069    -.069
E~O       .218     .24     .241     .241
A~C       .157     .16     .159     .159
A~S      -.158    -.11    -.112    -.112
A~O       .054     .09     .094     .094
C~S       .179     .21     .209     .209
C~O       .155     .10     .095     .095
S~O       .314     .34     .337     .337

Correlation of the correlations . . . CFA factor correlations vs. EFA factor correlations.

This indicates that in this particular instance, the estimates of factor correlations from a CFA are about the same as the estimates of factor correlations from an EFA although the positive correlations from the CFA tended to be slightly farther from zero than those from the EFA.

Many people would say that the EFA correlations are the gold standard, since the CFA model might be distorted by forcing all of the cross-loadings to be zero. I would be one of them.

r = .871 (correlation between the EFA factor correlations and the CFA factor correlations above)


The Identification Problem (Skip in 2017, 2018)

Consider the simple regression model . . .

Quantities which can be computed from the data . . .

Mean of the X variable
Variance of the X variable
Mean of the Y variable
Variance of the Y variable
Correlation of Y with X

Quantities in the model . . .

Remember that in path diagrams, all the variance in every endogenous variable must be accounted for. For that reason, the path diagram includes a latent “Other factors” or “Error of measurement” or “Residual” variable, labeled “E” in the above diagram.

Mean of X
Variance of X
Mean of E
Variance of E
Intercept of X->Y regression
Slope of X->Y regression
Correlation of E with Y

Whoops! There are 5 quantities in the data but 7 in the model.

There are too few quantities in the data.

The model is underidentified – not identified enough: there aren't enough quantities from the data to identify each model value.


[Path diagram: X (Mean, Variance) → Y, with Y = a + b*X; residual E (Mean, Variance) → Y; Corr(E,Y) shown on the E–Y path.]

Note: Mean and variance of Y are not separately identified in the model because they are assumed to be completely determined by Y’s relationship to X and to E.


Dealing with underidentification . . . (Skip in 2017, 2018)

Solution 1
0) The mean of E is always assumed to be 0.
1) Fix the variance of E to be 1.

So in this regression model, the path diagram will be

In this case, there are 5 quantities in the model that must be estimated – mean of X, variance of X, intercept of equation, slope of equation, and correlation of E with Y. There are also 5 quantities that can be estimated from the observed data.

The model is said to be “just identified” or “completely identified”. This means that every estimable quantity in the model corresponds in some way to one quantity obtained from the data.

Or,

Solution 2
0) The mean of E is always assumed to be 0.
1) Fix the covariance of E with Y at 1.

Underidentified models: Cannot be estimated.

Just identified models: Every model quantity is a function of some data quantity. But no parsimony.

Overidentified models: There are more data quantities than model quantities. It is said that you then have “degrees of freedom” in your model. This is good. Relationships are being explained by fewer model quantities than there are data quantities. This is parsimonious – what science is all about.
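The same counting applies to factor models. A sketch (Python only for the arithmetic): for a one-factor model with k indicators and the factor variance fixed at 1, the model estimates k loadings and k residual variances, while the data supply k(k+1)/2 variances and covariances.

```python
def one_factor_df(k):
    """df for a one-factor CFA with k indicators (factor variance fixed at 1)."""
    moments = k * (k + 1) // 2   # observed variances and covariances
    params = 2 * k               # k loadings + k residual variances
    return moments - params

for k in (2, 3, 4, 6):
    df = one_factor_df(k)
    status = ("underidentified" if df < 0
              else "just identified" if df == 0
              else "overidentified")
    print(k, df, status)
# Two indicators: underidentified; three: just identified;
# four or more: overidentified, with degrees of freedom left over for parsimony.
```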


[Path diagrams for the two solutions: each shows X (Mean, Variance) → Y, with Y = a + b*X, and residual E → Y. In Solution 1, E is labeled "0, 1" (mean fixed at 0, variance fixed at 1) and the E–Y correlation rEY is estimated. In Solution 2, E is labeled "0, Variance" (mean fixed at 0, variance estimated) and the E → Y path is fixed at 1.]


Technical Issue: Identification in CFA models –

For mathematical reasons, we have to make a few assumptions about which values are to be estimated in CFA models. Here’s a typical CFA two-factor model.

Assumption 1. Ensuring that the “Residuals” part of the model – that involving the “E”s – is identified.

We always assume all “E” means = 0.

Solution 1. Fix all “E” variances to 1. These two values, 0 and 1, are often put near a latent variable to remind the reader that these assumptions have been made.

or

Solution 2. Fix all E → observed-variable loadings to 1 and estimate the variances of the “E”s.


[Two path diagrams of a two-factor CFA (factors F1 and F2; indicators Obs 1–Obs 6; residuals E1–E6). In the Solution 1 diagram each residual is labeled "0, 1" (mean 0, variance 1) and each E → Obs loading is estimated. In the Solution 2 diagram each E → Obs loading is fixed at 1 and each residual is labeled "0, Variance" (mean 0, variance estimated). A callout marks Solution 2: "I recommend this. It’s the default in most programs."]


Assumption 2. Ensuring that the Factors part of the CFA is identified.

Solution 1. Fix one of the loadings for each factor at 1 and estimate all factor variances

Or

Solution 2. Fix the variance of each factor at 1 and estimate all factor loadings.

The main point for us – what the factor analysis gives us.

The above models tell us that the variation in 6 observed variables – Obs1, Obs2, Obs3, Obs4, Obs5, Obs6 – is due to variation in just two internal characteristics – F1 and F2.

So we have explained why there is variation in the observed variables – because of variation in F1 and F2.

We have also explained why the variation in Obs1 to Obs3 is unrelated to the variation in Obs4 to Obs6 – because F1 and F2 are uncorrelated.


[Two path diagrams of the same two-factor CFA illustrating Assumption 2. In the Solution 1 diagram one loading per factor is fixed at 1 (the item whose loading is fixed is called the reference item) and the factor variances are estimated. In the Solution 2 diagram each factor's variance is fixed at 1 and all loadings are estimated. A callout notes: "I recommend this, although some examples below will use the above method."]


Examples

1. Fixing all variances.

2. Fixing residual loadings and factor variances.

3. Fixing residual loadings and factor loadings.


[Three path diagrams of the two-factor model (F1, F2; Obs 1–Obs 6; E1–E6) illustrating the choices listed above. Diagram 1: all residual variances and factor variances fixed at 1, all loadings estimated; a callout reads "My favorite. Default for many programs." Diagram 2: residual loadings fixed at 1 and factor variances fixed at 1. Diagram 3: residual loadings fixed at 1 and one reference loading per factor fixed at 1.]


Factors vs Summated Scales

Recall that X = T + E, as was discussed briefly in PSY 5130.

An observed score is comprised of both the True score and error of measurement.

This means that X – the variable we’re measuring – is partly error. There will be some variation in X from measurement to measurement that has nothing to do with our theory - it’s just random variation, clouding our results.

The basic argument for using latent variables is that the relations between latent variables are closer to the “true score” relationships than are the relations between observed scores. This is because when constructs are represented using latent variables, the random error that is part of every X is removed.

If we compute the average of 10 Conscientiousness items to form a scale score, for example, that scale score includes the errors associated with each of the items averaged.
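The point can be made concrete with the Spearman-Brown formula from classical test theory (Python only for the arithmetic; the single-item reliability of .40 is an assumed, illustrative value): averaging more items raises reliability, but the scale score never becomes error-free.

```python
def spearman_brown(r1, k):
    """Reliability of the average of k parallel items, each with reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

r1 = 0.40                 # assumed single-item reliability (illustrative only)
for k in (1, 5, 10, 100):
    print(k, round(spearman_brown(r1, k), 3))
# Reliability climbs toward 1 but never reaches it: even a long scale score
# retains some error variance, which is what the latent-variable approach removes.
```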

Here’s a path diagram representing a scale score . . . Assume the items are Conscientiousness items.


[Path diagram of a summated scale: items X1–X10, each with its own error term e, all feeding the scale score labeled "C+Junk" (Conscientiousness plus error), which is related to GPA.]

A scale score contains everything – pure content plus error of measurement.

So any correlation involving a scale score is contaminated by the error of measurement contained in the scale score.



But if we create a C factor, the factor represents only the C that is present across all the items, and none of the error that also contaminates the item.

The errors affecting the items are treated separately, rather than being lumped into the scale score.

The result is that the latent variable, C, in the diagram below is a purer estimate of conscientiousness than would be a scale score. Its correlations with other variables will not be contaminated by errors of measurement.

From Schmidt, F. (2011). A theory of sex differences in technical aptitude and some supporting evidence. Perspectives on Psychological Science, 6, 560-573.

“Prediction 3 was examined at both the observed and the construct levels. That is, both observed score and true score regressions were examined. However, from the point of view of theory testing, the true score regressions provide a better test of the theoretical predictions, because they depict processes operating at the level of the actual constructs of interest, independent of the distortions created by measurement error.”


[Path diagram of the latent-variable model: factor C with indicators Item 1–Item 10, each item with its own error term e; C is related to GPA, which has its own residual e.]


So, what do we get from factor analyses?

1. Reduction in dimensionality

As discussed in the first lecture on factor analysis – the first purpose is to reduce the complexity of the problem, allowing us to focus on a few factors rather than multiple observed variables.

2. Measurement not contaminated by random measurement error

Factors estimated from appropriate models give us estimates of characteristics that are unaffected by measurement error.

The way to access a factor’s values for individual persons is through factor scores. Right now, only Mplus makes that easy to do. I expect more programs to provide such capabilities soon.

3. Measurement controlling for other nonrandom constructs.

Measurement of a construct using factor analysis allows you to partial out the effects of other constructs in the measurement process. More on this later.

4. Purer relations

As stated above, relationships among latent variables are free from the “noise” of errors of measurement, so if two characteristics are related, their factor correlations will be farther from 0 than will be their scale correlations.

That is – assess the relationship with scale scores. Assess the same relationship using a latent variable model. The r from the latent variable analysis will usually be larger than the r from the analysis involving scale scores.
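The expected size of that difference follows from the classical correction for attenuation, r_true = r_observed / sqrt(rel_X * rel_Y). A sketch (the observed r of .50 and the reliabilities of .80 and .75 are assumed, illustrative values):

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Classical correction for attenuation: estimated true-score correlation."""
    return r_obs / math.sqrt(rel_x * rel_y)

r_scales = 0.50                    # observed scale-score correlation (assumed)
r_latent = disattenuate(r_scales, 0.80, 0.75)
print(round(r_latent, 3))          # 0.645 -- farther from 0 than the scale-score r
```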

5. Sanity and advancement of the science

The sometimes random-seeming mish-mash of correlations will be a little less random-seeming when the effects of errors of measurement on relationships and on measurement of nonrandom characteristics are taken out.



Example illustrating the differences between scale score correlations and latent variable correlations.

From Biderman, M. D., McAbee, S. T., Chen, Z., & Nguyen, N. T. (2018). Assessing the Evaluative Content of Personality Questionnaires Using Bifactor Models. Journal of Personality Assessment, 100(4), 375-388, DOI: 10.1080/00223891.2017.1406362

The correlations are those between three different measures of affect. For each measure, the correlations between scale scores are on the left and the correlations between factors are on the right.

Table 3
Correlations of Measures of Affect

                        Summated Scales          Latent Variables
                      RSE   PANAS   Dep.       RSE   PANAS   Dep.
Administered with the NEO-FFI-3¹
  RSE                   1                        1
  PANAS               .81       1             1.00       1
  Depression         -.76    -.79      1      -.84   -1.00      1
Administered with the HEXACO-PI-R²
  RSE                   1                        1
  PANAS               .74       1              .80       1
  Depression         -.78    -.73      1      -.87    -.82      1

Note. ¹N=317, ²N=788. Dep. = depression.

As an aside, note that even the scale scores were very highly correlated, raising the issue of whether the three scales – RSE, PANAS, and Dep - might all be measuring the same affective characteristic.
