Factor analysis Caroline van Baal March 3 rd 2004, Boulder

Factor analysis

Caroline van Baal

March 3rd 2004, Boulder

Phenotypic Factor Analysis

• (Approximate) description of the relations between different variables
  – Compare to Cholesky decomposition
• Testing of hypotheses on relations between different variables by comparing different (nested) models
  – How many underlying factors?

Factor analysis and related methods

• Data reduction
  – Consider 6 variables:
  – Height, weight, arm length, leg length, verbal IQ, performance IQ
  – You expect the first 4 to be correlated, and the last 2 to be correlated, but do you expect high correlations between the first 4 and the last 2?

Data analysis in non-experimental designs using latent constructs

• Principal Components Analysis

• Triangular Decomposition (Cholesky)

• Exploratory Factor Analysis

• Confirmatory Factor Analysis

• Structural Equation Models

Exploratory Factor Analysis

• Account for covariances among observed variables in terms of a smaller number of latent, common factors

• Includes error components for each variable
• x = P * f + u
  – x = observed variables
  – f = latent factors
  – u = unique factors
  – P = matrix of factor loadings
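
As a minimal sketch of what x = P * f + u means for a single person, the Python snippet below builds the observed scores from factor scores and unique parts. The loadings, factor scores and unique parts are hypothetical numbers chosen for illustration; they are not taken from the slides.

```python
# Hypothetical values for illustration only (not estimates from the slides).
P = [[0.8, 0.0],
     [0.7, 0.0],
     [0.0, 0.6],
     [0.0, 0.9]]            # 4 observed variables, 2 common factors
f = [1.0, -0.5]             # one person's scores on the two latent factors
u = [0.1, -0.2, 0.3, 0.0]   # that person's unique (variable-specific) parts


def observed_scores(P, f, u):
    """Return x = P * f + u for one person (P is nvar by nfac)."""
    return [sum(p * fj for p, fj in zip(row, f)) + ui
            for row, ui in zip(P, u)]


x = observed_scores(P, f, u)
print(x)
```

Each observed score is a weighted sum of the common factor scores plus a part that belongs to that variable alone.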

[Path diagram: a single factor, Factor 1 (IQ, "g"), marked 1, with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]

[Path diagram: two factors, Factor 1 (verbal) and Factor 2 (performance), each marked 1, with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]

EFA equations

• C = P * D * P' + U * U'
• C = observed covariance matrix
  – nvar by nvar, symmetric
• P = factor loadings
  – nvar by nfac, full
• D = correlations between factors
  – nfac by nfac, standardized
• U = specific influences, errors
  – nvar by nvar, diagonal
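
The matrix expression C = P * D * P' + U * U' can be evaluated directly. The sketch below uses plain Python lists; the helper names (`matmul`, `transpose`, `implied_covariance`) and all numbers are assumptions made for this illustration, not part of the Mx scripts.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def implied_covariance(P, D, U):
    """Return C = P * D * P' + U * U'."""
    common = matmul(matmul(P, D), transpose(P))
    unique = matmul(U, transpose(U))
    return [[c + e for c, e in zip(row_c, row_e)]
            for row_c, row_e in zip(common, unique)]


# Hypothetical parameter values: 4 variables, 2 correlated factors.
P = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]   # nvar by nfac, full
D = [[1.0, 0.5], [0.5, 1.0]]                           # nfac by nfac, standardized
u = [0.6, 0.7, 0.8, 0.4]
U = [[u[i] if i == j else 0.0 for j in range(4)] for i in range(4)]  # diagonal

C = implied_covariance(P, D, U)
for row in C:
    print([round(v, 2) for v in row])
```

Because U is diagonal, U * U' only adds to the variances on the diagonal of C; all covariances come from the common part P * D * P'.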

Exploratory factor analysis

• No prior assumption on number of factors

• All variables load on all latent factors

• Factors are either all correlated or all uncorrelated

• Unique factors are uncorrelated

• Underidentification
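
One way to see the underidentification is rotation: with uncorrelated factors, any orthogonal rotation of the loadings gives exactly the same common covariance P * P'. The sketch below uses hypothetical loadings and picks the rotation angle that turns the loading of variable 1 on factor 2 into 0, which suggests why a single fixed loading removes the indeterminacy when there are two factors.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical loadings: every variable loads on both factors (EFA).
P = [[0.8, 0.3],
     [0.7, 0.2],
     [0.2, 0.6],
     [0.1, 0.9]]

# Angle chosen so that the rotated loading of variable 1 on factor 2 is 0.
theta = math.atan2(P[0][1], P[0][0])
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]   # orthogonal: T * T' = I

P_rot = matmul(P, T)
common = matmul(P, transpose(P))
common_rot = matmul(P_rot, transpose(P_rot))

print([round(v, 3) for v in P_rot[0]])      # second entry is 0 up to rounding
print(all(abs(a - b) < 1e-12
          for ra, rb in zip(common, common_rot)
          for a, b in zip(ra, rb)))
```

Both loading matrices reproduce the same covariances, so the data cannot choose between them without an extra constraint.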

[Path diagram: Factor 1 (verbal) and Factor 2 (performance), each marked 1, with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA; one factor loading is marked "Fix to 0"]

Confirmatory factor analysis

• An initial model is constructed, because:
  – its elements are described by a theoretical process
  – its elements have been obtained from a previous analysis in another sample
• The model has a specific number of factors
• Variables do not have to load on all factors
• Measurement errors may correlate
• Some latent factors may be correlated, while others are not

[Two path diagrams: Factor 1 (verbal) and Factor 2 (performance), each marked 1, with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]

[Two path diagrams: three factors VC, FD and PO with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]

CFA equations

• x = P * f + u
  – x = observed variables, f = latent factors
  – u = unique factors, P = factor loadings
• C = P * D * P' + U * U'
  – C = observed covariance matrix
  – P = factor loadings
  – D = correlations between factors
  – U = diagonal matrix of errors

Structural equation models

• The factor model x = P * f + u is sometimes referred to as the measurement model

• The relations between latent factors can also be modeled

• This is done in the covariance structure model, or the structural equation model

• Higher order factor models

[Path diagram: second-order factor "g" above the first-order factors F1, F2, F3 (VC, FD, PO), with observed variables SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]

• Second order factor model: C = P*(A*I*A'+B*B')*P' + U*U'
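
The second-order expression can be read as the first-order model with the factor covariance matrix D itself given a factor structure. The slide does not define A and B; the sketch below assumes A holds the loadings of the first-order factors on "g" (nfac by 1), I is the fixed variance of "g" (1 by 1), and B is a diagonal matrix of first-order factor residuals.

```latex
\begin{aligned}
C &= P\,D\,P' + U\,U' \\
D &= A\,I\,A' + B\,B' \\
\Rightarrow\quad C &= P\,(A\,I\,A' + B\,B')\,P' + U\,U'
\end{aligned}
```

Under that reading, the second line has the same form as the first: common part plus specific part, one level up.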

Five steps characterize structural equation models

• Model specification
• Identification
  – E.g., if a factor loads on 2 variables only, multiple solutions are possible, and the factor loadings have to be equated
• Estimation of parameters
• Testing of goodness of fit
• Respecification

• K.A. Bollen & J. Scott Long: Testing Structural Equation Models, 1993, Sage Publications
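
A quick first check at the identification step is to count degrees of freedom: the number of observed variances and covariances minus the number of free parameters. The helper below is a hypothetical illustration (its name and the example models are assumptions); it ignores means, which add as many statistics as parameters, and a non-negative count is necessary but not sufficient for identification.

```python
def n_observed_statistics(nvar):
    """Number of distinct variances and covariances among nvar variables."""
    return nvar * (nvar + 1) // 2


def degrees_of_freedom(nvar, n_free_parameters):
    """Observed statistics minus free parameters (means left out)."""
    return n_observed_statistics(nvar) - n_free_parameters


# One common factor for 5 variables: 5 loadings + 5 specific terms.
print(degrees_of_freedom(5, 5 + 5))    # 15 - 10 = 5

# Lower triangular (Cholesky) model for 5 variables: 15 free elements.
print(degrees_of_freedom(5, 15))       # 15 - 15 = 0, saturated

# A negative count means more parameters than statistics.
print(degrees_of_freedom(3, 8))        # 6 - 8 = -2
```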

Practice!

• IQ and brain volumes (MRI)

• 3 brain volumes
  – Total cerebellum, Grey matter, White matter

• 2 IQ subtests
  – Calculation, Letters / numbers

• Brain and IQ factors are correlated

• Datafile: mri-IQ-all-twinA-5.dat

Script: phenofact.mx

BEGIN MATRICES ;
 P FULL NVAR NFACT free ;    ! factor loadings
 D STAND NFACT NFACT !free ; ! correlations between factors
 U DIAG NVAR NVAR free ;     ! subtest specific influences
 M Full 1 NVAR free ;        ! means
END MATRICES ;

BEGIN ALGEBRA;
 C= P*D*P' +U*U' ;           ! variance covariance matrix
END ALGEBRA;

Means M /
Covariances C /

• In exploratory factor analysis, if nfact = 2, one of the factor loadings has to be fixed to 0 to make it an identified model
  – fix P 1 2

• In confirmatory factor analysis, specify a brain and an IQ factor
  SPECIFY P
  101 0
  102 0
  103 0
  0 204
  0 205
  0 206

• (If a factor loads on 2 variables only, it is not possible to estimate both factor loadings. Equate them, or fix one of them to 1)
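
The remark about a factor with only two indicators can be made concrete in the simplest case, assumed here: the factor has variance 1 and is uncorrelated with every other factor. The symbols p_1, p_2 (loadings) and u_1, u_2 (specific terms) are notation introduced for this sketch.

```latex
\begin{aligned}
\operatorname{var}(x_1) &= p_1^2 + u_1^2 \\
\operatorname{var}(x_2) &= p_2^2 + u_2^2 \\
\operatorname{cov}(x_1, x_2) &= p_1\,p_2
\end{aligned}
```

These are three observed statistics for four unknowns, and only the product p_1 p_2 is determined by the data. Equating the loadings (p_1 = p_2 = p) gives p^2 = cov(x_1, x_2), which leaves three unknowns for three statistics.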

Phenotypic Correlations: MRI-IQ, Dutch twins (A), n=111/296 pairs

               brain   brain   brain   IQ     IQ
               cereb   grey    white   calc   L/n
Cerebellum     1
Grey           .63     1
White          .61     .55     1
calculation    .23     .25     .26     1
Letter/numb.   .30     .19     .19     .46    1

• What is the fit of a 1 factor model?
  – C = P * P' + U*U', P = 5x1 full, U = 5x5 diagonal

• What is the fit of a 2 factor model?
  – Same, P = 5x2 full with 1 factor loading fixed to 0
  – (Reduction: fix first 3 factor loadings of factor 2 to 0)

• Data suggest 2 latent factors: a brain (first 3) and an IQ factor (last 2): what is the evidence for this model?
  – Same, P = 5x2 full with 5 factor loadings fixed to 0

• Can the 2 factor model be improved by allowing a correlation between these 2 factors?
  – C = P * D * P' + U*U', P = 5x2 full matrix (5 fixed), D = stand 2x2 matrix, U = 5x5 diagonal matrix
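
Why the factor correlation matters in the last model can be read off C = P * D * P' + U*U'. In the sketch below, p_{i1} is the loading of brain variable i on the brain factor, p_{j2} the loading of IQ variable j on the IQ factor, and d_{12} the off-diagonal element of D; this element-wise notation is introduced here and does not appear in the script.

```latex
C_{ij} = p_{i1}\, d_{12}\, p_{j2},
\qquad i \in \{1, 2, 3\}\ \text{(brain)},\quad j \in \{4, 5\}\ \text{(IQ)}
```

With uncorrelated factors (d_{12} = 0) every expected brain-IQ covariance is 0, whereas the observed brain-IQ correlations range from .19 to .30. Freeing d_{12} is what lets the model reproduce them.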

Principal Components Analysis

• SPSS, SAS, Mx (functions \eval, \evec)

• Transformation of the data, not a model

• Is used to reduce a large set of correlated observed variables (xi) to (a smaller number of) uncorrelated (orthogonal) components (ci)

• xi is a linear function of ci

PCA path diagram

[Path diagram: components c1 to c5 (variances D) connected by paths (P) to the observed variables x1 to x5]

• S = observed covariances = P * D * P'

PCA equations

• Covariance matrix S = P * D * P' (S, P and D are all q by q)
• P = full q by q matrix of eigenvectors
• D = diagonal matrix of eigenvalues
• P is orthogonal: P * P' = I (identity)

Criteria for number of factors
• Kaiser criterion, scree plot, %var
• Important: models not identified!

[Path diagram: components c1 to c5 and observed variables x1 to x5]

Correlations: satisfaction, n=100

         Var 1   Var 2   Var 3   Var 4   Var 5   Var 6
         work    work    work    home    home    home
Var 1    1
Var 2    .65     1
Var 3    .65     .73     1
Var 4    .14     .14     .16     1
Var 5    .15     .18     .24     .66     1
Var 6    .14     .24     .25     .59     .73     1
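
The PCA equation S = P * D * P' can be applied to the correlation matrix above. The sketch below uses a cyclic Jacobi rotation routine written for this illustration (the slides use SPSS, SAS or the Mx functions \eval and \evec instead) and then counts the eigenvalues larger than 1, as the Kaiser criterion does.

```python
import math


def jacobi_eigen(S, max_sweeps=100, tol=1e-22):
    """Eigenvalues (descending) and eigenvectors (columns) of a symmetric matrix."""
    n = len(S)
    A = [list(row) for row in S]                                 # driven to diagonal form
    V = [[float(i == j) for j in range(n)] for i in range(n)]    # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                if A[p][p] == A[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * A[p][q] / (A[q][q] - A[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):       # rotate columns p and q of A and of V
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
                for k in range(n):       # rotate rows p and q of A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
    order = sorted(range(n), key=lambda i: -A[i][i])
    values = [A[i][i] for i in order]
    vectors = [[V[k][i] for i in order] for k in range(n)]
    return values, vectors


# Lower triangle of the satisfaction correlation matrix (n=100).
lower = [[1.0],
         [.65, 1.0],
         [.65, .73, 1.0],
         [.14, .14, .16, 1.0],
         [.15, .18, .24, .66, 1.0],
         [.14, .24, .25, .59, .73, 1.0]]
S = [[lower[max(i, j)][min(i, j)] for j in range(6)] for i in range(6)]

values, vectors = jacobi_eigen(S)            # D (as a list) and P
kaiser = sum(1 for v in values if v > 1.0)   # number of components retained
print([round(v, 2) for v in values], kaiser)
```

The eigenvalues sum to 6, the number of variables, because S is a correlation matrix; nothing is lost or tested, the variance is only redistributed over the components.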

[Figure: Var 1 to Var 6 with the two factors work and home; entries are marked + or 0]

PCA: Factor loadings (eigenvalues 2.89 & 1.79)

               Factor 1   Factor 2
Var 1 (work)   .65        .56
Var 2 (work)   .72        .54
Var 3 (work)   .74        .51
Var 4 (home)   .63        -.56
Var 5 (home)   .71        -.57
Var 6 (home)   .71        -.53
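
The loadings and the eigenvalues in this table are linked. Assuming the loadings are eigenvectors scaled by the square root of their eigenvalue (the usual convention for a PCA of a correlation matrix), the squared loadings in each column should add up to that column's eigenvalue. The sketch below checks this and derives the proportion of variance (%var) for each component.

```python
# Loadings copied from the table above: (Factor 1, Factor 2) for Var 1 to Var 6.
loadings = [(.65, .56), (.72, .54), (.74, .51),
            (.63, -.56), (.71, -.57), (.71, -.53)]

nvar = len(loadings)

# Sum of squared loadings per column, to compare with 2.89 and 1.79.
sum_sq = [sum(row[k] ** 2 for row in loadings) for k in range(2)]

# Proportion of the total variance (6 for six standardized variables).
prop_var = [s / nvar for s in sum_sq]

# Variance of each variable captured by the two components together.
communalities = [a * a + b * b for a, b in loadings]

print([round(s, 2) for s in sum_sq])
print([round(p, 2) for p in prop_var])
print([round(h, 2) for h in communalities])
```

Both eigenvalues exceed 1, so the Kaiser criterion would keep two components; together they account for roughly 78% of the variance.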

Triangular decomposition (Cholesky)

[Path diagram: latent factors y1 to y5, each marked 1, with paths to the observed variables x1 to x5]

• One operationalization of all PCA outcomes

• Model is just identified! Model is saturated (df=0)

Triangular decomposition

• S = Q * Q' ( = P# * P#', where P# is P * sqrt(D))

• Q (5 by 5) =
    f11  0    0    0    0
    f21  f22  0    0    0
    f31  f32  f33  0    0
    f41  f42  f43  f44  0
    f51  f52  f53  f54  f55

• Q is a lower triangular matrix
• This is not a model! This is a transformation of the observed matrix S. Fully determinate!
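
As a sketch of the transformation S = Q * Q', the Python function below computes the lower triangular Q for the MRI-IQ correlation matrix from the practice data and multiplies it back. The function is a textbook Cholesky routine written for this illustration; it is not what Mx does internally when it fits phenochol.mx.

```python
import math


def cholesky(S):
    """Return the lower triangular Q with S = Q * Q' (S symmetric positive definite)."""
    n = len(S)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(Q[i][k] * Q[j][k] for k in range(j))
            if i == j:
                Q[i][j] = math.sqrt(S[i][i] - partial)
            else:
                Q[i][j] = (S[i][j] - partial) / Q[j][j]
    return Q


# Phenotypic correlations: cerebellum, grey, white, calculation, letters/numbers.
lower = [[1.0],
         [.63, 1.0],
         [.61, .55, 1.0],
         [.23, .25, .26, 1.0],
         [.30, .19, .19, .46, 1.0]]
S = [[lower[max(i, j)][min(i, j)] for j in range(5)] for i in range(5)]

Q = cholesky(S)
rebuilt = [[sum(Q[i][k] * Q[j][k] for k in range(5)) for j in range(5)]
           for i in range(5)]

for row in Q:
    print([round(v, 3) for v in row])
```

Q has 15 non-zero elements, exactly as many as there are distinct entries in S, which is why the decomposition reproduces S perfectly and leaves df = 0.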

Saturated model, # latent factors
Script: phenochol.mx

BEGIN MATRICES ;
 Q LOWER NVAR NVAR free ; ! factor loadings
 M FULL 1 NVAR free ;     ! means
END MATRICES ;

BEGIN ALGEBRA;
 C= Q*Q' ;                ! variance covariance matrix
 K=\stnd(C) ;             ! correlation matrix
 X=\eval(K) ;             ! eigenvalues (i.e., variance of latent factors)
 Y=\evec(K) ;             ! eigenvectors (i.e., regression coefficients)
END ALGEBRA;

Means M /
Covariances C /