An Introduction to Factor Analysis


Reducing variables and/or detecting underlying structures

Books you’ll never see . . .

Uses

• Data reduction: 24 actual variables reduced to two latent variables (Factor 1 and Factor 2)

• Create composites/scales for psychometric instruments (e.g., Depression, Anxiety)

• Validate composites/scales for psychometric instruments (e.g., Depression, Anxiety)

Summary of uses

• Also used in the development or exploration of questionnaires or other psychometric instruments.

• Factor analytic techniques are most commonly used to reduce many items into a more usable number of factors. The simplified data can then be used more easily in research.

Latent variables

A metaphor

An example of common variance using bivariate relationships

• I measure a sample of kindergarten children’s ability to recognize the sound(s) at the beginning of words, e.g., /k/ in “cat”

• I also measure the children’s ability to segment (break apart) sounds, e.g., “cat” = /k/ /a/ /t/

• I correlate these two measures
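The bivariate case can be sketched in a few lines. This is only an illustration: the scores below are hypothetical values invented for the example, not data from the study described on the slide. Squaring the Pearson correlation gives the proportion of variance the two measures have in common.

```python
import numpy as np

# Hypothetical scores for eight children (illustrative values only).
beginning_sounds = np.array([3, 5, 6, 8, 9, 11, 12, 14], dtype=float)
segmentation = np.array([2, 4, 7, 7, 10, 10, 13, 15], dtype=float)

# Pearson correlation between the two measures.
r = np.corrcoef(beginning_sounds, segmentation)[0, 1]

# Squaring r gives the proportion of variance the two measures share.
shared_variance = r ** 2

print(f"r = {r:.2f}, shared variance = {shared_variance:.2f}")
```

Factor analysis extends this idea from one pair of measures to the shared variance among many measures at once.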

[Figure: beginning letter sounds plotted against phoneme segmentation]

Not useful when

• A vast array of variables with no theoretical association is forced into the analysis just to see what turns up

• The variables have inadequate reliability. This lack of stability of measurement affects the meaningfulness of the derived factors.

Approaches to Factor Analytic Techniques

Exploratory

• Mathematically driven technique

• Seeks to identify the underlying structure of a set of items or variables

• Use of scholarly intuition to figure out what the factors mean

Confirmatory

• Starts with a theory of what you expect to confirm (a priori)

• Do the items load as you expected on the factors that you predicted?

• Much more involved Structural Equation Modeling approach: test of model fit

Methodological Considerations

1. Selection of variables

2. Size of sample

3. Reliability of measures

4. Appropriateness of using Factor Analytic techniques (given the goal of the research)

5. Choice of method (how to extract the factors)

6. How many factors to retain

7. Methods of rotation (to ease interpretability)

Hogarty, K. Y., Kromrey, J. D., Ferron, J. M., & Hines, C. V. (2004). Selection of variables in exploratory factor analysis: An empirical comparison of a stepwise and traditional approach. Psychometrika, 69(4), 593-611.


Assumptions and Requirements of Factor Analytic Techniques

• More than one variable involved

• Sample acquired through random selection

• Robust bivariate relationships among variables

• Variables are measured using either interval or ratio (or ordinal, quasi-interval?) level data

• Data approximate a normal distribution (multivariate normality is also nice)

• Relationships among variables are linear

• Variables are measured reliably

• No multicollinearity (e.g., bivariate r above 0.90)

• Few missing observations

• “Large” number of observations
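The multicollinearity requirement is easy to screen for before running the analysis. The sketch below assumes the data sit in a NumPy array with one column per variable. The function name `flag_collinear_pairs` and the simulated data are hypothetical, introduced only for illustration; the 0.90 cutoff comes from the list above.

```python
import numpy as np

def flag_collinear_pairs(data, threshold=0.90):
    """Return (i, j, r) for every pair of columns with |r| above threshold."""
    corr = np.corrcoef(data, rowvar=False)
    n_vars = corr.shape[0]
    flagged = []
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(corr[i, j]) > threshold:
                flagged.append((i, j, float(corr[i, j])))
    return flagged

# Simulated example: three independent variables plus a near copy of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
near_copy = x[:, 0] + 0.05 * rng.normal(size=200)
data = np.column_stack([x, near_copy])

pairs = flag_collinear_pairs(data)
print(pairs)  # only the pair made of column 0 and its near copy is flagged
```

A flagged pair usually means one of the two variables should be dropped or the two should be combined before factoring.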


Size of sample

What is a reasonable sample size? How many observations do you need?

• Old school: Ten observations per planned extracted factor (with a minimum of 100 recommended)

• “More is better” rule. Similar reasoning as other parametric statistical techniques, but less can be okay under some circumstances.

• Recently, it has become more widely recognized that smaller samples can be reasonably factor analyzed, but this is still hotly debated.


Reliability of measures

• Factor analysis is a correlational technique (multiple regression)

• Low reliabilities attenuate correlations

• Low reliabilities introduce “noise” and obscure “signal” for the factors you are trying to detect and extract
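How much low reliability shrinks a correlation can be sketched with the classical attenuation formula from test theory, which is not stated on the slide: the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The function name and the example values below are hypothetical.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Observed correlation expected under the classical attenuation formula."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical case: a true correlation of .60 between two measures
# that each have a reliability of .70.
observed = attenuated_correlation(0.60, 0.70, 0.70)
print(round(observed, 2))
```

Because factors are extracted from the correlations among items, every attenuated correlation leaves less shared variance for the factors to pick up.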

Researcher as Quality Control


Appropriateness of Factor Analysis

• Test development and instrument validation
  – Create composites/sub-scales for psychometric instruments
  – Detect underlying structures within
    • Construct validity
    • Evaluation of a theory

• Data reduction
  – Reduce multiple variables to a smaller group, while maintaining the diversity of information offered.
  – Demonstrate that multiple instruments test the same thing
  – Demonstrate that items load on one factor, or no factors, or multiple factors


Partitioning Variance

1. Variance common to other variables

2. Variance specific to that variable

3. Random measurement error

Most common methods of extracting factors?

Common Factor Analysis (CFA)

Assumption: The factors explain the correlations among the variables (variance in common)

Finds common variance among many items, groups it, and then it must be appropriately labeled

Goal: To find the fewest number of factors that account for the relationships among variables

Kahn 2006

[Figure: diagram of three items, each labeled “Unique variance (item)”, with a region labeled “Common variance” and the note “CFA considers this variance”]

DeCoster (1998) Overview of Factor Analysis

Principal Components Analysis (PCA)

Assumption: Components explain the variance in common among the variables and the amount of unique variance (item & error) present

Goal: To find the fewest components that account for the relationships among variables

[Figure: diagram of three items, each labeled “Unique variance (item+error)”]

Comparisons

Common Factor Analysis

• Seeks the factors that account for the common variance among the variables

• Used for Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA)

• Easier to generalize to other samples/populations since the unique and error variance of items isn’t considered

• Most often used to detect underlying structures among variables.

Principal Components Analysis

• Seeks components that account for all of the variance among the variables, both common and unique

• Harder to generalize since other sources of variance (that are item specific and not shared) are included in the model

• Most often used for data reduction to use in research

Factor Analytic Techniques

Exploratory questions:

• What factors exist among the variables?

• To what degree are the variables (items) related to the factors that were extracted? (FACTOR LOADINGS)

[Figure: ten observed variables (Items 1 through 10), each with unique variance, linked to two unobserved latent variables: Items 1, 4, 5, 7, and 8 to Factor 1; Items 2, 3, 6, 9, and 10 to Factor 2]

Kahn 2006

Common Factor Analysis

• CFA takes into account shared (common) and item specific variance and uses the squared multiple correlation (R squared) as the measure of communality.

• Communality is the variance in one variable that is shared with the other variables.

• The factors extracted by CFA, therefore, explain the shared variance common to more than one variable.

Common Factor Analysis

1. Variance common to other variables

Multicultural Counseling Inventory—Item 6:

“I include the facts of age, gender roles, and socioeconomic status in my understanding of different minority cultures.”

The measured overlap (R square) between this item and the other items on the MCI is the communality.

Common Factor Analysis

Partitions out the variance in each variable that is in common with the other variables. How?

Uses Multiple Regression.

a. Use each item as an outcome in MR

b. Use all other items as predictors

c. Finds the communality among all of the variables, relative to one another
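Steps a through c can be sketched directly: regress each item on all of the others and keep the R square. The sketch also uses a standard matrix identity that the slides do not mention, namely that the same values equal one minus the reciprocal of the diagonal of the inverted correlation matrix. The function names and the simulated one-factor data are hypothetical.

```python
import numpy as np

def smc_by_regression(data):
    """R square from regressing each column on all of the other columns."""
    n_vars = data.shape[1]
    # Standardize so that no intercept term is needed.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    r_squared = np.empty(n_vars)
    for i in range(n_vars):
        outcome = z[:, i]                      # a. each item as the outcome
        predictors = np.delete(z, i, axis=1)   # b. all other items as predictors
        coefs, *_ = np.linalg.lstsq(predictors, outcome, rcond=None)
        residual = outcome - predictors @ coefs
        r_squared[i] = 1.0 - residual.var() / outcome.var()
    return r_squared

def smc_from_correlations(corr):
    """The same quantity from the inverse of the correlation matrix."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))

# Simulated data: five items that all depend on one common factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=(300, 1))
data = 0.7 * factor + rng.normal(size=(300, 5))

by_regression = smc_by_regression(data)
by_inverse = smc_from_correlations(np.corrcoef(data, rowvar=False))
print(np.round(by_regression, 3))
print(np.round(by_inverse, 3))
```

The two routes agree, which is why software can compute the starting communalities for every item without running a separate regression for each one.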

Common Factor Analysis

Each item takes a turn as the outcome, with all of the other items as predictors:

Outcome    Predictors
Item 1     Items 2 through 10
Item 2     Item 1 and Items 3 through 10
Item 3     Items 1 and 2 and Items 4 through 10

The R square from each regression is the shared variance for that item with the other items.

How is communality reported with CFA?

          Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
Item 1     .76
Item 2     .60     .56
Item 3     .43     .76     .87
Item 4     .34     .45     .64     .56
Item 5     .33     .32     .34     .65     .52
Item 6     .82     .81     .45     .57     .33     .41

Squared multiple correlations (R square) are on the diagonal of the correlation matrix

What makes a good factor?

• It is consistent with the literature regarding past investigations of variable relationships

• It is easy to understand and interpret

• It adheres to the “simple structure” model

Principal Component Analysis

Data reduction

Principal Component Analysis

How many components are there that can account for all or most of the information contained in the original data?

[Figure: ten items, each with unique variance, grouped under two components: Items 1, 4, 5, 7, and 8 with Component 1; Items 2, 3, 6, 9, and 10 with Component 2]

Kahn 2006

How is communality reported with PCA?

          Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
Item 1     1.0
Item 2     .71     1.0
Item 3     .62     .76     1.0
Item 4     .34     .45     .64     1.0
Item 5     .33     .32     .34     .65     1.0
Item 6     .82     .81     .45     .57     .33     1.0

CFA vs. PCA

• Common factor analysis and principal components analysis often yield similar results when sample sizes are large and/or if item communalities are large.

• Common factor analysis is preferred in situations in which these criteria are not met, especially when the researcher wishes to better understand the latent variables that underlie a mass of items.

Factor Analytic Family of Techniques

Metaphors for extraction of factors/components

• With each extraction of a component, less and less variance is unaccounted for.

[Figure: components 1 through 8, extracted in sequence]

Factor Analysis Metaphor

ITEM POOL: Variance-covariance matrix for an instrument

First factor: Extracts the shared variation only (i.e., plusses)

[Figure: the item pool drawn as a grid of plus and minus signs; a batch of plusses is pulled out as the first factor, then a second batch as the second factor]

Second factor: There is still shared variance left in the item pool, but it is different from the first batch

The Principle of Parsimony

• Goal: We often want to use the smallest number of separate variables to convey the most information about the relationships among constructs.

“Less is more”

Kahn 2006


How many factors to retain?

If you keep letting the program extract factors, it will extract as many factors as there are items.

So how do you decide how many factors to extract?

Bryant & Yarnold (1995). Principal-Components and Factor Analysis from Grimm & Yarnold’s (Eds.) Reading and Understanding Multivariate Statistics

You want the fewest factors necessary to account for the most variance.

Factor Analytic techniques will give you as many factors as you want (even if they’re complete nonsense). The aim is to find the real factors that are consistent with the theoretical structure, not just factors that pop up and have no logical explanation.

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

How many factors to retain?

A priori criterion

• Replication criterion

• Percentage criterion

Stopping rules

• Kaiser rule

• Cattell’s scree plot

• Parallel analysis

Bryant & Yarnold (1995). Principal-Components and Factor Analysis from Grimm & Yarnold’s (Eds.) Reading and Understanding Multivariate Statistics

A priori criterion

1. When you are replicating research and you want to retain the same number of factors as previous researchers.

2. You decide a cut-off point, based on some theoretical rationale (e.g., retain factors until 80% of the variance is explained by the extracted factors).

Eigenvalues

The eigenvalue is the amount of variance, summed across all of the variables, that is accounted for by the factor in question.

The sum of all eigenvalues = number of variables/items in component analysis
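The sum rule can be checked numerically. This is a minimal sketch on simulated data (the six items and their dependence on one factor are assumptions made for the example): the eigenvalues of a correlation matrix add up to the number of variables because each standardized variable contributes a variance of 1.

```python
import numpy as np

# Simulated data: six items that share one common factor.
rng = np.random.default_rng(2)
n_items = 6
factor = rng.normal(size=(250, 1))
data = 0.8 * factor + rng.normal(size=(250, n_items))

corr = np.corrcoef(data, rowvar=False)

# eigvalsh returns eigenvalues in ascending order, so reverse them.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

total = eigenvalues.sum()              # equals the number of items
proportion = eigenvalues / n_items     # share of total variance per component

print(np.round(eigenvalues, 3))
print(round(total, 6))
```

Dividing each eigenvalue by the number of items gives the proportion of total variance that component accounts for, which is the figure a percentage criterion works from.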

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

How many factors to retain?

Kaiser criterion: Retain all factors with an eigenvalue greater than 1.0.

This sets the limit so that a component must account for at least as much variance as a single variable (to be considered useful).

Kahn 2006

(For CFA, which SPSS calls principal axis factoring, this would be “factor” instead of “component”)
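The Kaiser criterion amounts to counting eigenvalues above 1.0. The sketch below applies it to a small hypothetical correlation matrix built for the example: four items in two pairs, correlated .60 within a pair and 0 across pairs. The function name `kaiser_count` is also hypothetical.

```python
import numpy as np

def kaiser_count(corr):
    """Number of components with an eigenvalue greater than 1.0."""
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))

# Hypothetical correlation matrix: two unrelated pairs of items.
corr = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

n_retained = kaiser_count(corr)
print(n_retained)  # each pair yields eigenvalues 1.6 and 0.4, so two are retained
```

Each retained component here accounts for 1.6 units of variance, more than the 1.0 contributed by any single standardized item.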

How many factors to retain?

Cattell’s scree test: Plot the eigenvalues and retain the factors that appear before the curve levels off after the big drop (change in slope). Can be combined with the Kaiser criterion (factors with an eigenvalue greater than 1.0).

Combining the two adds the limit that a retained factor must account for more variance than a single item contributes.

Parallel Analysis

• You generate a scree plot (with eigenvalues) based on random data that uses the same number of variables (items) and the same number of cases.

• Retain the factors with eigenvalues higher than the random eigenvalues.

• Not an option in SPSS

Kahn 2006
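Parallel analysis can be run by hand when a package lacks it. This is a sketch under stated assumptions rather than a definitive implementation: it uses eigenvalues of the full correlation matrix (the component version), compares each observed eigenvalue with the 95th percentile of the random ones (a common variant; the mean is also used), and stops at the first factor that fails. The function name, parameters, and the simulated two-factor data are hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count factors whose eigenvalues exceed those from random data."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random data with the same number of cases and items.
    random_eigs = np.empty((n_sims, n_vars))
    for s in range(n_sims):
        noise = rng.normal(size=(n_obs, n_vars))
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    reference = np.percentile(random_eigs, percentile, axis=0)

    # Retain leading factors until one fails to beat its random counterpart.
    n_retain = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_retain += 1
    return n_retain, observed, reference

# Simulated data: eight items, four loading on each of two factors.
rng = np.random.default_rng(3)
factors = rng.normal(size=(400, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
data = factors @ loadings + 0.6 * rng.normal(size=(400, 8))

n_retain, observed, reference = parallel_analysis(data)
print(n_retain)
```

Plotting `observed` and `reference` on the same axes gives the two scree lines the slide describes; the factors to keep are those where the observed line sits above the random one.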

Factor Rotation

Obtaining a clearer pattern of factor loadings

The Goal of Rotating Factors

To create high factor loadings for each item on one factor

And create low factor loadings for that item on all other factors

THIS COMBINATION OF CHARACTERISTICS IS REFERRED TO AS THE SIMPLE STRUCTURE.

IT MAKES THE FACTORS MORE INTERPRETABLE

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

Factor Structure Coefficients

• These are correlations between the item and its associated factor.

• The simple structure dictates that factor coefficients are best if they are very high (in reference to their own factor) and very low (in reference to any other retained factor).

• Rotating factors will change their structure coefficients, thus better approximating the simple structure being sought.

Thurstone’s Rule

• Good items (variables) should only load onto one factor

• Items should load on that one factor at a magnitude of at least 0.30.

• The factor should not have an eigenvalue of less than 1.0

[Figure: Items 1 through 8 sorted onto Factor 1 and Factor 2]

Distillation

Kirby, J. R., Parrila, R., & Pfeiffer, S. (2003). Naming speed and phonological awareness as predictors of reading development. Journal of Educational Psychology, 95(3), 453-464.

[Figure: factor loadings of six measures (picture naming, color naming, sound isolation, phoneme elision, blending onset-rime, blending phonemes) on two factors, rapid automatized naming and phonological awareness, labeled Factor 1 and Factor 2. Values shown on the diagram: .96, .90, .77, .63, .90, .75, .47 and -.10, -.05, .06, .15, .03, -.05]

Common rotations

Orthogonal - factors are at 90 degree angles (i.e., uncorrelated)

• *Varimax

• Quartimax

• Equimax

*most popular

Oblique - factors may be correlated with each other.

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

Factor Extraction

Because the first factor extracted accounts for the most variance among the variables, the next factor extracted will capture variance not accounted for by the first factor. This helps the latent variables be “orthogonal,” meaning that the extracted factors are generally uncorrelated with each other.

Orthogonal Rotations

Varimax: Most common. Maximizes loadings on one factor while minimizing loadings on other factors.

Quartimax: Uncommon. Maximizes factor loading on the first factor only.

Equimax: Also less common. Combines other techniques and because of this, is more difficult to interpret than the other two options.

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
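A varimax rotation can be sketched in a few lines of NumPy. This uses a widely circulated SVD-based iteration, which is an assumption here and not necessarily the routine SPSS or any other package runs. The starting values are the five-variable, two-factor loadings that these slides show as the unrotated example. Because that example is illustrative, the output is not expected to match the slides' rotated table exactly.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward simple structure."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Unrotated loadings (five variables, two factors) from the slides' example.
before = np.array([
    [0.62, 0.52],
    [0.54, 0.25],
    [0.25, 0.59],
    [0.39, 0.66],
    [0.35, 0.68],
])

after, rotation = varimax(before)
print(np.round(after, 2))
```

An orthogonal rotation moves the axes without changing the items, so each item's communality (the sum of its squared loadings) is the same before and after; only the way that variance is divided between the factors changes.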

Oblique rotations

Not used frequently but should be when factors are correlated.

Promax is the most popular of the oblique methods

• First rotates orthogonally

• Then followed by oblique rotation

• Minimizes small loadings

• Simple structure is best approximated

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

How to decide?

• You want what will give you the most interpretable result, with the simplest solution, consistent with an underlying theoretical structure.

• You can use different rotational techniques and compare results. Similar results strengthen confidence in the outcome.

Ferketich & Muller (1999) Readings in Research Methodology, Second Edition

How to clarify factor loadings using rotation

[Figure sequence, titled “Factor Rotation”: Items 1 through 4 plotted against the Factor 1 and Factor 2 axes; the axes are rotated step by step to the positions labeled Rotated Factor 1 and Rotated Factor 2]

• Factor loading coefficients define the eigenvector. The factor loading coefficient represents the correlation between the item and the eigenvector

Before orthogonal rotation

            Eigenvectors
Variables   1      2
1           .62    .52
2           .54    .25
3           .25    .59
4           .39    .66
5           .35    .68

After orthogonal rotation

            Eigenvectors
Variables   1      2
1           .65    .45
2           .62    .09
3           .05    .69
4           .02    .68
5           .10    .82

Factor coefficients: before and after

            Before rotation    After rotation
            Eigenvectors       Eigenvectors
Variables   1      2           1      2
1           .62    .52         .65    .45
2           .54    .25         .62    .09
3           .25    .59         .05    .69
4           .39    .66         .02    .68
5           .35    .68         .10    .82

Uses of Factor Analytic Techniques

• All of the techniques associated with creating factors from many variables are sample specific; however, the better the quality of your sample (size, representativeness, etc.), the more likely your results will generalize to other samples, and theoretically, to the population of interest.

Floyd & Widaman (1995)

“Thus, common factor analysis can provide valuable insights into the multivariate structure of a measuring instrument, isolating the theoretical constructs [i.e., factors] whose effects are reflected in responses on the instrument.” (p. 287)

Cross Validation

• Randomly divide your sample (2/3, 1/3)

• Try to replicate factor solutions across groups

• Explore for part of the sample, then confirm with the other portion
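The random split can be sketched as follows. The function name is hypothetical, and assigning the larger two-thirds portion to the exploratory step is an assumption; the slide gives only the proportions.

```python
import numpy as np

def split_sample(n_obs, explore_fraction=2 / 3, seed=0):
    """Randomly split row indices into exploration and confirmation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_obs)
    n_explore = int(round(n_obs * explore_fraction))
    return order[:n_explore], order[n_explore:]

# Example: 300 cases split into 200 for exploring and 100 for confirming.
explore_idx, confirm_idx = split_sample(300)
print(len(explore_idx), len(confirm_idx))
```

The exploratory analysis is run on the rows in `explore_idx`, and the factor solution it suggests is then tested on the rows in `confirm_idx`.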

EFA vs. CFA

Exploratory

• Find and retain factors (no test of significance, per se)

Confirmatory

• See how well the constructed model fits the data

• Chi-square goodness of fit test

Confirmatory Factor Analysis and Model Fit

The researcher specifies in advance (predicts) how many factors will be found and which items should load on which factors.

[Figure: model diagram with Factor 1, Factor 2, Factor 3, and Factor 4]

Links and Resources

• http://www.siu.edu/~epse1/pohlmann/factglos/
