Factor Analysis with SAS Karl L. Wuensch Dept of Psychology East Carolina University


What is a Common Factor?

• It is an abstraction, a hypothetical construct that affects at least two of our measurement variables.

• We want to estimate the common factors that contribute to the variance in our variables.

• Is this an act of discovery or an act of invention?


What is a Unique Factor?

• It is a factor that contributes to the variance in only one variable.

• There is one unique factor for each variable.

• The unique factors are unrelated to one another and unrelated to the common factors.

• We want to exclude these unique factors from our solution.


Iterated Principal Factors Analysis

• The most common type of FA.

• Also known as principal axis FA.

• We eliminate the unique variance by replacing, on the main diagonal of the correlation matrix, 1’s with estimates of communalities.

• Initial estimate of communality = R² between one variable and all others.


FactBeer.sas

• Download FactBeer.sas from http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm .

• Bring it into SAS.

• Edit the FILE statement in the second DATA step so that it points to a folder on your computer.

• Run the program. Look at the output.


Look at the Initial Communalities

• Page 1 of the output.

• For each variable, these are the R² predicting that variable from all others.

• They sum to 5.675.

• We have eliminated 7 – 5.675 = 1.325 units of unique variance.


Iterate!

• Using the estimated communalities, obtain a solution.

• Take the communalities from the first solution and insert them into the main diagonal of the correlation matrix.

• Solve again.

• Take communalities from this second solution and insert into correlation matrix.

• Solve again.

• Repeat this, over and over, until the changes in communalities from one iteration to the next are trivial.

• Our final communalities sum to 5.6.

• After excluding 1.4 units of unique variance, we have extracted 5.6 units of common variance.

• That is 5.6 / 7 = 80% of the total variance in our seven variables.
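The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation PROC FACTOR performs: the correlation matrix is invented from a hypothetical one-factor model, only one factor is extracted, the starting communalities are the largest off-diagonal correlation in each row (a simpler prior swapped in for R²), and the helper names `top_eigenpair` and `iterated_principal_factor` are made up for this sketch.

```python
import math

# Hypothetical one-factor example: these loadings and the correlation matrix
# built from them are invented for illustration, not taken from FactBeer.sas.
true_loadings = [0.8, 0.7, 0.6, 0.5]
n_vars = len(true_loadings)
corr = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
         for j in range(n_vars)] for i in range(n_vars)]


def top_eigenpair(matrix, sweeps=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    size = len(matrix)
    vec = [1.0] * size
    for _ in range(sweeps):
        prod = [sum(matrix[i][k] * vec[k] for k in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    prod = [sum(matrix[i][k] * vec[k] for k in range(size)) for i in range(size)]
    eigenvalue = sum(v * p for v, p in zip(vec, prod))  # Rayleigh quotient
    return eigenvalue, vec


def iterated_principal_factor(corr, tol=1e-10, max_iter=1000):
    """Extract ONE factor, re-estimating communalities until they stop changing."""
    size = len(corr)
    # Assumed starting values: largest absolute off-diagonal correlation per row
    # (a simpler stand-in for the squared multiple correlation).
    communalities = [max(abs(corr[i][j]) for j in range(size) if j != i)
                     for i in range(size)]
    loadings = [0.0] * size
    for _ in range(max_iter):
        # Put the current communality estimates on the main diagonal.
        reduced = [row[:] for row in corr]
        for i in range(size):
            reduced[i][i] = communalities[i]
        eigenvalue, vec = top_eigenpair(reduced)
        loadings = [math.sqrt(eigenvalue) * v for v in vec]
        updated = [l * l for l in loadings]  # one factor: communality = loading squared
        change = max(abs(u - c) for u, c in zip(updated, communalities))
        communalities = updated
        if change < tol:
            break
    return loadings, communalities


loadings, communalities = iterated_principal_factor(corr)
print([round(h, 3) for h in communalities])          # final communalities
print(round(sum(communalities) / n_vars, 3))          # proportion of common variance
```

Because the toy matrix was built from an exact one-factor model, the final communalities should settle near the squared hypothetical loadings; with real data there is no such known target, and the stopping rule is simply that the changes become trivial.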


Final Communalities


• We have packaged those 5.6 units of common variance into two factors (page 4 of output):


Reproduced and Residual Correlation Matrices

• Correlations between variables result from their sharing common underlying factors.

• Try to reproduce the original correlation matrix from the correlations between factors and variables (the loadings).

• The difference between the reproduced correlation matrix and the original correlation matrix is the residual matrix.


• We want these residuals to be small, and they are.

• Look at “Residual Correlations With Uniqueness on the Diagonal” on page 2 of the output.

• Uniqueness = 1 – Communality.

• The Root Mean Squares show no problems with any of the variables.

• Reputation does have high uniqueness.
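As a rough illustration of how the reproduced and residual matrices relate to the loadings, here is a small Python sketch. The loadings and observed correlations are hypothetical (three variables, two orthogonal factors), not the FactBeer results.

```python
# Hypothetical loadings for three variables on two orthogonal factors.
loadings = [[0.8, 0.1],
            [0.7, 0.2],
            [0.1, 0.9]]

# Hypothetical observed correlation matrix for the same three variables.
observed = [[1.00, 0.60, 0.15],
            [0.60, 1.00, 0.28],
            [0.15, 0.28, 1.00]]

n = len(loadings)

# Reproduced correlation = sum over factors of the product of the two loadings.
reproduced = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
               for j in range(n)] for i in range(n)]

# The diagonal of the reproduced matrix holds the communalities.
communalities = [reproduced[i][i] for i in range(n)]
uniqueness = [1 - h for h in communalities]

# Residual = observed minus reproduced; its diagonal is the uniqueness.
residual = [[observed[i][j] - reproduced[i][j] for j in range(n)]
            for i in range(n)]

for row in residual:
    print([round(x, 2) for x in row])
```

Small off-diagonal residuals mean the factors account for the correlations well; a large diagonal entry flags a variable with high uniqueness.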


Partial Correlations Controlling Factors

• Tells us for each pair of variables how much variance they share that has NOT been captured by the factors.

• See page 3 of the output.

• Note that Size and Color share a good bit of variance that was not captured by the factors.
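A minimal sketch of the arithmetic, assuming the usual definition for orthogonal factors: the partial correlation is the residual correlation divided by the square root of the product of the two uniquenesses. The function name and the numbers are hypothetical.

```python
import math


def partial_given_factors(residual_ij, uniqueness_i, uniqueness_j):
    """Partial correlation between two variables after removing the common factors."""
    return residual_ij / math.sqrt(uniqueness_i * uniqueness_j)


# Hypothetical values: a residual correlation of .02 between two variables
# whose uniquenesses are .35 and .47.
partial = partial_given_factors(0.02, 0.35, 0.47)
print(round(partial, 3))
```

The same small residual looks larger as a partial correlation when both variables have little unique variance left, which is why the two tables can tell slightly different stories.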


Nonorthogonal (Oblique) Rotation

• The axes will not be perpendicular; the factors will be correlated with one another.

• The factor loadings (in the pattern matrix) will no longer be equal to the correlation between each factor and each variable.

• They will still equal the beta weights, the A’s in

$$X_j = A_{1j}F_1 + A_{2j}F_2 + \cdots + A_{mj}F_m + U_j$$
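To see why pattern loadings and factor-variable correlations diverge once the factors correlate, here is a small Python sketch with hypothetical numbers. It relies on the standard relation that the structure matrix (correlations) equals the pattern matrix (beta weights) multiplied by the factor correlation matrix.

```python
# Hypothetical pattern loadings (beta weights) for two variables on two factors.
pattern = [[0.80, 0.05],
           [0.10, 0.75]]

# Hypothetical correlation of .30 between the two oblique factors.
factor_corr = [[1.0, 0.3],
               [0.3, 1.0]]

# Structure matrix: correlation of each variable with each factor.
structure = [[sum(pattern[i][k] * factor_corr[k][j] for k in range(2))
              for j in range(2)] for i in range(2)]

for row in structure:
    print([round(x, 3) for x in row])
```

With a factor correlation of zero the two matrices would be identical, which is the orthogonal case.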


• Oblique solutions make me uncomfortable, but I did one just for you: a Promax rotation.

• First a Varimax rotation is performed.

• Then the axes are rotated obliquely.

• See the output in the handout, about page 5.


Exact Factor Scores

• You can compute, for each subject, estimated factor scores.

• SAS gives you the standardized scoring coefficients used in computing factor scores.

• You must use NFACTORS= to specify the number of factors to retain.

• See page 5 of the output.


Standardized Scoring Coefficients


Computing Factor Scores

• Multiply each standardized variable score by the corresponding standardized scoring coefficient.

• For our first subject,

Factor 1 = (-.294)(.41) + (.955)(.40) + (-.036)(.22) + (1.057)(-.07) + (.712)(.04) + (1.219)(.03) + (-1.14)(.01) = 0.23.

• What a pain!
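The hand calculation above is just a sum of products, which a short Python sketch can reproduce. The numbers are the standardized scores and scoring coefficients from the worked example; the variable names are not given there, so the lists are kept in the order shown.

```python
# Standardized scores for the first subject (from the worked example).
z_scores = [-0.294, 0.955, -0.036, 1.057, 0.712, 1.219, -1.14]

# Standardized scoring coefficients for Factor 1 (same order).
coefficients = [0.41, 0.40, 0.22, -0.07, 0.04, 0.03, 0.01]

# Factor score = sum of (standardized score x scoring coefficient).
factor1 = sum(z * b for z, b in zip(z_scores, coefficients))
print(round(factor1, 2))
```

In practice nobody does this per subject by hand; asking the software to output the scores does the same arithmetic for every case.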


Outputting the Factor Scores

• OUT=name creates a new data set containing all of the original variables plus the factor scores.

• You can PUT the ones you wish to save in a plain text data file.

• Take a look at the FactBeer.dat file which was written to your computer.

• Later we shall use this file for additional analysis.


R² of the Variables With Each Factor

• See page 5 of the output.

• These statistics are equal to the variance of the factor scores.


True Factor Scores

• Now imagine that there exist two true factors which are the cause of our observed scores.

• The factors that we have created are just estimates of these true factors.

• Are they good estimates? Are they well related to the observed variables?

• With values of .97 and .95, I’d say yes.


Unit-Weighted Factor Scores

• Define subscale 1 as the simple sum or mean of scores on all items loading well (> .4) on Factor 1.

• Likewise for Factor 2, etc.

• Suzie Cue’s answers are

• Color, Taste, Aroma, Size, Alcohol, Cost, Reputation

• 80, 100, 40, 30, 75, 60, 10

• Aesthetic Quality = 80+100+40-10 = 210

• Cheap Drunk = 30+75+60-10 = 155
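The same unit-weighted scoring can be written as a short Python sketch. The answers and the +1/-1 weights come from the slide above; the dictionary and function names are only illustrative.

```python
# Suzie Cue's answers, as given above.
answers = {"Color": 80, "Taste": 100, "Aroma": 40, "Size": 30,
           "Alcohol": 75, "Cost": 60, "Reputation": 10}

# Unit weights: +1 for items loading positively on the factor, -1 for negatively.
aesthetic_quality_weights = {"Color": 1, "Taste": 1, "Aroma": 1, "Reputation": -1}
cheap_drunk_weights = {"Size": 1, "Alcohol": 1, "Cost": 1, "Reputation": -1}


def unit_weighted_score(responses, weights):
    """Sum the responses, each multiplied by its unit weight."""
    return sum(responses[item] * w for item, w in weights.items())


aesthetic_quality = unit_weighted_score(answers, aesthetic_quality_weights)
cheap_drunk = unit_weighted_score(answers, cheap_drunk_weights)
print(aesthetic_quality, cheap_drunk)
```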


• It may be better to use factor scoring coefficients (rather than loadings) to determine unit weights.

• Grice (2001) evaluated several techniques and found the best to be assigning a unit weight of 1 to each variable that has a scoring coefficient at least 1/3 as large as the largest for that factor.

• Using this rule, we would not include Reputation on either subscale and would drop Cost from the second subscale.
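One way to read the one-third rule is sketched below in Python. The coefficients are the Factor 1 scoring coefficients from the worked factor-score example; their variable names are not stated there, so they are referred to by position. Comparing absolute values, and giving a weight of -1 to a salient negative coefficient, are assumptions of this sketch rather than details confirmed by the slides.

```python
# Factor 1 standardized scoring coefficients, in the order used in the
# worked factor-score example (variable names are not given there).
coefficients = [0.41, 0.40, 0.22, -0.07, 0.04, 0.03, 0.01]


def grice_unit_weights(coefs, fraction=1 / 3):
    """Unit weight for each coefficient at least `fraction` as large as the largest."""
    cutoff = fraction * max(abs(c) for c in coefs)
    weights = []
    for c in coefs:
        if abs(c) >= cutoff:
            weights.append(1 if c > 0 else -1)  # assumed: sign follows the coefficient
        else:
            weights.append(0)  # not salient: left off the subscale
    return weights


unit_weights = grice_unit_weights(coefficients)
print(unit_weights)
```

Here the cutoff is .41 / 3, about .137, so only the first three variables earn a unit weight on this factor.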


Item Analysis and Cronbach’s Alpha

• Are our subscales reliable?

• Test-Retest reliability

• Cronbach’s Alpha – internal consistency

– Mean split-half reliability

– With correction for attenuation

– Is a conservative estimate of reliability


AQ = Color + Taste + Aroma – Reputation

• Is the AQ subscale reliable?

• Must negatively weight Reputation prior to item analysis.

• DATA alpha; SET drinkme; NegRep = -1*reputat;

• Look at the output, page 6.


• Shoot for an alpha of at least .70 for research instruments.

• Our alpha, .88, is excellent.

• But reputation has a low item-total correlation.

• Removing it would raise alpha to .96.
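For readers who want to see what sits behind the alpha and alpha-if-deleted figures, here is a minimal Python sketch using the common variance form of the formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The item scores are made up for illustration and are not the beer data; the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item-score lists, one list per item, same respondents in each."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Made-up scores for five respondents on four items; any negatively keyed
# item is assumed to have been reversed already.
toy_items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
    [4, 3, 5, 2, 3],
]

alpha = cronbach_alpha(toy_items)
print(round(alpha, 2))
print([round(a, 2) for a in alpha_if_deleted(toy_items)])
```

An item whose removal raises alpha noticeably is behaving like Reputation above: it correlates poorly with the rest of the subscale.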


Comparing Two Groups’ Factor Structure

• Eyeball Test

– Same number of well defined factors in both groups?

– Same variables load well on same factors in both groups?


• Pearson r

– Just correlate the loadings for one factor in one group with those for the corresponding factor in the other group.

– If there are many small loadings, r may be large due to the factors being similar on small loadings despite lack of similarity on the larger loadings.


• CC, Tucker’s coefficient of congruence

– Follow the instructions in the document Comparing Two Groups’ Factor Structures: Pearson r and the Coefficient of Congruence.

– A CC of .85 to .94 indicates similar factors; a CC of .95 to 1 indicates essentially identical factors.
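Both indices are easy to compute from two columns of loadings. The Python sketch below uses hypothetical loadings for one factor in two groups. Pearson r is computed on deviations from each column's mean, while the coefficient of congruence uses the raw loadings: the sum of cross-products divided by the square root of the product of the two sums of squares.

```python
import math


def pearson_r(x, y):
    """Ordinary Pearson correlation between two lists of loadings."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


def congruence(x, y):
    """Tucker's coefficient of congruence: like r, but without centering."""
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


# Hypothetical loadings of seven variables on the "same" factor in two groups.
group_a = [0.80, 0.70, 0.60, 0.10, 0.05, 0.10, -0.20]
group_b = [0.75, 0.72, 0.55, 0.15, 0.10, 0.05, -0.10]

r = pearson_r(group_a, group_b)
cc = congruence(group_a, group_b)
print(round(r, 3), round(cc, 3))
```

Because the coefficient of congruence is not centered, it is unchanged when one group's loadings are all multiplied by a positive constant, but it is sensitive to a shift in their overall level.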


• Cross-Scoring

– Obtain scoring coefficients for each group.

– For each group, compute factor scores using coefficients obtained from the analysis for that same group (SG) and using coefficients obtained from the analysis for the other group (OG).

– Correlate SG factor scores with OG factor scores.
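The cross-scoring steps can be sketched in Python for one group and one factor. Everything here is hypothetical: the standardized scores, both sets of scoring coefficients, and the helper names.

```python
import math


def factor_scores(z_rows, coefs):
    """One factor score per subject: sum of standardized score x coefficient."""
    return [sum(z * b for z, b in zip(row, coefs)) for row in z_rows]


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical standardized scores: five subjects from one group, three variables.
z_scores = [
    [-0.3, 1.0, 0.2],
    [1.1, 0.7, -0.4],
    [-1.2, -0.9, 0.5],
    [0.6, -0.2, 1.3],
    [-0.2, -0.6, -1.6],
]

# Hypothetical scoring coefficients from this group's analysis (SG)
# and from the other group's analysis (OG).
same_group_coefs = [0.45, 0.40, 0.25]
other_group_coefs = [0.40, 0.42, 0.30]

sg_scores = factor_scores(z_scores, same_group_coefs)
og_scores = factor_scores(z_scores, other_group_coefs)
cross_r = pearson_r(sg_scores, og_scores)
print(round(cross_r, 3))
```

A correlation near 1 suggests the two groups' analyses define essentially the same factor for these subjects; the whole procedure would then be repeated for the other group and for each remaining factor.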


• Cattell’s Salient Similarity Index

– Factors (one from one group, one from the other group) are compared in terms of similarity of loadings.

– Cattell’s Salient Similarity Index, s, can be transformed to a p value testing the null hypothesis that the factors are not related to one another.

– See my document Cattell’s s for details.


Required Number of Subjects and Variables

• Rules of Thumb (not very useful)

– 100 or more subjects.

– At least 10 times as many subjects as you have variables.

– As many subjects as you can; the more the better.

• It depends; see the references in the handout.


• Start out with at least 6 variables per expected factor.

• Each factor should have at least 3 variables that load well.

• If loadings are low, need at least 10 variables per factor.

• Need at least as many subjects as variables. The more of each, the better.

• When there are overlapping factors (variables loading well on more than one factor), need more subjects than when structure is simple.


• If communalities are low, need more subjects.

• If communalities are high (> .6), you can get by with fewer than 100 subjects.

• With moderate communalities (.5), need 100-200 subjects.

• With low communalities and only 3-4 high loadings per factor, need over 300 subjects.

• With low communalities and poorly defined factors, need over 500 subjects.


Use the FactBeer.dat File

• Download Factor-MR-T.sas from my SAS programs page.

• Edit the INFILE statement so it correctly points to FactBeer.dat on your computer.

• Run it.

• Look at the output.


Multiple Regression: Factor Scores as Predictors

• Notice that the factor scores, AesthetQ and CheapDr, are standardized to mean 0, variance 1.

• Notice that they are not correlated with one another.

• We certainly can predict SES well from these two factors!


T-Test on Factor Scores

• We compare two groups of students.

• One group is students at Pitt Community College.

• The other is graduate students at ECU.

• Notice that they differ significantly on both factors.