An introduction to the psych package: Part I:
data entry and data description

William Revelle
Department of Psychology
Northwestern University

October 30, 2018
Contents

0.1 Jump starting the psych package – a guide for the impatient
0.2 Psychometric functions are summarized in the second vignette
1 Overview of this and related documents
2 Getting started
3 Basic data analysis
   3.1 Getting the data by using read.file
   3.2 Data input from the clipboard
   3.3 Basic descriptive statistics
      3.3.1 Outlier detection using outlier
      3.3.2 Basic data cleaning using scrub
      3.3.3 Recoding categorical variables into dummy coded variables
   3.4 Simple descriptive graphics
      3.4.1 Scatter Plot Matrices
      3.4.2 Density or violin plots
      3.4.3 Means and error bars
      3.4.4 Error bars for tabular data
      3.4.5 Two dimensional displays of means and errors
      3.4.6 Back to back histograms
      3.4.7 Correlational structure
      3.4.8 Heatmap displays of correlational structure
   3.5 Testing correlations
   3.6 Polychoric, tetrachoric, polyserial and biserial correlations
4 Multilevel modeling
   4.1 Decomposing data into within and between level correlations using statsBy
   4.2 Generating and displaying multilevel data
   4.3 Factor analysis by groups
5 Multiple Regression, mediation, moderation and set correlations
   5.1 Multiple regression from data or correlation matrices
   5.2 Mediation and Moderation analysis
   5.3 Set Correlation
6 Converting output to APA style tables using LaTeX
7 Miscellaneous functions
8 Data sets
9 Development version and a users guide
10 Psychometric Theory
11 SessionInfo
0.1 Jump starting the psych package – a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?
1. Activate the psych package:
library(psych)
2. Input your data (section 3.1). There are two ways to do this:
• Find and read standard files using read.file. This will open a search window for your operating system, which you can use to find the file. If the file has a suffix of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, then the file will be opened and the data will be read in (or loaded in the case of Rda files).

myData <- read.file()   # find the appropriate file using your normal operating system

• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.
3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # What are the dimensions of the data?
describe(myData)   # or
describeBy(myData, groups=mygroups)   # for descriptive statistics by groups
headTail(myData)   # show the first and last n lines of a file
4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)
outlier(myData)
5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)   # e.g., change anything greater than 9 to NA
6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)
7. Find the correlations of all of your data. lowerCor will, by default, find the pairwise correlations, round them to 2 decimals, and display the lower off-diagonal matrix.

• Descriptively (just the values) (section 3.4.7):

r <- lowerCor(myData)   # The correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included:

corPlot(r)   # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5):

corr.test(myData)
8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we can not find the interaction (moderation) effect unless given raw data.

myData <- sat.act
colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")
setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the bootstrap with just 50 iterations. The default is 5,000. We use the Tal_Or data set, which was downloaded from the supplementary material for Hayes (2013), https://www.afhayes.com/public/hayes2013data.zip.

mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter = 50)

We can also find the moderation effect by adding in a product term.

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot * sexism + (sexism), data = Garcia, zero = FALSE, n.iter = 50,
        main = "Moderated mediation (not mean centered)")
0.2 Psychometric functions are summarized in the second vignette
Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.
• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).
fa.parallel(myData)
vss(myData)
• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.
fa(myData)
iclust(myData)
omega(myData)
If you prefer to do a principal components analysis, you may use the principal function. The default is one component.
principal(myData)
• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.
alpha(myData)   # score all of the items as part of one scale
myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))
my.scores <- scoreItems(myKeys, myData)   # form several scales
my.scores   # show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette, as well as the Overview Vignette, to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book/.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction, including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations, as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels; correlation "heat maps" (corPlot); factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram; as well as item response characteristics and item and test information characteristic curves (plot.irt and plot.poly).
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo's" at the personality project. Currently these include:
1. An introduction (vignette) of the psych package
2. An overview (vignette) of the psych package
3. Installing R and some useful packages
4. Using R and the psych package to find ωh and ωt
5. Using R and the psych for factor analysis and principal components analysis
6. Using the scoreItems function to find scale scores and scale statistics
7. Using mediate and setCor to do mediation, moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book/.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:

install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:
install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember, to run any of the psych functions it is necessary to make the package active by using the library command:
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))   # will read in a file without a header row
# and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)
3.2 Data input from the clipboard
There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
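As a sketch of the input that read.clipboard.lower expects (the correlation values below are invented for illustration), copy text of the following form to the clipboard and then call the function:

```r
library(psych)
# Text copied to the clipboard (lower triangle, including the diagonal):
#   1.00
#   0.40 1.00
#   0.25 0.30 1.00
R <- read.clipboard.lower()   # returns the full, symmetric square matrix
```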
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1-120. All values of columns 3-5 that are less than 30, 40, or 50, respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
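The drop in n and the changed minimums and maximums can be confirmed by describing the scrubbed matrix; this is a brief sketch repeating the scrub example from above:

```r
library(psych)
x <- matrix(1:120, ncol = 10, byrow = TRUE)
new.x <- scrub(x, 3:5, min = c(30, 40, 50), max = 70, isvalue = 45, newvalue = NA)
describe(new.x)   # n drops for columns 3-5; their minimums rise and maximums fall
```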
> png("outlier.png")
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
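As a brief sketch of dummy coding (the variable and its values here are hypothetical, not from a psych data set):

```r
library(psych)
# A hypothetical categorical variable
major <- c("Psych", "Biology", "Psych", "Econ", "Biology", "Econ")
new.codes <- dummy.code(major)   # one binary (0/1) column per category
new.codes
```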
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMs) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMs) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.
> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
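To make that comparison, the same SPLOM can be redrawn with the smoother turned off, reusing the scores object created above:

```r
# Same plot as before, but without the smoothed density coloring
pairs.panels(scores$scores, smoother = FALSE,
             main = "Density distributions of four measures of affect")
```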
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
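As a quick check of the standard error of proportion formula used by error.bars.tab, it can be computed directly in base R (a sketch, not a psych function; the values of p and N are arbitrary):

```r
p <- 0.30                        # observed proportion (arbitrary example value)
N <- 700                         # sample size
se.p <- sqrt(p * (1 - p) / N)    # sigma_p = sqrt(pq/N)
round(se.p, 4)                   # 0.0173
```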
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. (Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

[Plot: means with 95% confidence limits ("cats eyes") for bfagree, bfcon, bfext, bfneur, and bfopen, at each level of the independent variable.]

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
21
> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+    labels=c("Male","Female"),ylab="SAT score",xlab="")
[Figure: bar graph ("dynamite plot") of SAT score (200 to 800) for Male and Female, main title "95% confidence limits".]

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+    main="Proportion of sample by education level")
[Figure: bar graph of the proportion (0 to .30) of each education level (0 to 5) by gender; y-axis = Proportion of Education Level, x-axis = Level of Education, main title "Proportion of sample by education level".]

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
[Figure: two panels of means with error crosses — "Movies effect on arousal" (x = Energetic Arousal, y = Tense Arousal) and "Movies effect on affect" (x = Positive Affect, y = Negative Affect), with points labeled Sad, Horror, Neutral and Happy.]

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
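The Holm adjustment reported above the diagonal can be reproduced directly with base R's p.adjust; for example:

```r
# Holm (1979) step-down correction: the ith smallest p value is
# multiplied by (n - i + 1), with monotonicity then enforced
p.raw <- c(0.01, 0.02, 0.03)     # illustrative raw probabilities
p.adjust(p.raw, method = "holm")
```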
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input.
1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02  0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
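The computation behind this case is simple enough to sketch in base R (an illustrative helper, not the r.test source): transform each r with Fisher's z and divide the difference by its standard error.

```r
# z test for the difference of two independent correlations,
# a sketch of the Steiger (1980) case used by r.test above
z.diff <- function(n1, r1, r2, n2 = n1) {
  dz <- atanh(r1) - atanh(r2)           # difference of Fisher z scores
  dz / sqrt(1/(n1 - 3) + 1/(n2 - 3))    # divided by its standard error
}

z <- z.diff(30, .4, .6)
c(z = z, p = 2 * pnorm(-abs(z)))        # two-tailed probability
```

Compare the magnitudes with the r.test output above.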
3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)
> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
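For a two by two table of frequencies, φ has a simple closed form, (ad − bc)/√((a+b)(c+d)(a+c)(b+d)), which can be sketched in base R (phi.2x2 is an illustrative helper, not the psych phi function):

```r
# phi coefficient for a 2 x 2 table of frequencies a,b / c,d:
# algebraically the Pearson r of the two dichotomous variables
phi.2x2 <- function(tab) {
  a <- tab[1, 1]; b <- tab[1, 2]; c <- tab[2, 1]; d <- tab[2, 2]
  (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
}

phi.2x2(matrix(c(40, 10, 10, 40), 2, 2))
```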
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
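The smoothing idea can be sketched in a few lines of base R (illustrative only; the actual cor.smooth function has more careful handling):

```r
# Floor the negative eigenvalues, rescale so they again sum to the
# number of variables, rebuild, and restore a unit diagonal
smooth.R <- function(R, eps = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  vals <- vals * ncol(R) / sum(vals)
  cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
}

# An improper "correlation" matrix with a negative eigenvalue
R <- matrix(c(1, .9, .2,
              .9, 1, .9,
              .2, .9, 1), 3, 3)
min(eigen(R)$values)             # negative: R is not positive semi-definite
min(eigen(smooth.R(R))$values)   # non-negative after smoothing
```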
> draw.tetra()

[Figure: a bivariate normal distribution (rho = 0.5, phi = 0.28) cut at τ (on X) and Τ (on Y), showing the four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ) together with the marginal normal densities and their cut points.]
Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler functionto give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)
This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package
rxy = ηxwg ∗ ηywg ∗ rxywg + ηxbg ∗ ηybg ∗ rxybg    (1)
where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means
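The decomposition can be illustrated with a few lines of base R on simulated data (a toy sketch of what statsBy computes, not the psych code):

```r
# Within-group correlation: correlate deviations from group means.
# Between-group correlation: correlate the group means themselves.
set.seed(42)
g <- rep(1:4, each = 25)            # grouping variable
x <- rnorm(100) + g                 # x and y are independent within groups
y <- rnorm(100) + g                 # but share the same group means

r.wg <- cor(x - ave(x, g), y - ave(y, g))            # pooled within groups
r.bg <- cor(tapply(x, g, mean), tapply(y, g, mean))  # between groups

round(c(raw = cor(x, y), within = r.wg, between = r.bg), 2)
```

Note that the raw correlation is substantial even though x and y are uncorrelated within every group: it is driven entirely by the between group component.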
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)  #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  Four.Letter.Words 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV =  Four.Letter.Words 
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes 
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series 
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees 
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group 
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
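On raw data the a, b, c and c' paths are just regression slopes, so the logic can be sketched with lm on simulated data (mediate adds the bootstrapped test of ab; this is an illustration, not the psych implementation):

```r
# a: slope of m on x; b: slope of y on m (controlling for x);
# c: total effect of x on y; c': direct effect controlling for m
set.seed(1)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)
y <- 0.4 * m + 0.3 * x + rnorm(100)

a       <- coef(lm(m ~ x))["x"]
fit     <- lm(y ~ x + m)
b       <- coef(fit)["m"]
c.prime <- coef(fit)["x"]
c.total <- coef(lm(y ~ x))["x"]

c(indirect = unname(a * b), direct = unname(c.prime),
  total = unname(c.total))
```

For OLS regression the identity total = direct + indirect (c = c' + ab) holds exactly in the sample.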
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model diagram; THERAPY → SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4).]

Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB,data=sobel,std=FALSE)
> setCor.diagram(preacher)
[Figure: regression model diagram ("Regression Models"); THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, with the THERAPY-ATTRIB correlation = 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes, 2004, data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act,std=TRUE,n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏(i=1 to n) (1 − λi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx^-1 Rxy Ryy^-1 Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
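Given a correlation matrix, the equations above take only a few lines of base R (a sketch for illustration, not the setCor implementation):

```r
# Cohen's set correlation R2 from the squared canonical correlations,
# i.e. the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
set.cor.R2 <- function(R, x, y) {
  M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
  lambda <- Re(eigen(M)$values)   # squared canonical correlations
  1 - prod(1 - lambda)
}

R <- matrix(0.3, 4, 4); diag(R) <- 1   # a simple illustrative matrix
set.cor.R2(R, 1:2, 3:4)                # symmetric in x and y
```

The average squared canonical correlation mentioned below is simply mean(lambda) for the same eigenvalues.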
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
[Figure: moderated mediation path diagram ("Moderated mediation (mean centered)"); prot2 → respappr (c = 1.46, c' = 1.46) and prot2*sexism → respappr (c = 0.81, c' = 0.81), with sexism as the mediating variable.]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example the analysis is done on the correlation matrix rather than the rawdata
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX
An example of converting the output from fa to LATEXappears in Table 2
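For instance (a sketch, assuming psych and GPArotation are installed), a table like Table 2 could be generated along these lines:

```r
library(psych)

# Three-factor solution for the built-in Thurstone correlation matrix,
# then LaTeX output in the style of Table 2.
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```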
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

MR1              1.00  0.59  0.54
MR2              0.59  1.00  0.52
MR3              0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
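A few of these helpers in action (a sketch, assuming psych is installed; the commented values follow from the definitions of the statistics, e.g. fisherz is atanh):

```r
library(psych)

fisherz(0.5)                   # Fisher z = atanh(0.5), about 0.55
geometric.mean(c(1, 10, 100))  # (1 * 10 * 100)^(1/3) = 10
harmonic.mean(c(2, 6))         # 2 / (1/2 + 1/6) = 3
headTail(sat.act)              # first and last few rows, with ellipsis
```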
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
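Loading and summarizing one of these data sets takes only a couple of lines (a sketch, assuming psych is installed):

```r
library(psych)

data(sat.act)
dim(sat.act)       # 700 subjects, 6 variables
describe(sat.act)  # basic descriptive statistics for the demonstration set
```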
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
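Since cities is a distance matrix, base R's cmdscale can recover a two-dimensional configuration directly (a sketch, assuming psych is installed):

```r
library(psych)

data(cities)
coords <- cmdscale(as.dist(cities), k = 2)  # classical multidimensional scaling
nrow(coords)  # one row of (x, y) coordinates per city
```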
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under: macOS 10.14

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0    foreign_0.8-70 nlme_3.1-136   mnormt_1.5-5

[7] grid_3.6.0     lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7

Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35

char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49

densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9

draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20

edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24

fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48

galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7

harmonic.mean, 49; head, 49; headtail, 49; heights, 49

het.diagram, 8; Hmisc, 29; Holzinger, 49

ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48

KnitR, 47

lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15

make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38

nfactors, 7; nlme, 37

omega, 7, 9; outlier, 3, 13, 14

p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51

psych package: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

R function (continued): r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9

SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47

table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49

vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7

weighted least squares, 7; withinBetween, 37, 38

xtable, 47
0.1 Jump starting the psych package - a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?

1. Activate the psych package:

library(psych)

2. Input your data (section 3.1). There are two ways to do this:

• Find and read standard files using read.file. This will open a search window for your operating system, which you can use to find the file. If the file has a suffix of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, then the file will be opened and the data will be read in (or loaded in the case of Rda files).
myData <- read.file()   # find the appropriate file using your normal operating system

• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.
3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # What are the dimensions of the data?

describe(myData)   # or

describeBy(myData, group="mygroups")   # for descriptive statistics by groups

headTail(myData)   # show the first and last n lines of a file
4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)
outlier(myData)
5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)   # e.g., change anything greater than 9 to NA
6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)
7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)   # The correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)   # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)
8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we can not find the interaction (moderation) effect unless given raw data.

myData <- sat.act
colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")
setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the boot strap with just 50 iterations; the default is 5,000. We use the Tal_Or data set, which was downloaded from the supplementary material for Hayes (2013), https://www.afhayes.com/public/hayes2013data.zip.

mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter = 50)
We can also find the moderation effect by adding in a product term.
• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.
mediate(respappr ~ prot * sexism + (sexism), data = Garcia, zero = FALSE, n.iter = 50,
        main = "Moderated mediation (not mean centered)")
0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.
• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)
• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1). The default method is minimum residual; the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.
fa(myData)
iclust(myData)
omega(myData)
If you prefer to do a principal components analysis, you may use the principal function. The default is one component.
principal(myData)
• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   # score all of the items as part of one scale

myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))

my.scores <- scoreItems(myKeys, myData)   # form several scales

my.scores   # show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
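A minimal sketch of this dimension-reduction workflow, using the Thurstone correlation matrix supplied with the package (the nfactors choice and the n.obs value of 213 reported for these data are illustrative, not prescriptive):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)  # minimum residual factor analysis
print(f3$loadings, cutoff = .3)                 # hide small loadings for readability
nfactors(Thurstone, n.obs = 213)                # compare several number-of-factors criteria
```

The cutoff of .3 is purely a display convention; all loadings are retained in the object.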
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945) as well as additional estimates (Revelle and Zinbarg, 2009) are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
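As a brief sketch of the omega workflow just mentioned (the choice of the built-in Thurstone matrix and the element names shown are illustrative assumptions):

```r
library(psych)
om <- omega(Thurstone)  # Schmid-Leiman based estimates of omega_h and omega_t
om$omega_h              # general factor saturation
om$omega.tot            # total reliability
```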
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over
time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
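A hedged sketch of the statsBy usage described above, grouping the sat.act data by education (the element names rwg and rbg are the within and between group correlation matrices returned by the function):

```r
library(psych)
sb <- statsBy(sat.act, group = "education")  # within and between group statistics
sb$rwg   # pooled within-group correlations
sb$rbg   # between-group correlations
```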
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo's" at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book/.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

install.packages(c("GPArotation","mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:
install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptivestatistics
Remember, to run any of the psych functions it is necessary to make the package active by using the library command:
library(psych)
The other packages once installed will be called automatically by psych
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.

my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4,rep(1,35)))  #will read in a file without a header row
   # and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

#this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)
3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:

my.data <- read.clipboard()
This will work if every data field has a value, and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).
> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)
> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+    skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)
> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1] 1 2 NA NA NA 6 7 8 9 10
[2] 11 12 NA NA NA 16 17 18 19 20
[3] 21 22 NA NA NA 26 27 28 29 30
[4] 31 32 33 NA NA 36 37 38 39 40
[5] 41 42 43 44 NA 46 47 48 49 50
[6] 51 52 53 54 55 56 57 58 59 60
[7] 61 62 63 64 65 66 67 68 69 70
[8] 71 72 NA NA NA 76 77 78 79 80
[9] 81 82 NA NA NA 86 87 88 89 90
[10] 91 92 NA NA NA 96 97 98 99 100
[11] 101 102 NA NA NA 106 107 108 109 110
[12] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png('outlier.png')
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
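A small sketch of both recoding functions just described (the vectors and the data frame here are made up for illustration):

```r
library(psych)
major <- c("psych", "bio", "econ", "psych", "bio")
dummy.code(major)   # a matrix with one 0/1 column per category

mixed <- data.frame(sex = c("M", "F", "F", "M"), score = c(10, 12, 9, 11))
char2numeric(mixed) # character columns are converted to numeric level codes
```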
3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+    main ="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
17
> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act,5:6,"gender",grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution
3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
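The standard error of a proportion used by error.bars.tab can be computed directly; a minimal sketch with made-up numbers:

```r
p <- .30                        # observed proportion in a cell
N <- 100                        # cell size
se.p <- sqrt(p * (1 - p) / N)   # sqrt(pq/N), approximately 0.046
```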
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)
[Figure: error bars with 95% confidence limits for bfagree, bfcon, bfext, bfneur, and bfopen; y axis: Dependent Variable; x axis: Independent Variable]
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+    labels=c("Male","Female"),ylab="SAT score",xlab="")
[Figure: bar graph of SAT V and SAT Q means with 95% confidence limits; y axis: SAT score (200-800); groups: Male, Female]
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+    main="Proportion of sample by education level")
[Figure: "Proportion of sample by education level"; y axis: Proportion of Education Level (0.00-0.30); x axis: Level of Education, with male and female bars for levels 0-5]
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
[Figure: two panels, "Movies effect on arousal" (x: Energetic Arousal, y: Tense Arousal) and "Movies effect on affect" (x: Positive Affect, y: Negative Affect), with points labeled Sad, Horror, Neutral, Happy]
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is shown by densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
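The Holm adjustment can also be applied directly to a vector of probability values with the base R p.adjust function; a small illustration with made-up p values:

```r
ps <- c(.001, .01, .02, .04, .30)   # hypothetical raw probabilities
p.adjust(ps, method = "holm")       # 0.005 0.040 0.060 0.080 0.300
```

Each ordered p value is multiplied by the number of remaining tests, which is why the smallest value changes the most.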
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender 1.00 0.09 -0.02 -0.04 -0.02 -0.17
education 0.09 1.00 0.55 0.15 0.05 0.03
age -0.02 0.55 1.00 0.11 -0.04 -0.03
ACT -0.04 0.15 0.11 1.00 0.56 0.59
SATV -0.02 0.05 -0.04 0.56 1.00 0.64
SATQ -0.17 0.03 -0.03 0.59 0.64 1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender 0.00 0.17 1.00 1.00 1 0
education 0.02 0.00 0.00 0.00 1 1
age 0.58 0.00 0.00 0.03 1 1
ACT 0.33 0.00 0.00 0.00 0 0
SATV 0.62 0.22 0.26 0.00 0 0
SATQ 0.00 0.36 0.37 0.00 0 0
To see confidence intervals of the correlations, print with the short=FALSE option
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):
> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
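For case 1, the t statistic that r.test reports can also be computed directly from the textbook formula; this sketch in base R (an illustration of the formula, not the psych source code) reproduces the t and p shown above for n = 50, r = .3:

```r
# t test of a single correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2
n <- 50
r <- 0.3
t.val <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.val <- 2 * pt(-abs(t.val), df = n - 2)   # two-tailed probability
round(c(t = t.val, p = p.val), 3)          # t = 2.179, p = 0.034
```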
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
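The statistic is easy to compute by hand. The following sketch (an illustration of the Steiger logic with a small hypothetical matrix, not the psych source code, which offers several variants) squares the Fisher z transformed off-diagonal elements, sums them, and scales by the sample size:

```r
# Steiger-style test that a correlation matrix is an identity matrix:
# chi^2 = (n - 3) * sum of squared Fisher z transformed off-diagonal r's,
# with p*(p-1)/2 degrees of freedom (p = number of variables).
R <- matrix(c( 1, .3, .2,
              .3,  1, .4,
              .2, .4,  1), 3, 3)
n <- 100                               # hypothetical sample size
z <- atanh(R[lower.tri(R)])            # Fisher z transform of each correlation
chi2 <- (n - 3) * sum(z^2)
df <- sum(lower.tri(R))                # 3 correlations -> 3 df
p <- pchisq(chi2, df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df, p = p), 3)
```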
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
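As a concrete contrast, φ has a simple closed form for a two by two table of cell frequencies (a hypothetical table is used below), while the tetrachoric estimate must be found iteratively:

```r
# phi is just Pearson r on the 0/1 values; for a 2 x 2 table of frequencies
#        y=0   y=1
# x=0    n00   n01
# x=1    n10   n11
# it reduces to (n00*n11 - n01*n10) / sqrt of the product of the margins.
n00 <- 40; n01 <- 10
n10 <- 10; n11 <- 40
phi <- (n00 * n11 - n01 * n10) /
  sqrt((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11))
phi   # 0.6; the tetrachoric estimate for this table would be noticeably larger
```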
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_xwg · η_ywg · r_xywg + η_xbg · η_ybg · r_xybg    (1)
where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
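Equation (1) is an algebraic identity, which can be verified directly on simulated data. A sketch in base R (statsBy reports these same quantities for real data; the group structure below is invented for illustration):

```r
# Verify equation (1): r_xy = eta_x_wg*eta_y_wg*r_wg + eta_x_bg*eta_y_bg*r_bg
set.seed(42)
g <- rep(1:3, each = 20)          # three groups of 20 observations
x <- rnorm(60) + g                # both variables shift with group membership
y <- rnorm(60) + g
xm <- ave(x, g); ym <- ave(y, g)  # group means (the between-group part)
xw <- x - xm;   yw <- y - ym      # deviations from group means (within part)
r.wg <- cor(xw, yw)               # pooled within-group correlation
r.bg <- cor(xm, ym)               # between-group (group means) correlation
eta.x.wg <- cor(x, xw); eta.y.wg <- cor(y, yw)
eta.x.bg <- cor(x, xm); eta.y.bg <- cor(y, ym)
rhs <- eta.x.wg * eta.y.wg * r.wg + eta.x.bg * eta.y.bg * r.bg
all.equal(rhs, cor(x, y))         # TRUE: the decomposition is exact
```

The identity holds exactly because the within deviations are uncorrelated with the group means: the total covariance splits cleanly into a within part and a between part.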
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the cors option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words   slope  VIF
Sentences 0.09 3.69
Vocabulary 0.09 3.88
Sent.Completion 0.02 3.00
First.Letters 0.58 1.35

Multiple Regression
   R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes   slope  VIF
Sentences 0.07 3.69
Vocabulary 0.17 3.88
Sent.Completion 0.05 3.00
First.Letters 0.45 1.35

Multiple Regression
   R   R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series   slope  VIF
Sentences 0.25 3.69
Vocabulary 0.09 3.88
Sent.Completion 0.04 3.00
First.Letters 0.21 1.35

Multiple Regression
   R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees   slope  VIF
Sentences 0.21 3.69
Vocabulary 0.16 3.88
Sent.Completion 0.21 3.00
First.Letters 0.08 1.35

Multiple Regression
   R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group   slope  VIF
Sentences 0.20 3.69
Vocabulary -0.02 3.88
Sent.Completion 0.08 3.00
First.Letters 0.31 1.35

Multiple Regression
   R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45 0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words   slope  VIF
Sent.Completion 0.02 1.02
First.Letters 0.58 1.02

Multiple Regression
   R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes   slope  VIF
Sent.Completion 0.05 1.02
First.Letters 0.45 1.02

Multiple Regression
   R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series   slope  VIF
Sent.Completion 0.04 1.02
First.Letters 0.21 1.02

Multiple Regression
   R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees   slope  VIF
Sent.Completion 0.21 1.02
First.Letters 0.08 1.02

Multiple Regression
   R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group   slope  VIF
Sent.Completion 0.08 1.02
First.Letters 0.31 1.02

Multiple Regression
   R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual,2)
 Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words 0.52 0.11 0.09 0.06 0.13
Suffixes 0.11 0.60 -0.01 0.01 0.03
Letter.Series 0.09 -0.01 0.75 0.28 0.37
Pedigrees 0.06 0.01 0.28 0.66 0.20
Letter.Group 0.13 0.03 0.37 0.20 0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
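The relation between these paths can be seen with ordinary lm regressions on simulated data (a sketch of the path algebra, not the bootstrap that mediate performs; the coefficients below are invented): for a single mediator estimated by least squares, the indirect effect ab equals the drop from the total effect to the direct effect exactly.

```r
# For a single mediator estimated by OLS, a*b = c.total - c.direct exactly.
set.seed(1)
x <- rnorm(200)                       # predictor
m <- 0.6 * x + rnorm(200)             # mediator: path a
y <- 0.4 * m + 0.3 * x + rnorm(200)   # criterion: paths b (from m) and direct (from x)
a <- coef(lm(m ~ x))["x"]             # x -> m
fit <- coef(lm(y ~ x + m))
b <- fit["m"]                         # m -> y, controlling for x
c.direct <- fit["x"]                  # direct effect, removing m
c.total <- coef(lm(y ~ x))["x"]       # total effect
all.equal(unname(a * b), unname(c.total - c.direct))  # TRUE
```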
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)
The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.
Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.03 Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, niter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
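Given the squared canonical correlations, the set correlation is one line of arithmetic. Using the four values reported for the Thurstone example in section 5.1:

```r
# Cohen's set correlation from the squared canonical correlations (lambda_i):
# R2 = 1 - prod(1 - lambda_i)
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations
R2.set <- 1 - prod(1 - lambda)
round(R2.set, 2)   # 0.69, matching the "Cohens Set Correlation R2" line
```

Note how the first (largest) canonical correlation dominates the product: this is exactly the inflation problem described above, since one strong cross-set relationship is enough to make R2 large.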
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")
The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.
Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.01 Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04 Lower CI = -0.05 Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> #compare with setCor
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV = ACT   slope  se  t  p lower.ci upper.ci VIF
gender -0.05 0.18 -0.27 0.79 -0.40 0.31 1.01
education 0.14 0.22 0.65 0.51 -0.28 0.57 1.45
age 0.03 0.22 0.15 0.88 -0.39 0.46 1.44

Multiple Regression
   R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
ACT 0.16 0.03 0.15 0.02 0.02 0.01 6.49 3 696 0.000248

DV = SATV   slope  se  t  p lower.ci upper.ci VIF
gender -0.03 4.29 -0.01 0.99 -8.45 8.39 1.01
education 0.10 5.13 0.02 0.98 -9.97 10.17 1.45
age -0.10 5.11 -0.02 0.98 -10.13 9.93 1.44

Multiple Regression
   R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATV 0.1 0.01 0.08 0.01 0.01 0.01 2.26 3 696 0.0808

DV = SATQ   slope  se  t  p lower.ci upper.ci VIF
gender -0.18 4.34 -0.04 0.97 -8.69 8.34 1.01
education 0.10 5.18 0.02 0.98 -10.07 10.28 1.45
age -0.09 5.16 -0.02 0.99 -10.23 10.05 1.44

Multiple Regression
   R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATQ 0.19 0.04 0.18 0.03 0.03 0.01 8.63 3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6
Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2 fa2latexA factor analysis table from the psych package in R
Variable MR1 MR2 MR3 h2 u2 com
Sentences 0.91 -0.04 0.04 0.82 0.18 1.01
Vocabulary 0.89 0.06 -0.03 0.84 0.16 1.01
Sent.Completion 0.83 0.04 0.00 0.73 0.27 1.00
First.Letters 0.00 0.86 0.00 0.73 0.27 1.00
4.Letter.Words -0.01 0.74 0.10 0.63 0.37 1.04
Suffixes 0.18 0.63 -0.08 0.50 0.50 1.20
Letter.Series 0.03 -0.01 0.84 0.72 0.28 1.00
Pedigrees 0.37 -0.05 0.47 0.50 0.50 1.93
Letter.Group -0.06 0.21 0.64 0.53 0.47 1.23

SS loadings 2.64 1.86 1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
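Several of these helpers are one-liners. For example, fisherz is simply the inverse hyperbolic tangent of the correlation, and fisherz2r undoes it; a sketch of the arithmetic in base R:

```r
# Fisher's z transform: z = 0.5 * log((1 + r)/(1 - r)), i.e. atanh(r)
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))   # what fisherz(r) computes
all.equal(z, atanh(r))              # TRUE: same quantity
round(z, 3)                         # 0.549
all.equal(tanh(z), r)               # TRUE: fisherz2r is the inverse (tanh)
```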
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos="https://personality-project.org/r", type="source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,
> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
52
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20, 24; errorCircles, 25
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
01 Jump starting the psych package–a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?
1 Activate the psych package
library(psych)
2. Input your data (section 3.1). There are two ways to do this:

• Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, then the file will be opened and the data will be read in (or loaded in the case of Rda files).
myData <- read.file()  # find the appropriate file using your normal operating system
• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()  # if on the clipboard
Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.
3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)  # what are the dimensions of the data?

describe(myData)  # or

describeBy(myData, group=mygroup)  # for descriptive statistics by groups

headTail(myData)  # show the first and last n lines of a file
4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then, use the outlier function to detect outliers.

pairs.panels(myData)

outlier(myData)
5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)  # e.g., change anything greater than 9 to NA
6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)
7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off-diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)  # the correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)  # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)
8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we can not find the interaction (moderation effect) unless given raw data.

myData <- sat.act
colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")
setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the boot strap with just 50 iterations. The default is 5,000. We use the data from Tal-Or et al. (2010), which was downloaded from the supplementary material for Hayes (2013), https://www.afhayes.com/public/hayes2013data.zip.

mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter=50)

We can also find the moderation effect by adding in a product term.
• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot * sexism + (sexism), data=Garcia, zero=FALSE, n.iter=50,
   main="Moderated mediation (not mean centered)")
02 Psychometric functions are summarized in the second vignette
Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.
• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)
• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.
fa(myData)
iclust(myData)
omega(myData)
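The defaults may be overridden. As a sketch (myData stands for whatever numeric data frame was read in earlier), a two-factor oblique solution can be requested explicitly:

```r
# two factors, minimum residual extraction, oblimin rotation
# (the defaults, except for the number of factors)
f2 <- fa(myData, nfactors = 2, rotate = "oblimin", fm = "minres")
print(f2, sort = TRUE)  # sorting the loadings makes the structure easier to see
```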
If you prefer to do a principal components analysis, you may use the principal function. The default is one component.
principal(myData)
• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.
alpha(myData)  # score all of the items as part of one scale

myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))
my.scores <- scoreItems(myKeys, myData)  # form several scales
my.scores  # show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
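Those number-of-factors criteria can be compared in a single call (a sketch; myData is assumed to be a data frame of numeric items):

```r
nfactors(myData)     # compares VSS, MAP, and several other criteria at once
fa.parallel(myData)  # parallel analysis: real vs. simulated eigenvalues
```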
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
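For a single set of items these reliability estimates may be sketched as follows (myItems is a hypothetical data frame of items keyed in the same direction):

```r
alpha(myItems)    # coefficient alpha and related item statistics
guttman(myItems)  # Guttman's six lower bound estimates
omega(myItems)    # omega_h and omega_t from a bifactor solution
```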
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation, moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
31 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. Files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData will be read correctly.
mydata <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:

mydata <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
# and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
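If the location of the file is already known, the search window may be skipped by giving read.file the file name directly (the path below is, of course, hypothetical):

```r
# read a comma separated file from a known location
myData <- read.file("myProject/myData.csv")
```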
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

# this will keep the value labels for .sav files
myspss <- read.file(use.value.labels=TRUE)
32 Data input from the clipboard
There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
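The two platform specific forms of that base R read.table call are sketched below (assuming the copied text has a header row of labels):

```r
# read the clipboard with base R (choose the line for your platform)
mydata <- read.table("clipboard", header = TRUE)      # PCs
mydata <- read.table(pipe("pbpaste"), header = TRUE)  # Macs
```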
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:

mydata <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> mydata <- read.clipboard(sep="\t")  # define the tab option, or
> mytabdata <- read.clipboard.tab()   # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, the last 4 are 3 columns each):

> mydata <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))
33 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)
Descriptive statistics by group
group 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+           skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...     <NA>   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
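The underlying computation can be sketched in base R (without psych) using stats::mahalanobis; the data and the planted extreme case below are made up purely for illustration.

```r
# Illustrative base-R sketch of what outlier() computes:
# squared Mahalanobis distances, compared to chi-square quantiles.
set.seed(42)
x <- matrix(rnorm(300), ncol = 3)          # 100 cases, 3 variables
x[1, ] <- c(6, -6, 6)                      # plant one extreme point
d2 <- mahalanobis(x, colMeans(x), cov(x))  # squared distance of each row
# expected quantiles under multivariate normality: chi-square, df = ncol(x)
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))
which.max(d2)                              # the planted case (row 1)
```

Plotting sort(d2) against expected gives the same kind of Q-Q display that outlier draws.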
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)
> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x
        V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]    1   2 NA NA NA   6   7   8   9  10
 [2,]   11  12 NA NA NA  16  17  18  19  20
 [3,]   21  22 NA NA NA  26  27  28  29  30
 [4,]   31  32 33 NA NA  36  37  38  39  40
 [5,]   41  42 43 44 NA  46  47  48  49  50
 [6,]   51  52 53 54 55  56  57  58  59  60
 [7,]   61  62 63 64 65  66  67  68  69  70
 [8,]   71  72 NA NA NA  76  77  78  79  80
 [9,]   81  82 NA NA NA  86  87  88  89  90
[10,]   91  92 NA NA NA  96  97  98  99 100
[11,]  101 102 NA NA NA 106 107 108 109 110
[12,]  111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the X axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial correlation (regular Pearson r) to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
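For readers who want to see the mechanics, a minimal base-R equivalent of dummy coding (one 0/1 column per category) can be sketched with model.matrix; the major variable here is invented for illustration:

```r
# Hypothetical example: dummy code a categorical variable in base R
major <- factor(c("psych", "biology", "psych", "history", "biology"))
# one column per level (no intercept), i.e. one binary variable per category
dummies <- model.matrix(~ major - 1)
colnames(dummies) <- levels(major)
dummies
# each row has exactly one 1; these 0/1 columns can now be correlated
# with other numeric variables (point biserial correlations)
rowSums(dummies)   # all 1
```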
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+        "lively", "-sleepy", "-tired", "-drowsy"),
+ TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+ PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+ NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+ main ="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4), or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+ main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act,5:6,"gender",grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of proportion (σp = sqrt(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, more conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
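The standard error of proportion that error.bars.tab uses can be computed directly; a small base-R sketch with made-up counts (the group and category labels are invented):

```r
# Made-up counts for two groups across three categories
tab <- matrix(c(30, 20, 50, 40, 35, 25), nrow = 2, byrow = TRUE,
              dimnames = list(c("M", "F"), c("low", "mid", "high")))
N  <- sum(tab)                # grand total, as with the way="both" option
p  <- tab / N                 # cell proportions
se <- sqrt(p * (1 - p) / N)   # standard error of each proportion: sqrt(pq/N)
round(p, 2)
round(se, 3)
```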
> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be shown as percentages (as here) or as raw counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png( 'bibars.png' )
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
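The Holm correction itself is available in base R as p.adjust; a quick sketch with arbitrary raw p values:

```r
# Arbitrary raw p values for three tests
p.raw <- c(0.01, 0.02, 0.03)
# Holm: sort the p values, multiply the smallest by k, the next by k-1, ...,
# then enforce monotonicity of the adjusted values
p.holm <- p.adjust(p.raw, method = "holm")
p.holm   # 0.03 0.04 0.04
```

Note that with the Holm step-down procedure the adjusted values never exceed 1 and are never smaller than the simple Bonferroni adjustment divided appropriately.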
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
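The t value above follows from the standard formula t = r sqrt(n-2)/sqrt(1-r^2), and can be checked in base R:

```r
# Recompute the test of r = .3 with n = 50 from first principles
n <- 50
r <- 0.3
tval <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic with n - 2 df
pval <- 2 * pt(-abs(tval), df = n - 2)    # two-tailed probability
round(c(t = tval, p = pval), 3)           # t = 2.179, p = 0.034
```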
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
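This z test uses Fisher's r-to-z transformation; the value above can be reproduced in base R:

```r
# Difference between r = .4 (n = 30) and r = .6 (n2 = 30), Fisher z method
n  <- 30
n2 <- 30
z1 <- atanh(0.4)                       # Fisher r-to-z transform
z2 <- atanh(0.6)
se <- sqrt(1/(n - 3) + 1/(n2 - 3))     # standard error of the difference
zval <- (z1 - z2) / se
pval <- 2 * pnorm(-abs(zval))
round(c(z = zval, p = pval), 2)        # z = -0.99, p = 0.32
```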
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
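That φ is just a Pearson r on the observed values can be checked directly in base R with toy dichotomous data (the vectors below are made up for illustration):

```r
# Toy dichotomous data: phi from the 2 x 2 table equals Pearson r on the 0/1 scores
x <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 1)
y <- c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0)
tab <- table(x, y)
a <- tab[2, 2]; b <- tab[2, 1]   # cell counts: a = both 1, b = x only
c <- tab[1, 2]; d <- tab[1, 1]   # c = y only, d = both 0
phi <- (a * d - b * c) /
  sqrt((a + b) * (c + d) * (a + c) * (b + d))
all.equal(phi, cor(x, y))        # TRUE: phi is Pearson r on the raw 0/1 data
```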
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_x.wg * η_y.wg * r_xy.wg + η_x.bg * η_y.bg * r_xy.bg        (1)
where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy.wg and r_xy.bg, and η (eta) is the correlation of the data with the within group values or the group means.
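Equation 1 can be verified numerically with simulated grouped data in base R; the identity holds exactly because within group deviations and group means are uncorrelated:

```r
# Verify the within/between decomposition of a correlation on simulated data
set.seed(17)
group <- rep(1:5, each = 20)
x <- rnorm(100) + group      # x and y both shift with group membership
y <- rnorm(100) + group
x.bg <- ave(x, group)        # group means (between), expanded to full length
y.bg <- ave(y, group)
x.wg <- x - x.bg             # within-group deviations
y.wg <- y - y.bg
eta.x.wg <- sd(x.wg) / sd(x) # cor(x, x.wg) reduces to this ratio
eta.y.wg <- sd(y.wg) / sd(y)
eta.x.bg <- sd(x.bg) / sd(x)
eta.y.bg <- sd(y.bg) / sd(y)
rhs <- eta.x.wg * eta.y.wg * cor(x.wg, y.wg) +
       eta.x.bg * eta.y.bg * cor(x.bg, y.bg)
all.equal(cor(x, y), rhs)    # TRUE: the two sides agree
```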
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = 4.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = 4.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
4.Letter.Words            0.52     0.11          0.09      0.06         0.13
Suffixes                  0.11     0.60         -0.01      0.01         0.03
Letter.Series             0.09    -0.01          0.75      0.28         0.37
Pedigrees                 0.06     0.01          0.28      0.66         0.20
Letter.Group              0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
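The path algebra can be checked by hand. Using the rounded values from the output above, together with the a and b paths shown in Figure 15, the total effect is the direct effect plus the indirect effect, c = c' + ab. A base R sketch (plain arithmetic, not psych output):

```r
a  <- 0.82   # path from THERAPY to ATTRIB (Figure 15)
b  <- 0.40   # path from ATTRIB to SATIS (Figure 15)
cp <- 0.43   # direct effect (c') of THERAPY on SATIS
ab <- a * b  # indirect effect, about 0.33
ab + cp      # total effect c, about 0.76
```

The small discrepancies from the printed values are rounding in the displayed paths.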
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R² = 1 − ∏ᵢ₌₁ⁿ (1 − λᵢ)

where λᵢ is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
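To make the formula concrete, the squared canonical correlations that setCor reported earlier for the Thurstone example (0.6280, 0.1478, 0.0076, 0.0049) reproduce both the set correlation and the average squared canonical correlation with a line of base R arithmetic:

```r
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations
R2 <- 1 - prod(1 - lambda)  # Cohen's set correlation, about 0.69
mean(lambda)                # average squared canonical correlation, about 0.2
```

Note how one large canonical correlation (0.63) dominates R², which is exactly the inflation the paragraph above warns about; the average (0.2) is much more modest.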
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
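The symmetry follows from the eigenvalue formula of section 5.3: the nonzero eigenvalues of Rxx⁻¹ Rxy Ryy⁻¹ Ryx are unchanged when x and y are exchanged. A base R sketch, using a made-up 4 x 4 correlation matrix (not one of the psych data sets):

```r
# Hypothetical correlation matrix for illustration only
R <- matrix(c(1.0, 0.5, 0.3, 0.2,
              0.5, 1.0, 0.4, 0.3,
              0.3, 0.4, 1.0, 0.6,
              0.2, 0.3, 0.6, 1.0), 4, 4)

# Set correlation of variable sets x and y, straight from the formula
setR2 <- function(R, x, y) {
  M  <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
  ev <- Re(eigen(M, only.values = TRUE)$values)  # eigenvalues are real; Re() drops numerical noise
  1 - prod(1 - ev)
}
setR2(R, 1:2, 3:4)  # same value in either direction
setR2(R, 3:4, 1:2)
```

The two calls return the same R2, illustrating that set correlation, unlike multiple regression, does not distinguish predictors from criteria.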
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
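A small sketch of the superMatrix trick (assuming the psych package is installed and loaded):

```r
library(psych)  # assumes the psych package is installed

A <- diag(2)          # a 2 x 2 identity matrix
B <- matrix(1, 2, 2)  # a 2 x 2 matrix of ones
superMatrix(A, B)     # 4 x 4: A on the top left, B on the lower right, 0s elsewhere
```

The same idea scales to scoring keys: two keys built for separate item sets can be combined into one key for the concatenated data.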
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos="https://personality-project.org/r", type="source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49
affect, 15, 25
alpha, 5, 7
alpha factoring, 7
Bechtoldt.1, 49
Bechtoldt.2, 49
bfi, 26, 49
bi.bars, 7, 26, 27
bifactor, 7
biserial, 15, 35
block.random, 48
burt, 35
char2numeric, 15
circadian.cor, 48
circadian.linear.cor, 48
circadian.mean, 48
circular statistics, 48
cities, 51
cohen.kappa, 49
cor, 28
cor.smooth, 35
cor.test, 29
cor2latex, 47, 48
corPlot, 8
corr.p, 29, 33
corr.test, 29, 33
cortest, 34
cosinor, 48
ctv, 9
cubits, 49
densityBy, 15, 26
densitynBy, 27
describe, 7, 11, 48
describeBy, 3, 7, 11, 12
df2latex, 47, 48
diagram, 9
draw.cor, 35
draw.tetra, 35
dummy.code, 15
dynamite plot, 20
edit, 3
epi.bfi, 49
error bars, 20
error.bars, 7, 15, 20
error.bars.by, 12, 15, 20, 21
error.bars.tab, 20
error.crosses, 20
errorCircles, 25
errorCrosses, 24
fa, 7, 9, 47
fa.diagram, 8
fa.extension, 50
fa.multi, 7
fa.parallel, 5, 7
fa2latex, 47, 48
faBy, 38
factor analysis, 7
factor.minres, 9
factor.pa, 9
factor.wls, 9
file.choose, 10
fisherz, 48
galton, 50
generalized least squares, 7
geometric.mean, 49
GPArotation, 9
guttman, 7
harmonic.mean, 49
head, 49
headtail, 49
heights, 49
het.diagram, 8
Hmisc, 29
Holzinger, 49
ICC, 7, 49
iclust, 7
iclust.diagram, 8
Index, 48
introduction to psychometric theory with applications in R, 8
iqitems, 49
irt.fa, 7, 47
irt2latex, 47, 48
KnitR, 47
lavaan, 38
library, 9
lm, 38
lowerCor, 4, 28
lowerMat, 28
lowerUpper, 28
lowess, 15
make.keys, 17
MAP, 7
mardia, 49
maximum likelihood, 7
mediate, 4, 8, 41, 42, 44
mediate.diagram, 41
minimum residual, 7
mixed.cor, 35
mlArrange, 8
mlPlot, 8
mlr, 7
msq, 17
multi.hist, 7
multilevel, 37
multilevel.reliability, 7
multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9
outlier, 3, 13, 14
p.adjust, 29
p.rep, 49
pairs, 15
pairs.panels, 3, 7, 8, 14–18
partial.r, 49
pca, 7
peas, 49, 50
plot.irt, 8
plot.poly, 8
polychoric, 7, 35
polyserial, 35
principal, 5, 7, 9
principal axis, 7
psych, 3, 5–9, 29, 47–49, 51
R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8
59
corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49hetdiagram 8
Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51psych package
60
ability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 35cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8
faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14
61
prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
62
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63
7 Find the correlations of all of your data. lowerCor will, by default, find the pairwise correlations, round them to 2 decimals, and display the lower off-diagonal matrix.
• Descriptively (just the values) (section 3.4.7):
r <- lowerCor(myData)   #The correlation matrix, rounded to 2 decimals
• Graphically (section 3.4.8): Another way is to show a heat map of the correlations, with the correlation values included.
corPlot(r)   #examine the many options for this function
• Inferentially (the values, the ns, and the p values) (section 3.5):
corr.test(myData)
8 Apply various regression models
Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we can not find the interaction (moderation) effect unless given raw data.
myData <- sat.act
colnames(myData) <- c("mod1","med1","x1","x2","y1","y2")
setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the bootstrap with just 50 iterations. The default is 5000. We use the Tal_Or data set, which was downloaded from the supplementary material for Hayes (2013) at https://www.afhayes.com/public/hayes2013data.zip.
mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter = 50)
We can also find the moderation effect by adding in a product term.
• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.
mediate(respappr ~ prot * sexism + (sexism), data = Garcia, zero = FALSE, n.iter = 50,
        main = "Moderated mediation (not mean centered)")
0.2 Psychometric functions are summarized in the second vignette
Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.
• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).
fa.parallel(myData)
vss(myData)
• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1), the default method is minimum residual, the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.
fa(myData)
iclust(myData)
omega(myData)
If you prefer to do a principal components analysis, you may use the principal function. The default is one component.
principal(myData)
• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.
alpha(myData)   #score all of the items as part of one scale
myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))
myscores <- scoreItems(myKeys, myData)   #form several scales
myscores   #show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book/.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves (plot.irt and plot.poly).
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation, moderation, and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from, e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just the following packages:
install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:
install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,
.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:
my.data <- read.file(widths = c(4,rep(1,35)))  #will read in a file without a header row
   #and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:
#this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)
3.2 Data input from the clipboard
There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.
read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).
read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.
read.clipboard.upper for reading input of an upper triangular matrix.
read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, the last 4 are 3 columns).
> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

 Descriptive statistics by group 
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+    skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

         item group1 group2 vars   n   mean     sd    se
gender1     1      1      0    1  27      1      0     0
gender2     2      2      0    1  30      2      0     0
gender3     3      1      1    1  20      1      0     0
gender4     4      2      1    1  25      2      0     0
...       ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9      69      1      4    6  51  635.9 104.12 14.58
SATQ10     70      2      4    6  86 597.59 106.24 11.46
SATQ11     71      1      5    6  46 657.83  89.61 13.21
SATQ12     72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
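The logic behind outlier can be sketched in a few lines of base R (a hedged illustration using simulated data rather than the sat.act data set; mahalanobis, qchisq, and ppoints are standard stats functions):

```r
# Squared Mahalanobis distances from the centroid, compared against the
# corresponding chi-square quantiles (the basis of the Q-Q plot outlier() draws).
set.seed(17)
X <- matrix(rnorm(700 * 6), ncol = 6)         # stand-in for a 700 x 6 data set
d2 <- mahalanobis(X, colMeans(X), cov(X))     # squared distance of each row from the centroid
expected <- qchisq(ppoints(nrow(X)), df = ncol(X))  # expected chi-square quantiles
# plot(expected, sort(d2)); abline(0, 1)      # points far above the line are outliers
```

Rows whose d2 greatly exceed the corresponding chi-square quantile are the candidates that outlier labels on its plot.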
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50, respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
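The dummy coding idea itself can be sketched directly in base R (a hedged illustration; the major variable is hypothetical, and the psych function dummy.code packages this logic):

```r
# Turn one categorical variable into a set of 0/1 indicator (dummy) variables.
major <- c("psych", "bio", "psych", "econ", "bio")          # hypothetical categories
dummies <- sapply(unique(major), function(m) as.numeric(major == m))
rownames(dummies) <- major
# each column is a binary variable for one category; every row has exactly one 1
```

The resulting 0/1 columns can then be correlated with other variables in the usual way.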
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2)  #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.'), it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect, and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75],list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+          "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
   main ="Density distributions of four measures of affect" )
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+    main="Affect varies by movies ")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act,5:6,"gender",grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
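The standard error of proportion used by error.bars.tab is simple enough to compute directly (a base-R sketch with hypothetical cell counts, not output from the psych function):

```r
# se of a proportion: sqrt(p*q/N), with q = 1 - p
counts <- c(male = 247, female = 453)    # hypothetical cell counts from a table
N <- sum(counts)
p <- counts / N                          # observed proportions
se <- sqrt(p * (1 - p) / N)              # standard error of each proportion
lower <- p - 1.96 * se                   # approximate 95% confidence bounds
upper <- p + 1.96 * se
```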
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph, and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits, and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
gendr edctn age ACT SATV SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
edctn age ACT SATV SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
edctn age ACT SATV SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
education age ACT SATV SATQ
education    NA 0.52 0.16  0.07  0.03
age        0.61   NA 0.08 -0.03 -0.09
ACT        0.16 0.15   NA  0.53  0.58
SATV       0.02 -0.06 0.61    NA  0.63
SATQ       0.08 0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
education age ACT SATV SATQ
education    NA 0.09 0.00 -0.05 0.05
age        0.61   NA 0.07 -0.03 0.13
ACT        0.16 0.15   NA  0.08 0.02
SATV       0.02 -0.06 0.61    NA 0.05
SATQ       0.08 0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender     1.00  0.09 -0.02 -0.04 -0.02 -0.17
education  0.09  1.00  0.55  0.15  0.05  0.03
age       -0.02  0.55  1.00  0.11 -0.04 -0.03
ACT       -0.04  0.15  0.11  1.00  0.56  0.59
SATV      -0.02  0.05 -0.04  0.56  1.00  0.64
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender     0.00  0.17 1.00 1.00    1    0
education  0.02  0.00 0.00 0.00    1    1
age        0.58  0.00 0.00 0.03    1    1
ACT        0.33  0.00 0.00 0.00    0    0
SATV       0.62  0.22 0.26 0.00    0    0
SATQ       0.00  0.36 0.37 0.00    0    0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input):
1) For a sample size n, find the t and p value for a single correlation, as well as the confidence interval:
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
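For instance, a hypothetical two by two table of cell frequencies (the counts below are made up purely for illustration) may be given directly to tetrachoric, and the result compared with the φ coefficient:

```r
library(psych)
# Made-up cell frequencies for a 2 x 2 table of dichotomized scores
tab <- matrix(c(44, 19, 33, 47), nrow = 2)
tetrachoric(tab)  # estimate of the latent Pearson r and the cut points (tau)
phi(tab)          # the phi coefficient on the observed dichotomous values
```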
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
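A hedged sketch of such a call, using the bfi data set with the first five items treated as polytomous and age (column 28) as continuous; the column choices are illustrative and the argument names should be checked against the mixedCor help page:

```r
library(psych)
data(bfi)
# p = polytomous columns, c = continuous columns (indices refer to the
# subset being analyzed); mixedCor picks the right correlation per pair
r.mix <- mixedCor(bfi[c(1:5, 28)], p = 1:5, c = 6)
print(r.mix$rho, digits = 2)
```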
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
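The smoothing step can be sketched as follows (the burt correlation matrix ships with psych; the eigen and cor.smooth behavior follows their help pages):

```r
library(psych)
data(burt)
min(eigen(burt)$values)     # slightly negative: not positive semi-definite
burt.s <- cor.smooth(burt)  # warns that the matrix was smoothed
min(eigen(burt.s)$values)   # now non-negative
```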
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
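The pieces of Equation 1 can be extracted directly from a statsBy analysis; a sketch using the sat.act data grouped by education (the component names follow the statsBy help page):

```r
library(psych)
data(sat.act)
# cors=TRUE requests the within and between group correlation matrices
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg  # pooled within group correlations
sb$rbg  # between group correlations (correlations of the group means)
```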
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a between
group correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times, and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
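The call that produced the output below can be sketched as follows (matching the Call line in the output; note that sobel is the data frame built in the mediate help example, not a shipped data set):

```r
library(psych)
# Preacher and Hayes (2004): does ATTRIB mediate the effect of THERAPY
# on SATIS?  sobel must first be constructed as in example(mediate).
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
```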
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
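The set correlation formula can be checked by hand. A sketch for the Thurstone example used earlier (the eigenvalues of $R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$ are the squared canonical correlations):

```r
library(psych)
Rxx <- Thurstone[1:4, 1:4]  # predictor set
Ryy <- Thurstone[5:9, 5:9]  # criterion set
Rxy <- Thurstone[1:4, 5:9]
# eigenvalues = squared canonical correlations between the two sets
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values
1 - prod(1 - lambda)  # Cohen's set correlation R2
```

This reproduces the Cohen's Set Correlation R2 of 0.69 reported by setCor for the same variable split.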
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07

Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor:
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
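A minimal sketch of the kind of call that generates such a table, assuming the built-in Thurstone correlation matrix of 9 ability variables:

library(psych)
f3 <- fa(Thurstone, nfactors = 3)  # three factor solution of the 9 ability variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex", label = "tab:fa")

The resulting LaTeX source may be pasted directly into a manuscript.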
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
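A few of these helpers in action (a sketch; fisherz, geometric.mean, harmonic.mean, and headtail are all standard psych functions):

library(psych)

fisherz(0.50)               # Fisher r-to-z transform: atanh(0.5), about 0.55
geometric.mean(c(1, 2, 4))  # nth root of the product: 2
harmonic.mean(c(1, 2, 4))   # reciprocal of the mean reciprocal: about 1.71
headtail(sat.act)           # first and last rows, separated by ellipsis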
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
effect using a boot strap. By default we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot * sexism + (sexism), data = Garcia, zero = FALSE, n.iter = 50,
    main = "Moderated mediation (not mean centered)")
0.2 Psychometric functions are summarized in the second vignette
Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.
• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).
fa.parallel(myData)
vss(myData)
• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1), the default method is minimum residual, the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.
fa(myData)
iclust(myData)
omega(myData)
If you prefer to do a principal components analysis, you may use the principal function. The default is one component.
principal(myData)
• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.
alpha(myData)  #score all of the items as part of one scale
myKeys <- make.keys(nvar = 20, list(first = c(1, -3, 5, -7, 8:10), second = c(2, 4, -6, 11:15, -16)))
my.scores <- scoreItems(myKeys, myData)  #form several scales
my.scores  #show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
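For instance, a reliability sketch using the bundled bfi data set (its five neuroticism items, N1-N5 in columns 16-20, are all keyed in the same direction, so no reverse coding is needed):

library(psych)
# α and related item statistics for a 5 item neuroticism scale
alpha(bfi[16:20])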
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
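As a quick sketch of the group-level functions, using the built-in sat.act data grouped by education level:

library(psych)
# basic descriptive statistics for each education level
describeBy(sat.act, group = "education")

# pooled within group and between group correlations
sb <- statsBy(sat.act, group = "education")
round(sb$rwg, 2)  # within group correlations
round(sb$rbg, 2)  # between group correlations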
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych package for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation, moderation, and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptivestatistics
Remember that to run any of the psych functions, it is necessary to make the package active by using the library command:
library(psych)
The other packages once installed will be called automatically by psych
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:

my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
# and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each
If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels = TRUE)
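If the file's name and location are already known, a path may be passed directly so that no search window opens; a minimal sketch (the file name here is hypothetical):

```r
library(psych)
# "study1.csv" is a hypothetical file name; read.file infers how to read it
# from the suffix and returns a data.frame
my.data <- read.file("study1.csv")
```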
3.2 Data input from the clipboard
There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command
my.data <- read.clipboard()
This will work if every data field has a value, and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function:
> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns):

> my.data <- read.clipboard.fwf(widths = c(5, 2, rep(1, 5), rep(3, 4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)  # basic descriptive statistics
vars n mean sd median trimmed mad min max range skew kurtosis se
gender 1 700 1.65 0.48 2 1.68 0.00 1 2 1 -0.61 -1.62 0.02
education 2 700 3.16 1.43 3 3.31 1.48 0 5 5 -0.68 -0.07 0.05
age 3 700 25.59 9.50 22 23.86 5.93 13 65 52 1.64 2.42 0.36
ACT 4 700 28.55 4.82 29 28.84 4.45 3 36 33 -0.66 0.53 0.18
SATV 5 700 612.23 112.90 620 619.45 118.61 200 800 600 -0.64 0.33 4.27
SATQ 6 687 610.22 115.64 620 617.25 118.61 200 800 600 -0.59 -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)
Descriptive statistics by group
group: 1
vars n mean sd se
gender 1 247 1.00 0.00 0.00
education 2 247 3.00 1.54 0.10
age 3 247 25.86 9.74 0.62
ACT 4 247 28.79 5.06 0.32
SATV 5 247 615.11 114.16 7.26
SATQ 6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
vars n mean sd se
gender 1 453 2.00 0.00 0.00
education 2 453 3.26 1.35 0.06
age 3 453 25.45 9.37 0.44
ACT 4 453 28.42 4.69 0.22
SATV 5 453 610.66 112.31 5.28
SATQ 6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+   skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
item group1 group2 vars n mean sd se
gender1 1 1 0 1 27 1 0 0
gender2 2 2 0 1 30 2 0 0
gender3 3 1 1 1 20 1 0 0
gender4 4 2 1 1 25 2 0 0
... <NA> <NA> <NA> ... ... ... ... ...
SATQ9 69 1 4 6 51 635.9 104.12 14.58
SATQ10 70 2 4 6 86 597.59 106.24 11.46
SATQ11 71 1 5 6 46 657.83 89.61 13.21
SATQ12 72 2 5 6 93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
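The most extreme cases may then be found by sorting d2; a minimal sketch using the built-in sat.act data:

```r
library(psych)
data(sat.act)
d2 <- outlier(sat.act)             # squared Mahalanobis distance for each case (also draws the Q-Q plot)
head(sort(d2, decreasing = TRUE))  # the most extreme cases
```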
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1] 1 2 NA NA NA 6 7 8 9 10
[2] 11 12 NA NA NA 16 17 18 19 20
[3] 21 22 NA NA NA 26 27 28 29 30
[4] 31 32 33 NA NA 36 37 38 39 40
[5] 41 42 43 44 NA 46 47 48 49 50
[6] 51 52 53 54 55 56 57 58 59 60
[7] 61 62 63 64 65 66 67 68 69 70
[8] 71 72 NA NA NA 76 77 78 79 80
[9] 81 82 NA NA NA 86 87 88 89 90
[10] 91 92 NA NA NA 96 97 98 99 100
[11] 101 102 NA NA NA 106 107 108 109 110
[12] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are
> png("outlier.png")
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
1
Figure 1 Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
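A brief sketch of dummy coding, using the education variable from the built-in sat.act data (the variable name ed.codes is illustrative; char2numeric would be applied in the same way to a data.frame with character columns):

```r
library(psych)
data(sat.act)
ed.codes <- dummy.code(sat.act$education)  # one 0/1 column per education level
head(ed.codes)
```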
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1
Figure 2 Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+     "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+     "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+     "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+     "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+   main="Affect varies by movies")
> dev.off()
null device
1
Figure 3 Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = sqrt(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
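A minimal sketch of the simplest of these, using the built-in sat.act data (the title text is illustrative):

```r
library(psych)
data(sat.act)
# cat's-eye confidence intervals for the two SAT scores
error.bars(sat.act[5:6], main="95% confidence intervals for SATV and SATQ")
```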
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. (Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although that is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5 Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male", "Female"), ylab="SAT score", xlab="")
Figure 6 A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level",
+   xlab="Level of Education", main="Proportion of sample by education level")
Figure 7 The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+   cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+   ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8 The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9 A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with the densityBy function in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10 Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
gendr edctn age ACT SATV SATQ
gender 1.00
education 0.09 1.00
age -0.02 0.55 1.00
ACT -0.04 0.15 0.11 1.00
SATV -0.02 0.05 -0.04 0.56 1.00
SATQ -0.17 0.03 -0.03 0.59 0.64 1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
edctn age ACT SATV SATQ
education 1.00
age 0.61 1.00
ACT 0.16 0.15 1.00
SATV 0.02 -0.06 0.61 1.00
SATQ 0.08 0.04 0.60 0.68 1.00
> upper <- lowerCor(female[-1])
edctn age ACT SATV SATQ
education 1.00
age 0.52 1.00
ACT 0.16 0.08 1.00
SATV 0.07 -0.03 0.53 1.00
SATQ 0.03 -0.09 0.58 0.63 1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
education age ACT SATV SATQ
education NA 0.52 0.16 0.07 0.03
age 0.61 NA 0.08 -0.03 -0.09
ACT 0.16 0.15 NA 0.53 0.58
SATV 0.02 -0.06 0.61 NA 0.63
SATQ 0.08 0.04 0.60 0.68 NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
education age ACT SATV SATQ
education NA 0.09 0.00 -0.05 0.05
age 0.61 NA 0.07 -0.03 0.13
ACT 0.16 0.15 NA 0.08 0.02
SATV 0.02 -0.06 0.61 NA 0.05
SATQ 0.08 0.04 0.60 0.68 NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+   main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11 The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12 Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+   main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1 The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call: corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender 1.00 0.09 -0.02 -0.04 -0.02 -0.17
education 0.09 1.00 0.55 0.15 0.05 0.03
age -0.02 0.55 1.00 0.11 -0.04 -0.03
ACT -0.04 0.15 0.11 1.00 0.56 0.59
SATV -0.02 0.05 -0.04 0.56 1.00 0.64
SATQ -0.17 0.03 -0.03 0.59 0.64 1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender 0.00 0.17 1.00 1.00 1 0
education 0.02 0.00 0.00 0.00 1 1
age 0.58 0.00 0.00 0.03 1 1
ACT 0.33 0.00 0.00 0.00 0 0
SATV 0.62 0.22 0.26 0.00 0 0
SATQ 0.00 0.36 0.37 0.00 0 0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:

> r.test(50, .3)
Correlation tests
Call: r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:

> r.test(30, .4, .6)
Correlation tests
Call: r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103, .4, .5, .1)
Correlation tests
Call: r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103, .5, .6, .7, .5, .5, .8)  # Steiger Case B
Correlation tests
Call: r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call: cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
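As a sketch of the estimation just described, the tetrachoric correlation can be found directly from a two by two table of cell frequencies (the counts below are made up for illustration, not taken from any of the data sets above):

```r
library(psych)
# hypothetical 2 x 2 table of cell frequencies
tab <- matrix(c(44, 19, 13, 24), 2, 2)
tetrachoric(tab)   # reports the estimated rho and the two thresholds (taus)
```

The same call applied to raw dichotomous data (e.g., the ability items) returns the full matrix of tetrachoric correlations.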
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
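A sketch of such a mixed analysis using the bfi data, treating the 25 polytomous personality items as polychoric and age as continuous (the argument names c and p follow the mixedCor help page and should be checked against the current release):

```r
library(psych)
dat <- bfi[c(1:25, 28)]                 # 25 polytomous items plus age (column 28 of bfi)
mc  <- mixedCor(dat, c = 26, p = 1:25)  # c = continuous column, p = polytomous columns
round(mc$rho[1:4, 1:4], 2)              # the resulting mixed correlation matrix
```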
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
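A minimal sketch of the smoothing procedure applied to the burt correlation matrix included in psych (assuming, as its help page suggests, that the smallest eigen value is slightly negative):

```r
library(psych)
round(eigen(burt)$values, 3)     # the smallest eigen value is (just) negative
burt.s <- cor.smooth(burt)       # adjust the offending eigen values and rescale
round(eigen(burt.s)$values, 3)   # all eigen values are now positive
```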
> draw.tetra()

[Figure 14 graphic: a bivariate normal distribution cut at τ on x and Τ on y, labeling the four quadrants (X < τ or X > τ crossed with Y < Τ or Y > Τ), with rho = 0.5 and phi = 0.28, together with the marginal normal densities showing the cut points.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$  (1)
where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
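A minimal sketch of such a decomposition (the group is specified by name, and cors=TRUE requests the within and between group correlation matrices):

```r
library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # between-group correlations (of the group means)
```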
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB  =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03   Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067

 To see the longer output, specify short = FALSE in the print statement or ask for the summary
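The output above comes from a call along these lines (a sketch; the sobel data are included with the mediate examples, and niter controls the number of bootstrap resamples, defaulting to 5000):

```r
library(psych)
# mediation of THERAPY -> SATIS through ATTRIB (Preacher & Hayes, 2004 example)
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
print(preacher, short = FALSE)   # full output, including the a, b, and ab paths
```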
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure 15 graphic: "Mediation model" path diagram with THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), total effect c = 0.76, and direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 graphic: "Regression Models" path diagram with THERAPY → SATIS (0.43), ATTRIB → SATIS (0.4), and a curved path between THERAPY and ATTRIB (0.21).]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "How To" tutorial: Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
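The relation between the squared canonical correlations (λ) and the set correlation can be checked by hand. Using the four squared canonical correlations reported for the Thurstone example earlier in this section:

```r
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations
R2.set <- 1 - prod(1 - lambda)               # Cohen's set correlation
round(R2.set, 2)
```

This reproduces the reported set correlation R2 of 0.69, while the average of the four λ values gives the (much smaller) average squared canonical correlation.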
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  125   with p =  4.5e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.02  Lower CI =  -0.01   Upper CI =  0.07
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  125   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  NA  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.04  Lower CI =  -0.05   Upper CI =  0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.46e-09

 To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure 17 graphic: "Moderated mediation (mean centered)" path diagram with prot2 and prot2*sexism as predictors, sexism as the mediator, and respappr as the criterion, showing total effects c = 1.46 and c = 0.81 and direct effects c' = 1.46 and c' = 0.81.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
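A sketch of a call that produces a table like Table 2 (the heading argument follows the fa2latex help page; treat it as an assumption and check the current release):

```r
library(psych)
f3 <- fa(Thurstone, 3)   # a three factor solution of the Thurstone correlations
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```

The LaTeX code is written to the console and can be pasted directly into a manuscript.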
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
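A minimal sketch of superMatrix with two toy blocks:

```r
library(psych)
A <- matrix(1, 2, 2)   # a 2 x 2 block of 1s
B <- matrix(2, 3, 3)   # a 3 x 3 block of 2s
superMatrix(A, B)      # a 5 x 5 matrix: A top left, B lower right, 0s elsewhere
```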
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432
Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458
Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY
Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package
Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276
Cattell R B (1978) The scientific use of factor analysis Plenum Press New York
Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)
Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition
Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64
Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98
Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334
Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
well as the Overview Vignette, to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book/.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
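As a quick sketch (assuming psych and its bundled bfi data set are installed), the three number-of-factors criteria named above can be compared on the same items:

```r
# Compare number-of-factors criteria on the 25 personality items of the
# built-in bfi data set (columns 26-28 are demographics, so are excluded).
library(psych)
vss(bfi[1:25])          # Very Simple Structure (also reports the MAP criterion)
fa.parallel(bfi[1:25])  # parallel analysis scree plot
nfactors(bfi[1:25])     # several criteria reported at once
```

Each call prints its own summary; they need not agree, which is why nfactors shows several criteria side by side.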
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
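For instance (a hedged sketch, assuming psych and its bfi data are available), several of these coefficients can be obtained for the five bfi Agreeableness items:

```r
# Reliability coefficients for the five bfi Agreeableness items.
# A1 is reverse keyed, so check.keys=TRUE lets alpha flip it automatically.
library(psych)
alpha(bfi[1:5], check.keys = TRUE)  # Cronbach's alpha
omega(bfi[1:25])                    # McDonald's omega_h and omega_t
guttman(bfi[1:5])                   # Guttman's six reliability estimates
```

This is only a sketch; the second vignette discusses scoring and reliability in detail.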
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book/.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:
install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:
install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember, to run any of the psych functions it is necessary to make the package active by using the library command:
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:
my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
#   and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

my.spss <- read.file(use.value.labels=TRUE)  # this will keep the value labels for .sav files
3.2 Data input from the clipboard
There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")      # define the tab option, or
> my.tab.data <- read.clipboard.tab()      # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns):
> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

 Descriptive statistics by group 
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
--------------------------------------------------------- 
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
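The underlying computation can be sketched in base R alone (the toy matrix below is invented; outlier itself adds the labeled Q-Q plot against χ2):

```r
# Squared Mahalanobis distances from the multivariate centroid, as
# outlier() computes them, illustrated on simulated data with base R only.
set.seed(42)
x  <- matrix(rnorm(300), ncol = 3)                       # 100 cases, 3 variables
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x)) # one distance per row
# Compare the distances to chi-squared quantiles with 3 df, as in the
# Q-Q plot that outlier() draws
qqplot(qchisq(ppoints(100), df = 3), d2)
```

Points far above the diagonal of this Q-Q plot are the candidates for outliers.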
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1-120. All values of columns 3-5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
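A minimal sketch (the major variable below is invented; assumes psych is installed):

```r
# dummy.code turns one categorical vector into a matrix of binary
# indicator columns, one per category.  The data here are made up.
library(psych)
major <- c("psych", "biology", "psych", "econ", "biology")
dummy.code(major)
# char2numeric instead recodes the character columns of a data frame to
# numeric category codes, e.g. char2numeric(my.df) for a hypothetical my.df
```

The resulting 0/1 columns can then enter correlations or regressions like any other numeric variables.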
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies ")
> dev.off()
null device
          1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
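The four functions above share a common calling pattern. As a minimal sketch (assuming the psych package is installed and using its sat.act example data), one way to compare the default "cats eyes" display with plain standard-deviation bars is:

```r
library(psych)
data(sat.act)
# 95% confidence intervals with the default "cats eyes" display
error.bars(sat.act[5:6], ylab = "Mean score", xlab = "SAT scale")
# +/- 1 standard deviation instead, without the cats eyes
error.bars(sat.act[5:6], sd = TRUE, eyes = FALSE, ylab = "Score")
```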
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of "dynamite plot" (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+    labels=c("Male","Female"),ylab="SAT score",xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 r12 = 0.4 r23 = 0.1 r13 = 0.5 )"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
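To make the distinction concrete, a short sketch (the cell frequencies here are invented for illustration) computes both the φ coefficient and the tetrachoric estimate from the same two by two table:

```r
library(psych)
# A hypothetical 2 x 2 table of cell frequencies
tab <- matrix(c(44, 49, 23, 84), 2, 2)
phi(tab)          # phi: Pearson r on the observed dichotomous values
tetrachoric(tab)  # estimate of the latent bivariate-normal correlation
```

The tetrachoric estimate will typically be larger in absolute value than φ for the same table, reflecting the attenuation that dichotomization produces.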
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
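A brief sketch of this, assuming the burt data set behaves as described (a slightly negative smallest eigenvalue):

```r
library(psych)
data(burt)
round(min(eigen(burt)$values), 3)   # smallest eigenvalue, expected to be < 0
burt.s <- cor.smooth(burt)          # adjust and rescale the eigenvalues
round(min(eigen(burt.s)$values), 3) # now non-negative
```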
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg · ηywg · rxywg + ηxbg · ηybg · rxybg    (1)
where rxy is the normal correlation, which may be decomposed into a within group and between group correlations rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
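As a sketch of how the pieces of equation 1 can be extracted (the rwg and rbg element names are those documented for the statsBy output):

```r
library(psych)
data(sat.act)
# Decompose the sat.act correlations by level of education
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within-group correlations
round(sb$rbg, 2)  # correlations between the group means
```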
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)    #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63  0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB,data=sobel,std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − (1 − λ1)(1 − λ2) ··· (1 − λn)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx^(-1) Rxy Ryy^(-1) Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation, covariance, or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.
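For intuition about where standardized weights from a correlation matrix come from, recall the matrix identity beta = Rxx^-1 rxy. The following is a base-R sketch of that arithmetic, not the setCor code itself, and the correlation values are invented for illustration:

```r
# Standardized regression weights from a correlation matrix:
# beta = solve(Rxx) %*% rxy, with R^2 = sum(beta * rxy).
# The correlations below are made up for illustration only.
R <- matrix(c(1.0, 0.3, 0.3,
              0.3, 1.0, 0.2,
              0.3, 0.2, 1.0), 3, 3,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
Rxx  <- R[1:2, 1:2]        # predictor intercorrelations
rxy  <- R[1:2, 3]          # predictor-criterion correlations
beta <- solve(Rxx, rxy)    # standardized slopes
R2   <- sum(beta * rxy)    # squared multiple correlation
```

When n.obs is also supplied, standard errors and t tests follow from R2 and the residual degrees of freedom.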
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations:
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
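The transform is z = (1/2) log((1+r)/(1-r)), which is just atanh(r). A quick base-R check of the arithmetic (a sketch, not the psych function itself):

```r
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))  # Fisher z of r = 0.5
all.equal(z, atanh(r))             # the hyperbolic arc-tangent gives the same value
tanh(z)                            # the inverse transform recovers r = 0.5
```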
geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.
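Both means have simple closed forms; the following base-R sketch shows the arithmetic (the psych versions add conveniences such as handling of missing values):

```r
x <- c(1, 2, 4)
geometric <- exp(mean(log(x)))  # (1*2*4)^(1/3) = 2
harmonic  <- 1 / mean(1 / x)    # 3/(1 + 1/2 + 1/4), about 1.714
```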
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between them.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)
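For the simplest case, the first-order partial correlation of x and y controlling for a single z has the familiar closed form r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A base-R sketch (the function name and the input correlations are invented for illustration; partial.r handles whole sets of variables via matrix inversion):

```r
# First-order partial correlation of x and y, removing z.
partial_r_xy_z <- function(rxy, rxz, ryz) {
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}
partial_r_xy_z(0.5, 0.4, 0.3)  # (0.5 - 0.12)/sqrt(0.84*0.91), about 0.43
```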
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
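The construction is just block binding; a base-R sketch of the idea (psych's superMatrix also carries along row and column names):

```r
A <- matrix(1, 2, 2)  # any two matrices
B <- matrix(2, 3, 3)
super <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
               cbind(matrix(0, nrow(B), ncol(A)), B))
dim(super)  # 5 x 5, with A and B as the diagonal blocks
```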
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
1 Overview of this and related documents
The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, in prep), a draft of which is available at https://personality-project.org/r/book.
Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction, including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.
Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package, with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette "psych for sem" at https://personality-project.org/r/psych_for_sem.pdf discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from, e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:
install.packages("ctv")

library(ctv)

task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember to run any of the psych functions it is necessary to make the package activeby using the library command
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command
my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
  # and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each
If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

my.spss <- read.file(use.value.labels=TRUE)  # this will keep the value labels for .sav files
3.2 Data input from the clipboard
There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, the last 4 are 3 columns).
> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
          item group1 group2 vars   n   mean     sd    se
gender1      1      1      0    1  27      1      0     0
gender2      2      2      0    1  30      2      0     0
gender3      3      1      1    1  20      1      0     0
gender4      4      2      1    1  25      2      0     0
...        ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9       69      1      4    6  51  635.9 104.12 14.58
SATQ10      70      2      4    6  86 597.59 106.24 11.46
SATQ11      71      1      5    6  46 657.83  89.61 13.21
SATQ12      72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
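The logic behind outlier can be sketched with the base R mahalanobis function; the data below are simulated for illustration and are not from the vignette:

```r
# Base-R sketch of the idea behind outlier(): squared Mahalanobis distances
# compared against chi-square quantiles (simulated data).
set.seed(42)
x  <- matrix(rnorm(200), ncol = 4)            # 50 cases, 4 variables
d2 <- mahalanobis(x, colMeans(x), cov(x))     # squared distances from the centroid
q  <- qchisq(ppoints(nrow(x)), df = ncol(x))  # expected quantiles under normality
# A Q-Q plot of sort(d2) against q flags the extreme cases:
# plot(q, sort(d2)); abline(0, 1)
head(order(d2, decreasing = TRUE))            # indices of the most extreme cases
```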
3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
 [1,]   1   2  NA  NA  NA   6   7   8   9  10
 [2,]  11  12  NA  NA  NA  16  17  18  19  20
 [3,]  21  22  NA  NA  NA  26  27  28  29  30
 [4,]  31  32  33  NA  NA  36  37  38  39  40
 [5,]  41  42  43  44  NA  46  47  48  49  50
 [6,]  51  52  53  54  55  56  57  58  59  60
 [7,]  61  62  63  64  65  66  67  68  69  70
 [8,]  71  72  NA  NA  NA  76  77  78  79  80
 [9,]  81  82  NA  NA  NA  86  87  88  89  90
[10,]  91  92  NA  NA  NA  96  97  98  99 100
[11,] 101 102  NA  NA  NA 106 107 108 109 110
[12,] 111 112  NA  NA  NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
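For readers who want to see the rule spelled out, the same cleaning can be expressed in a few lines of base R. This is a sketch of what the scrub call above does, not psych's actual implementation:

```r
# Base-R equivalent of the scrub() call: columns 3-5 become NA when below
# their column minimum (30, 40, 50), above 70, or exactly equal to 45.
x <- matrix(1:120, ncol = 10, byrow = TRUE)
mins <- c(30, 40, 50)
for (j in 3:5) {
  bad <- x[, j] < mins[j - 2] | x[, j] > 70 | x[, j] == 45
  x[bad, j] <- NA
}
x[1:4, 3:5]   # the first rows of the cleaned columns
```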
3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
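In base R, the same dummy coding can be produced with model.matrix; the 'major' variable below is hypothetical, used only to show the shape of the result:

```r
# Dummy coding a categorical variable with base R's model.matrix(),
# as an alternative sketch to dummy.code() (hypothetical data).
major <- factor(c("psych", "bio", "psych", "econ"))
m <- model.matrix(~ major - 1)   # one binary (0/1) column per category
colnames(m)                      # "majorbio" "majorecon" "majorpsych"
```

Each of the resulting 0/1 columns can then be correlated with other variables.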
3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairs.panels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution
3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
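The standard error of a proportion used by error.bars.tab is easy to compute directly; the numbers below are made up for illustration:

```r
# Standard error of a proportion, sigma_p = sqrt(p*q/N), in base R:
p <- 0.30                      # hypothetical observed proportion
N <- 100                       # hypothetical cell total
se.p <- sqrt(p * (1 - p) / N)
round(se.p, 3)                 # 0.046
```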
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of "dynamite plot" (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or shown by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is in the densityBy plot in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
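The display style of lowerCor can be mimicked in a few lines of base R; this is a sketch of the output format, not psych's implementation:

```r
# Round to two digits and blank the upper triangle, as lowerCor() displays:
lower.mat <- function(x) {
  r <- round(cor(x, use = "pairwise"), 2)
  r[upper.tri(r)] <- ""        # blanking coerces the matrix to character
  print(as.data.frame(r))
  invisible(r)
}
```

Calling this sketch on a numeric data frame would then print a triangle like the one shown above.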
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input
1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
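The z value reported above can be reproduced with the Fisher r-to-z transformation in base R; this is a sketch of the computation, not psych's code:

```r
# Fisher r-to-z test for two independent correlations, matching the
# r.test(30, .4, .6) example: z = (z1 - z2) / sqrt(1/(n1-3) + 1/(n2-3))
r1 <- .4; r2 <- .6
n1 <- n2 <- 30
z <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
round(abs(z), 2)               # 0.99
round(2 * pnorm(-abs(z)), 2)   # 0.32
```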
3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)
> r.test(103, .4, .5, .1)
Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
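The statistic described in the text can be sketched directly: sum the squared Fisher z equivalents of the unique correlations and scale by (n - 3). This is an illustration of the idea, with a hypothetical helper name, not a substitute for cortest:

```r
# Chi-square test that all population correlations are zero (Steiger, 1980):
# (n - 3) times the sum of squared Fisher z's, with df = p(p-1)/2.
chisq.rzero <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])              # Fisher z of unique correlations
  chi2 <- (n - 3) * sum(z^2)
  df   <- sum(lower.tri(R))                   # p(p-1)/2 unique elements
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}
```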
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric correlation is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))
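The φ coefficient itself can be computed directly from a two by two table of counts (the counts below are hypothetical):

```r
# The phi coefficient from a 2 x 2 table of counts is a Pearson r on the
# 0/1 data: phi = (ad - bc) / sqrt(row totals * column totals).
tab <- matrix(c(40, 10, 15, 35), 2, 2)   # hypothetical cell frequencies
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
round(phi, 2)   # 0.5
```

The tetrachoric estimate for the same table would be larger, since φ is attenuated by the dichotomization.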
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηx(wg) · ηy(wg) · rxy(wg) + ηx(bg) · ηy(bg) · rxy(bg)    (1)
where rxy is the normal correlation, which may be decomposed into a within group correlation, rxy(wg), and a between group correlation, rxy(bg), and η (eta) is the correlation of the data with the within group values or the group means.
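As a sketch of this decomposition, statsBy with cors=TRUE returns both the pooled within group and the between group correlation matrices (the choice of the sat.act data grouped by education follows the example below):

```r
library(psych)
# decompose the sat.act correlations within and between education levels
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within group correlations
round(sb$rbg, 2)  # weighted correlations of the group means
```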
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sentences         0.09 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.02 3.00
First.Letters     0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                 slope  VIF
Sentences         0.07 3.69
Vocabulary        0.17 3.88
Sent.Completion   0.05 3.00
First.Letters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                 slope  VIF
Sentences         0.25 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.04 3.00
First.Letters     0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                 slope  VIF
Sentences         0.21 3.69
Vocabulary        0.16 3.88
Sent.Completion   0.21 3.00
First.Letters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                 slope  VIF
Sentences         0.20 3.69
Vocabulary       -0.02 3.88
Sent.Completion   0.08 3.00
First.Letters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sent.Completion   0.02 1.02
First.Letters     0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                 slope  VIF
Sent.Completion   0.05 1.02
First.Letters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                 slope  VIF
Sent.Completion   0.04 1.02
First.Letters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                 slope  VIF
Sent.Completion   0.21 1.02
First.Letters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

Multiple Regression
DV = Letter.Group
                 slope  VIF
Sent.Completion   0.08 1.02
First.Letters     0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
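The a, b, and c paths can be illustrated with two ordinary regressions on simulated data (the variable names and effect sizes here are made up; the mediate function automates this and adds the bootstrap test):

```r
set.seed(17)
n <- 200
x <- rnorm(n)                      # predictor
m <- 0.5 * x + rnorm(n)            # mediator: path a
y <- 0.4 * m + 0.3 * x + rnorm(n)  # criterion: paths b and c'

a  <- coef(lm(m ~ x))["x"]         # path a: x -> m
bc <- coef(lm(y ~ m + x))          # path b (for m) and direct effect c' (for x)
ab <- a * bc["m"]                  # indirect effect
c  <- coef(lm(y ~ x))["x"]         # total effect: c = c' + ab
```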
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R² = 1 − ∏ᵢ₌₁ⁿ (1 − λᵢ)

where λᵢ is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.
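The formula can be checked by hand for the Thurstone example above (a sketch; Re() discards the negligible imaginary parts that eigen may return for a non-symmetric matrix):

```r
library(psych)
Rxx <- Thurstone[1:4, 1:4]  # correlations within the x set
Ryy <- Thurstone[5:9, 5:9]  # correlations within the y set
Rxy <- Thurstone[1:4, 5:9]  # cross correlations
# eigenvalues of R: the squared canonical correlations
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
1 - prod(1 - lambda)  # Cohen's set correlation R2 (about .69, as reported above)
```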
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression and then compare it with the results using setCor.
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
There are two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
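A sketch of this symmetry, reversing the roles of the two sets in the example above:

```r
library(psych)
C <- cov(sat.act, use = "pairwise")
m1 <- setCor(c(4:6), c(1:3), C, n.obs = 700)  # predict ACT/SATs from demographics
m2 <- setCor(c(1:3), c(4:6), C, n.obs = 700)  # predict demographics from ACT/SATs
# both report the same set correlation R2, although the
# individual multiple regressions of course differ
```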
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
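As a sketch of the typical workflow (the caption text is arbitrary):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3)  # factor the Thurstone correlations
fa2latex(f3, caption = "A three factor solution")  # APA style LaTeX table
df2latex(describe(sat.act))  # any data frame, e.g. the describe output
```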
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
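A few of these helpers in action (outputs rounded):

```r
library(psych)
fisherz(0.5)                   # Fisher r to z: 0.55
geometric.mean(c(1, 2, 4, 8))  # 2.83
headtail(sat.act)              # first and last four rows, with ellipsis between
```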
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)
In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:
1 An introduction (vignette) of the psych package
2 An overview (vignette) of the psych package
3 Installing R and some useful packages
4 Using R and the psych package to find ωh and ωt
5 Using R and the psych package for factor analysis and principal components analysis
6 Using the scoreItems function to find scale scores and scale statistics
7 Using mediate and setCor to do mediation moderation and regression analysis
For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book/.
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

install.packages(c("GPArotation", "mnormt"))
Alternatively, many packages useful for psychometrics can be installed at once using the "Psychometrics" task view:
install.packages("ctv")
library(ctv)
task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.
Remember, to run any of the psych functions it is necessary to make the package active by using the library command:
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:

my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
# and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)
3.2 Data input from the clipboard
There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns):

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)
> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+    skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27   1.00   0.00  0.00
gender2    2      2      0    1  30   2.00   0.00  0.00
gender3    3      1      1    1  20   1.00   0.00  0.00
gender4    4      2      1    1  25   2.00   0.00  0.00
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51 635.90 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
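The computation that outlier performs can be sketched by hand with base R (a rough equivalent for illustration only, not the psych implementation; outlier also labels the extreme points and handles missing data differently):

```r
library(psych)   # for the sat.act example data
x <- sat.act[complete.cases(sat.act), ]
# squared Mahalanobis distance of each observation from the centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
# compare to chi-square quantiles with df = number of variables
qqplot(qchisq(ppoints(nrow(x)), df = ncol(x)), d2,
       xlab = "Expected chi-square quantiles",
       ylab = "Mahalanobis D2")
```

Points falling far above the identity line in this Q-Q plot are the candidate outliers.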
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
          1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include ±1 standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
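The standard error of a proportion used for tabular error bars follows directly from the σp = √(pq/N) formula; a quick sketch (the helper name se.p is made up for illustration):

```r
# standard error of a proportion: sigma_p = sqrt(p*q/N), with q = 1 - p
se.p <- function(p, N) sqrt(p * (1 - p) / N)
se.p(.5, 100)   # uncertainty is largest at p = .5; here sqrt(.25/100) = 0.05
```

Note that the error shrinks both as p moves away from .5 and as N grows.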
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero.) Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or shown as raw counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education 1.00
age       0.61  1.00
ACT       0.16  0.15  1.00
SATV      0.02 -0.06  0.61  1.00
SATQ      0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education 1.00
age       0.52  1.00
ACT       0.16  0.08  1.00
SATV      0.07 -0.03  0.53  1.00
SATQ      0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,
+    main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
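For example, a minimal sketch (assuming the psych package has been loaded and the sat.act data set is attached with it):

```r
# Show the confidence intervals along with the correlations.
ct <- corr.test(sat.act)   # Pearson correlations, pairwise n, Holm-adjusted p
print(ct, short = FALSE)   # short = FALSE adds the confidence interval for each r
```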
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
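A hedged sketch of the smoothing step (assuming psych is loaded and the burt matrix is supplied with the installed version):

```r
# The smallest eigen value of burt is slightly negative, so the matrix
# is not positive semi-definite:
min(eigen(burt)$values)
# cor.smooth adjusts the eigen values and rescales them so that a proper
# correlation matrix results:
burt.sm <- cor.smooth(burt)
min(eigen(burt.sm)$values)   # no longer negative
```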
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$
where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
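A sketch of the first case (the output fields shown are those documented for statsBy):

```r
# Decompose the sat.act correlations by education level.
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (of the group means)
```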
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

 Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

 Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

 Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

 Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

 DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

 Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

 Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

 Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

 Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

 Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

 Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1,2,...i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

 Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

 Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

 Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohens Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
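The call that produced Table 2 is roughly of this form; the heading text here is an assumption based on the table caption:

```r
f3 <- fa(Thurstone, nfactors = 3)   # the 3 factor solution of the Thurstone data
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```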
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

MR1               1.00  0.59  0.54
MR2               0.59  1.00  0.52
MR3               0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
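Several of these helpers compute simple, well-defined quantities. As a base-R sketch of what three of them return (the function names below are illustrative stand-ins, not the psych implementations, which add input checking and handle matrices):

```r
# Fisher r-to-z: z = 0.5 * log((1 + r) / (1 - r)), i.e., atanh(r)
fisherz_sketch <- function(rho) atanh(rho)

# Geometric and harmonic means, appropriate for ratio or rate data
geometric_mean_sketch <- function(x) exp(mean(log(x)))
harmonic_mean_sketch  <- function(x) 1 / mean(1 / x)

fisherz_sketch(0.5)                # ~0.549
geometric_mean_sketch(c(1, 4, 16)) # 4
harmonic_mean_sketch(c(2, 6))      # 3
```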
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941), and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS LibraryFrameworksRframeworkVersions36ResourcesliblibRblas0dylib
LAPACK LibraryFrameworksRframeworkVersions36ResourcesliblibRlapackdylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9
draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
2 Getting started
Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from, e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:

install.packages(c("psych", "GPArotation", "mnormt"))
Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")
Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.
3 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be opened and the data will be read in (or loaded in the case of Rda files):

my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command:

# read in a file without a header row and 36 fields,
# the first of which is 4 columns wide, the rest of which are 1 column each
my.data <- read.file(widths = c(4, rep(1, 35)))
If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE:

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels = TRUE)
3.2 Data input from the clipboard
There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function:
> my.data <- read.clipboard(sep = "\t")  # define the tab option, or
> my.tab.data <- read.clipboard.tab()    # just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, and the last 4 are 3 columns):

> my.data <- read.clipboard.fwf(widths = c(5, 2, rep(1, 5), rep(3, 4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)
> library(psych)
> data(sat.act)
> describe(sat.act)  # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew = FALSE, ranges = FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+    skew = FALSE, ranges = FALSE, mat = TRUE)
> headTail(sa.mat)
        item group1 group2 vars  n   mean     sd    se
gender1    1      1      0    1 27      1      0     0
gender2    2      2      0    1 30      2      0     0
gender3    3      1      1    1 20      1      0     0
gender4    4      2      1    1 25      2      0     0
...      ...   <NA>   <NA>  ...  ...   ...    ...   ...
SATQ9     69      1      4    6 51  635.9 104.12 14.58
SATQ10    70      2      4    6 86 597.59 106.24 11.46
SATQ11    71      1      5    6 46 657.83  89.61 13.21
SATQ12    72      2      5    6 93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point, and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
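The underlying computation can be sketched in base R (this is an illustration of the idea on simulated data, not the outlier function itself, which also labels the most extreme points):

```r
# Simulated data: 200 observations on 3 variables
set.seed(42)
x <- matrix(rnorm(600), ncol = 3)

# Squared Mahalanobis distance of each row from the multivariate centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Q-Q plot of observed D2 against the expected chi-square quantiles (df = 3)
qqplot(qchisq(ppoints(nrow(x)), df = ncol(x)), d2,
       xlab = "Expected chi-square quantiles", ylab = "Mahalanobis D2")
abline(0, 1)

# The largest distances are the candidate outliers
head(sort(d2, decreasing = TRUE))
```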
3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol = 10, byrow = TRUE)
> colnames(x) <- paste("V", 1:10, sep = "")
> new.x <- scrub(x, 3:5, min = c(30, 40, 50), max = 70, isvalue = 45, newvalue = NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
333 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which are
13
gt png( outlierpng )
gt d2 lt- outlier(satactcex=8)
gt devoff()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial correlations (regular Pearson r) to show effect sizes, and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
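A minimal sketch of both conversions (the categorical vector and the small data frame are hypothetical; dummy.code and char2numeric come from the psych package):

```r
library(psych)

# a hypothetical categorical vector of college majors
major <- c("biology", "psych", "psych", "chem", "biology")

# dummy.code returns a matrix with one binary (0/1) column per category
dummy.major <- dummy.code(major)

# char2numeric converts the character columns of a data frame to numeric codes
df <- data.frame(gender = c("M", "F", "F", "M"),
                 educ   = c("HS", "college", "HS", "grad"))
df.num <- char2numeric(df)
```

The dummy coded columns may then be entered directly into correlation or regression analyses.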
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairs.panels.png')
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
gt devoff()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA  = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+   TA  = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+   PA  = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+   main="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and the 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+   main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
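A minimal sketch of the first two calls (a sketch, assuming the psych package is loaded; sd=TRUE switches from standard-error-based confidence intervals to standard deviation bars):

```r
library(psych)

# 95% confidence intervals with cats eyes for the first five bfi items
error.bars(bfi[1:5], main="95% confidence intervals")

# the same variables, with +/- 1 sd bars instead of confidence intervals
error.bars(bfi[1:5], sd=TRUE, eyes=FALSE, main="+/- 1 sd")

# group means of SAT V and Q, split by gender
error.bars.by(sat.act[5:6], sat.act$gender)
```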
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, more conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie < 4)
[Figure: cats-eye error bars for bfagree, bfcon, bfext, bfneur, and bfopen, grouped by the lie-scale split; y axis: Dependent Variable; x axis: Independent Variable; 95% confidence limits]
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male","Female"), ylab="SAT score", xlab="")
[Figure: bar graph of mean SAT scores (y axis 200-800) for Males and Females, with 95% confidence limits]
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+   main="Proportion of sample by education level")
[Figure: "Proportion of sample by education level"; y axis: Proportion of Education Level (0.00-0.30); x axis: Level of Education, with bars by gender and education level (M 0 through M 5, and the corresponding F bars)]
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or plotted as total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+   cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+   ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
[Figure: two panels. Left: "Movies effect on arousal" (x: Energetic Arousal 8-20, y: Tense Arousal 10-22). Right: "Movies effect on affect" (x: Positive Affect, y: Negative Affect). Points labeled Sad, Horror, Neutral, Happy.]
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+   main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation, as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)  # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or of the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
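A minimal sketch of these two estimates (a sketch, assuming psych is loaded; the continuous x is simulated here, and y is an artificial dichotomization/polytomization of a correlated latent variable):

```r
library(psych)
set.seed(42)

# simulate a continuous x and a correlated latent variable z
x <- rnorm(1000)
z <- .6 * x + rnorm(1000, sd = .8)

y2 <- ifelse(z > 0, 1, 0)       # dichotomized -> use biserial
y4 <- cut(z, 4, labels = FALSE) # polytomized  -> use polyserial

biserial(x, y2)    # estimates the latent Pearson r between x and the variable underlying y2
polyserial(x, y4)  # the analogous estimate for the polytomous y4
```

Both estimates should be noticeably larger than the point biserial cor(x, y2), reflecting the information lost by cutting the latent variable.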
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
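The smoothing step can be sketched as follows (a sketch, assuming psych is loaded; the burt correlation matrix ships with the package):

```r
library(psych)

# burt is not positive semi-definite: its smallest eigenvalue is negative
eigen(burt)$values

# cor.smooth adjusts the offending eigenvalues and rescales,
# producing a positive (semi-)definite matrix
burt.smooth <- cor.smooth(burt)
min(eigen(burt.smooth)$values)  # now non-negative
```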
> draw.tetra()
[Figure: a bivariate normal scatter (rho = 0.5, phi = 0.28) cut at τ on X and Τ on Y, dividing the plane into the four cells X < τ / X > τ by Y < Τ / Y > Τ, with the corresponding univariate normal densities and cut points shown in the side panels]
Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)
where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
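A minimal sketch of that first case (a sketch, assuming psych is loaded; cors=TRUE asks for the pooled within group and between group correlation matrices):

```r
library(psych)

# descriptive statistics of the ability measures, grouped by education level
sb <- statsBy(sat.act, group = "education", cors = TRUE)

sb$rwg  # pooled within group correlations
sb$rbg  # between group correlations (of the group means)
```

The within and between matrices can differ markedly, which is exactly the decomposition shown in equation (1).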
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02
Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function, and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation model path diagram, with paths THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, total effect c = 0.76, and direct effect c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure: regression model path diagram relating THERAPY and ATTRIB to SATIS, with path coefficients 0.43 and 0.4 and a correlation of 0.21 between the predictors]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
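The mean centering that precedes forming the product term can also be sketched by hand. This is an illustrative sketch (not the mediate source code), assuming the Garcia data set supplied with psych:

```r
library(psych)
with(Garcia, {
  prot2.c  <- prot2  - mean(prot2)    # mean center each predictor
  sexism.c <- sexism - mean(sexism)
  # the * operator adds both main effects and the product (moderation) term
  summary(lm(respappr ~ prot2.c * sexism.c))
})
```

Mean centering leaves the product term's test unchanged but makes the main effects interpretable at the mean of the other predictor.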
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'.$$
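As a check on the formula, Cohen's set correlation can be sketched in a few lines of base R; the split of the built-in attitude data into an x set and a y set here is an arbitrary illustration:

```r
R <- cor(attitude)                 # correlation matrix of 7 variables
x <- 1:3; y <- 4:7                 # an arbitrary partition into two sets
Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]
# eigenvalues of this product are the squared canonical correlations (real valued)
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
R2.set <- 1 - prod(1 - lambda)     # Cohen's set correlation R2
```

The same quantities appear in the setCor output as the squared canonical correlations and Cohen's Set Correlation R2.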
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation path diagram titled "Moderated mediation (mean centered)": prot2 (c = 1.46, c' = 1.46) and prot2*sexism (c = 0.81, c' = 0.81) predict respappr, with paths through sexism of 0.07 and 0.09 and remaining coefficients 0.02 and -0.01]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation   7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
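That symmetry can be verified directly. A sketch, assuming the covariance matrix C of the sat.act variables formed earlier with n.obs = 700:

```r
sc.xy <- setCor(c(4:6), c(1:3), C, n.obs = 700)  # SATs and ACT from demographics
sc.yx <- setCor(c(1:3), c(4:6), C, n.obs = 700)  # demographics from SATs and ACT
# both outputs report the same set correlation R2
```

The individual multiple regressions differ, of course, but the between-set statistics do not.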
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
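A sketch of the sort of call that can produce such a table, assuming the Thurstone correlation matrix supplied with psych:

```r
library(psych)
f3 <- fa(Thurstone, 3)   # three factor solution of the 9 ability variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```

The resulting LaTeX source can be pasted directly into a manuscript.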
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
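Several of these helpers can be tried directly from the console; a brief sketch:

```r
library(psych)
fisherz(0.5)                     # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))    # nth root of the product of n values
harmonic.mean(c(2, 4, 4))        # reciprocal of the mean of the reciprocals
headtail(sat.act)                # first and last lines of the data set
```

Each returns a small numeric result or data frame suitable for quick inspection.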
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
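All of these data sets are lazy loaded with the package and may be examined directly; for example:

```r
library(psych)
describe(sat.act)   # descriptive statistics for the SAT and ACT data
dim(bfi)            # the 25 bfi items plus demographic variables
```

Each data set's help page (e.g., ?bfi) documents its variables and sources.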
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
3.1 Getting the data by using read.file
Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.
my.data <- read.file()
If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
# and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each
If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending upon what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

> #this will keep the value labels for .sav files
> my.spss <- read.file(use.value.labels=TRUE)
3.2 Data input from the clipboard
There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions is perhaps more user friendly:
read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value, and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited, or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.
describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., broken down for males and females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> #basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27   1.00   0.00  0.00
gender2    2      2      0    1  30   2.00   0.00  0.00
gender3    3      1      1    1  20   1.00   0.00  0.00
gender4    4      2      1    1  25   2.00   0.00  0.00
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51 635.90 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or that only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows and 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up while the maximums have gone down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are
> png("outlier.png")
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
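As a brief sketch (the vector of majors below is made up for illustration), dummy.code returns a matrix of 0/1 indicator columns, one per category, and char2numeric converts the character columns of a data.frame to numeric codes:

```r
library(psych)
# hypothetical categorical data
major <- c("psych", "biology", "psych", "chemistry", "biology")
dummy.code(major)    # one 0/1 column per category

df <- data.frame(major = major, score = c(3, 5, 4, 2, 5))
char2numeric(df)     # character columns become numeric codes
```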
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch="."), it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+      main="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+      main="Affect varies by movies")
> dev.off()
null device
          1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σ_p = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, more conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+      labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+      main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+      main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option
depending upon the input):
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18  with probability < 0.034
 and confidence interval 0.02  0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99  with probability  0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89  with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
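The tetrachoric function itself will accept either raw dichotomous data or a two by two table of cell frequencies; a minimal sketch with made-up frequencies:

```r
library(psych)
# hypothetical 2 x 2 table of frequencies for two dichotomized items
tab <- matrix(c(40, 10, 12, 38), 2, 2)
tetrachoric(tab)   # estimates the latent correlation and the thresholds (tau)
```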
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
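In recent versions of psych the function is called mixedCor, and the variable types may be given explicitly; the column assignments below are illustrative assumptions for the bfi data set:

```r
library(psych)
data(bfi)
# treat the 25 personality items as polytomous, gender as dichotomous,
# and age as continuous (education is dropped here for simplicity)
r.mix <- mixedCor(bfi, p = 1:25, d = 26, c = 28)
r.mix$rho   # the resulting mixed correlation matrix
```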
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
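A minimal sketch of this, using the burt correlation matrix supplied with psych:

```r
library(psych)
data(burt)
eigen(burt)$values           # note the negative smallest eigenvalue
burt.s <- cor.smooth(burt)   # adjust the eigenvalues and rescale
eigen(burt.s)$values         # all positive after smoothing
```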
> draw.tetra()

[Figure: bivariate normal density cut at τ on X and Τ on Y, dividing the plane into four quadrants; ρ = 0.5, φ = 0.28]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$
where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or with the group means.
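Equation (1) is an exact algebraic identity, which can be verified in base R on simulated grouped data (all names below are made up for illustration; the group means and deviations play the roles of the between and within components):

```r
# Base-R check of the within/between decomposition in equation (1)
set.seed(17)
g <- rep(1:10, each = 20)        # 10 groups of 20 observations
x <- rnorm(200) + g / 5          # add some group-level signal
y <- rnorm(200) + g / 5

xb <- ave(x, g); yb <- ave(y, g) # between part: group means
xw <- x - xb;    yw <- y - yb    # within part: deviations from group means

eta.xw <- cor(x, xw); eta.yw <- cor(y, yw)   # eta = sd(part)/sd(total)
eta.xb <- cor(x, xb); eta.yb <- cor(y, yb)

lhs <- cor(x, y)
rhs <- eta.xw * eta.yw * cor(xw, yw) + eta.xb * eta.yb * cor(xb, yb)
all.equal(lhs, rhs)   # TRUE: the decomposition is exact
```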
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

 DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

 DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

 DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45 0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

 DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
             R   R2  Ruw R2uw
Suffixes  0.46 0.21 0.35 0.12

 DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
LetterSeries  0.21 0.04 0.17 0.03

 DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c', respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c'. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
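The a, b, and c paths can be sketched with plain lm() calls on simulated data (the variable names and effect sizes below are made up for illustration; psych's mediate does this, plus the bootstrap, for you):

```r
# Base-R sketch of the a, b, c mediation paths
set.seed(42)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)             # path a: x -> m
y <- 0.4 * m + 0.3 * x + rnorm(100)   # path b: m -> y; c': direct x -> y

a   <- coef(lm(m ~ x))["x"]
b   <- coef(lm(y ~ x + m))["m"]
cp  <- coef(lm(y ~ x + m))["x"]       # direct effect c'
tot <- coef(lm(y ~ x))["x"]           # total effect c
ab  <- a * b                          # indirect effect

# For OLS the total effect decomposes exactly: c = c' + ab
all.equal(as.numeric(tot), as.numeric(cp + ab))   # TRUE
```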
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model diagram. THERAPY → ATTRIB path = 0.82, ATTRIB → SATIS path = 0.4, c = 0.76, c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model diagram for THERAPY, ATTRIB, and SATIS with path coefficients 0.43, 0.4, and 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
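The set correlation formula can be sketched in a few lines of base R (the function name is made up; psych's setCor does this, and much more, for you). A handy sanity check: with a single y variable, the set correlation reduces to the ordinary squared multiple correlation of y on the x set.

```r
# Base-R sketch of Cohen's set correlation:
# R^2 = 1 - prod(1 - lambda_i), lambda_i the eigenvalues of
# Rxx^{-1} Rxy Ryy^{-1} Ryx
set.cor.R2 <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]; Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]; Ryx <- R[y, x, drop = FALSE]
  lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% Ryx,
                     only.values = TRUE)$values)
  1 - prod(1 - lambda)
}

R <- matrix(c( 1, .5, .3,
              .5,  1, .4,
              .3, .4,  1), 3, 3)
# With one y variable this is just the squared multiple correlation:
set.cor.R2(R, x = 1:2, y = 3)   # 0.1733333
```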
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" path diagram for prot2, prot2*sexism, sexism, and respappr, with c = 1.46, c' = 1.46 and c = 0.81, c' = 0.81]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.5

MR1              1.00  0.59  0.54
MR2              0.59  1.00  0.52
MR3              0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean (also harmonic.mean) find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
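As an example of one of these helpers, the Fisher z transform computed by fisherz is mathematically just atanh. A one-line base-R re-implementation (the name fisher.z below is illustrative, not the psych function itself) makes this concrete:

```r
# The Fisher r-to-z transform: z = 0.5 * log((1 + r) / (1 - r)),
# equivalent to atanh(r). (Illustrative sketch, not psych's fisherz.)
fisher.z <- function(r) 0.5 * log((1 + r) / (1 - r))

fisher.z(0.5)   # 0.5493061
atanh(0.5)      # the same value
```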
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), along with 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generations of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0    foreign_0.8-70 nlme_3.1-136   mnormt_1.5-5
[7] grid_3.6.0     lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.
> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns wide, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each):

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
3.3 Basic descriptive statistics
Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).
> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males and females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.
> #basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.
> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
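The computation just described can be sketched in a few lines of base R (an illustrative approximation only, not the psych implementation; outlier additionally labels the extreme points and handles missing data more gracefully):

```r
# Sketch of the Mahalanobis distance logic behind outlier(),
# assuming psych and its sat.act data set are available.
library(psych)
data(sat.act)
x <- na.omit(sat.act)                       # complete cases only
d2 <- mahalanobis(x, colMeans(x), cov(x))   # squared distance of each row from the centroid
# compare observed distances to the expected chi-square quantiles
qqplot(qchisq(ppoints(nrow(x)), df = ncol(x)), d2,
       xlab = "Expected chi-square quantile", ylab = "Mahalanobis D2")
```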
3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up while the maximums have gone down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
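For example, dummy coding a categorical variable can be done in one call (the major variable here is made up for illustration):

```r
library(psych)
major <- c("psych", "bio", "econ", "psych", "bio")   # a hypothetical categorical variable
dummy.code(major)   # returns a matrix with one binary (0/1) column per category
```

The resulting binary columns may then be correlated with other, continuous, variables.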
3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis lengths reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+      main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+      main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars (Figure 6). This kind of dynamite plot can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
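To see what these error bars represent, the standard error for a cell containing, say, 30% of 700 observations may be computed by hand from the formula given above (a simple arithmetic illustration, not part of the vignette's analyses):

```r
p <- .30            # the observed proportion in the cell
N <- 700            # the total number of observations
sqrt(p * (1 - p) / N)   # standard error of a proportion, about 0.017
```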
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of confidence within the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+      labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+      main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is shown with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one matrix below the diagonal and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
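The effect of the Holm correction may be seen directly by applying p.adjust to a small set of p values (the values here are made up for illustration):

```r
p <- c(.001, .01, .02, .04)    # hypothetical raw probabilities, sorted
p.adjust(p, method = "holm")   # smallest p is multiplied by 4, the next by 3, ...
# [1] 0.004 0.030 0.040 0.040
```

Note that the adjusted values are forced to be monotonically non-decreasing, which is why the last two values are equal.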
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)
gt devoff()
null device
1
Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender     1.00      0.09 -0.02 -0.04 -0.02 -0.17
education  0.09      1.00  0.55  0.15  0.05  0.03
age       -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT       -0.04      0.15  0.11  1.00  0.56  0.59
SATV      -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ      -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender    0.00      0.17 1.00 1.00    1    0
education 0.02      0.00 0.00 0.00    1    1
age       0.58      0.00 0.00 0.03    1    1
ACT       0.33      0.00 0.00 0.00    0    0
SATV      0.62      0.22 0.26 0.00    0    0
SATQ      0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
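As a sketch of the tetrachoric approach (the cell frequencies here are hypothetical), tetrachoric will accept a two by two table of frequencies and report both the estimated correlation and the thresholds, while phi gives the Pearson correlation of the observed dichotomies:

```r
library(psych)
# hypothetical 2 x 2 table of cell frequencies
tab <- matrix(c(40, 10, 15, 35), 2, 2)
tetrachoric(tab)  # rho and the two thresholds (tau)
phi(tab)          # the phi coefficient on the same table (typically smaller)
```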
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
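A sketch of this problem and its solution, using the burt correlation matrix supplied with psych:

```r
library(psych)
# the smallest eigenvalue of burt is slightly negative,
# so the matrix is not positive semi-definite
min(eigen(burt)$values)
# adjust the offending eigenvalues and rescale to produce
# a "smoothed" correlation matrix
burt.smoothed <- cor.smooth(burt)
min(eigen(burt.smoothed)$values)  # now non-negative
```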
> draw.tetra()
[Figure: a bivariate normal distribution (rho = 0.5, phi = 0.28) cut at τ on X and Τ on Y, showing the four quadrants (X < τ, Y < Τ), (X > τ, Y < Τ), (X < τ, Y > Τ), and (X > τ, Y > Τ), with the marginal normal densities of X > τ and Y > Τ.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_xwg ∗ η_ywg ∗ r_xywg + η_xbg ∗ η_ybg ∗ r_xybg    (1)

where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
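A minimal sketch of the first of these analyses, asking statsBy for the within and between education-level correlations of the sat.act data:

```r
library(psych)
# cors=TRUE requests the within- and between-group correlation matrices
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within-group correlations
round(sb$rbg, 2)  # correlations of the (weighted) group means
```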
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45 0.2
Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation model diagram. THERAPY → ATTRIB path = 0.82, ATTRIB → SATIS path = 0.4, total effect c = 0.76, direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure: regression model diagram. THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, with a correlation of 0.21 between THERAPY and ATTRIB.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, niter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏(i=1 to n) (1 − λi)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx^-1 Rxy Ryy^-1 Ryx.
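This formula may be verified with a few lines of base R applied to the Thurstone example above (variables 1:4 as the x set, 5:9 as the y set). This is a sketch of the computation, not the setCor implementation itself:

```r
R   <- psych::Thurstone          # 9 x 9 correlation matrix
Rxx <- R[1:4, 1:4]; Ryy <- R[5:9, 5:9]; Rxy <- R[1:4, 5:9]
M   <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)    # the squared canonical correlations
1 - prod(1 - lambda)             # Cohen's set correlation R2 (about .69)
```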
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation (mean centered) path diagram. prot2 → respappr: c = 1.46, c' = 1.46; prot2*sexism → respappr: c = 0.81, c' = 0.81; the paths through sexism are near zero.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630
Coefficients
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input
DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05
Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
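A sketch of how a table like Table 2 might be produced (the heading and caption strings shown here are illustrative):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3)  # the 3 factor solution
fa2latex(f3,
         heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex")     # writes the LaTeX table to the console
```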
Table 2: fa2latex. A factor analysis table from the psych package in R.
Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
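A few of these helpers in action (a brief sketch):

```r
library(psych)
fisherz(0.5)                   # Fisher r to z: 0.5*log((1+r)/(1-r)), about 0.55
geometric.mean(c(1, 10, 100))  # exp(mean(log(x))) = 10
superMatrix(diag(2), diag(3))  # a 5 x 5 block diagonal "super matrix"
```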
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
51
11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136   mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> library(psych)
> data(sat.act)
> describe(sat.act)  #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+     skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
         item group1 group2 vars   n   mean     sd    se
gender1     1      1      0    1  27      1      0     0
gender2     2      2      0    1  30      2      0     0
gender3     3      1      1    1  20      1      0     0
gender4     4      2      1    1  25      2      0     0
...      <NA>   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9      69      1      4    6  51  635.9 104.12 14.58
SATQ10     70      2      4    6  86 597.59 106.24 11.46
SATQ11     71      1      5    6  46 657.83  89.61 13.21
SATQ12     72      2      5    6  93 606.72 105.55 10.95
3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
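The computation behind outlier can be sketched in a few lines of base R. This is an illustrative approximation, not the outlier function itself (which, among other things, handles missing data; here we simply drop incomplete cases):

```r
library(psych)
x <- na.omit(sat.act)                        # complete cases only, for this sketch
d2 <- mahalanobis(x, colMeans(x), cov(x))    # squared Mahalanobis distances
# compare the observed D2 to the expected chi-square quantiles
# (degrees of freedom = number of variables)
qqplot(qchisq(ppoints(nrow(x)), df = ncol(x)), d2,
       xlab = "Expected chi-square quantiles", ylab = "Mahalanobis D2")
```

Points far above the diagonal of this Q-Q plot are the multivariate outliers that outlier labels.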
3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)
> x <- matrix(1:120, ncol = 10, byrow = TRUE)
> colnames(x) <- paste("V", 1:10, sep = "")
> new.x <- scrub(x, 3:5, min = c(30,40,50), max = 70, isvalue = 45, newvalue = NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
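As a minimal sketch of dummy.code (the category labels below are made up for illustration), a vector of, say, college majors becomes one 0/1 column per category:

```r
library(psych)
# hypothetical categorical data: college majors for five students
major <- c("psych", "biology", "psych", "econ", "biology")
major.dummy <- dummy.code(major)   # one binary (0/1) column per category
major.dummy                        # a matrix with columns psych, biology, econ
```

Each dummy column can then be correlated with, or regressed on, other numeric variables.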
3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS), using the pairs.panels function, are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)  #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
          1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
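The standard error of a proportion used by error.bars.tab is easy to check directly. A quick sketch in R, with made-up values of p and N:

```r
p <- 0.30   # hypothetical observed proportion
N <- 200    # hypothetical sample size
se.p <- sqrt(p * (1 - p) / N)   # sigma_p = sqrt(p*q/N), with q = 1 - p
round(se.p, 3)                  # 0.032
```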
3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or plotted as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
347 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix; lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix:
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
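For example (a sketch, not run in the text; short=FALSE is documented in the corr.test help page):

```r
library(psych)
# Request the longer corr.test output, which includes confidence
# intervals for each correlation
ct <- corr.test(sat.act)
print(ct, short=FALSE)
```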
depending upon the input).
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
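A minimal sketch of such a call (the argument names follow the mixed.cor help page; the assumed column layout of the bfi data set, with items 1-25 polytomous, gender dichotomous, and age continuous, is an assumption, not something shown in the text):

```r
library(psych)
# x = continuous, p = polytomous, d = dichotomous (per the mixed.cor help page)
rmix <- mixed.cor(x = bfi[, "age", drop = FALSE],     # continuous
                  p = bfi[, 1:25],                    # polytomous items
                  d = bfi[, "gender", drop = FALSE])  # dichotomous
rmix$rho  # the resulting mixed correlation matrix
```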
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
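A hedged sketch of that repair (the burt data set ships with psych; the exact eigenvalues are not shown in the text):

```r
library(psych)
data(burt)
min(eigen(burt)$values)       # the smallest eigenvalue is (slightly) negative
burt.s <- cor.smooth(burt)    # adjust eigenvalues, rescale to sum to n variables
min(eigen(burt.s)$values)     # now non-negative
```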
> draw.tetra()
[Figure 14 graphic: a bivariate normal distribution with rho = .5 cut into four quadrants (X above or below τ crossed with Y above or below Τ), yielding φ = .28, together with the univariate normal densities showing the cut points τ and Τ.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg     (1)

where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
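A sketch of that bootstrap (argument names follow the statsBy.boot help pages and should be treated as assumptions; ntrials is kept small here for speed):

```r
library(psych)
# Randomize the grouping variable 25 times and summarize the results for ACT
bb <- statsBy.boot(sat.act, group = "education", ntrials = 25)
statsBy.boot.summary(bb, var = "ACT")
```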
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
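The first of those analyses can be sketched as follows (the element names rwg and rbg are taken from the statsBy help page):

```r
library(psych)
# Decompose the sat.act correlations by education level
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within-group correlations
round(sb$rbg, 2)  # weighted between-group correlations
```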
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure 15 graphic: mediation model path diagram, with THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, total effect c = 0.76, and direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
[Figure 16 graphic: regression model path diagram, with THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, and a THERAPY-ATTRIB correlation of 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, niter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏_{i=1}^{n} (1 − λ_i)

where λ_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_xx^{-1} R_xy R_yy^{-1} R_yx.
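The formula can be checked directly on the sat.act data (a sketch, not from the text; x is taken as the first three variables and y as the last three, matching the setCor example later in this section):

```r
library(psych)
R   <- cor(sat.act, use = "pairwise")   # 6 x 6 correlation matrix
Rxx <- R[1:3, 1:3]                      # gender, education, age
Ryy <- R[4:6, 4:6]                      # ACT, SATV, SATQ
Rxy <- R[1:3, 4:6]
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
1 - prod(1 - lambda)   # Cohen's set correlation R2, approximately 0.09 here
```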
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure 17 graphic: path diagram for the "Moderated mediation (mean centered)" model, with prot2 and prot2*sexism predicting respappr both through sexism (paths 0.07 and 0.09, sexism → respappr = 0.02) and directly (c = c' = 1.46 and c = c' = 0.81).]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.
Variable MR1 MR2 MR3 h2 u2 com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
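Table 2 could be generated along these lines (a sketch; the heading argument is documented on the fa2latex help page):

```r
library(psych)
# Three-factor solution of the Thurstone correlations, converted to LaTeX
f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```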
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
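Two of these helpers can be sketched in a couple of lines (values not from the text; the atanh identity for fisherz is easy to verify by hand):

```r
library(psych)
fisherz(0.5)                   # Fisher r-to-z transform: atanh(0.5), about 0.55
superMatrix(diag(2), diag(3))  # 5 x 5 block matrix: A top left, B lower right, 0s elsewhere
```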
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
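All of these data sets are loaded with the package and may be examined directly. A minimal sketch using the bfi data set:

```r
library(psych)
data(bfi)            # 2800 cases: 25 personality items plus gender, education, and age
dim(bfi)             # 2800 rows, 28 columns
describe(bfi[1:5])   # descriptive statistics for the first five items
```
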
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
3.3.1 Outlier detection using outlier
One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
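The computation that outlier carries out can be sketched in base R (illustrative only; outlier itself also handles the labeling of extreme points):

```r
library(psych)
data(sat.act)
# Squared Mahalanobis distances, compared to chi-square quantiles
X  <- na.omit(sat.act[, c("age", "ACT", "SATV", "SATQ")])
d2 <- mahalanobis(X, colMeans(X), cov(X))
qqplot(qchisq(ppoints(nrow(X)), df = ncol(X)), d2,
       xlab = "Expected chi-square quantiles", ylab = "Mahalanobis D2")
```
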
3.3.2 Basic data cleaning using scrub
If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.
Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50, respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).
> x <- matrix(1:120, ncol = 10, byrow = TRUE)
> colnames(x) <- paste("V", 1:10, sep = "")
> new.x <- scrub(x, 3:5, min = c(30, 40, 50), max = 70, isvalue = 45, newvalue = NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120
Note that the number of subjects for those columns has decreased, and the minimums have gone up, but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
3.3.3 Recoding categorical variables into dummy coded variables
Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are
> png("outlier.png")
> d2 <- outlier(sat.act, cex = .8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial (regular Pearson r) correlation to show effect sizes, and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
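A minimal sketch of dummy coding (the major vector here is a hypothetical categorical variable made up for illustration):

```r
library(psych)
major   <- c("Bio", "Psych", "Econ", "Psych", "Bio")  # hypothetical college majors
dummies <- dummy.code(major)   # one 0/1 column per category
dummies
```

Each row has exactly one 1, in the column for that case's category; the resulting columns can then be correlated with other variables.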
3.4 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg = c("yellow", "blue")[(d2 > 25) + 1], pch = 21, stars = TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother = TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4), or the more conventional density plot for multiple groups (Figure 10).
> png("affect.png")
> pairs.panels(affect[14:17], bg = c("red", "black", "white", "blue")[affect$Film], pch = 21,
+   main = "Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name = c("M", "F"), main = "Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
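The standard error of a proportion used for the tabular error bars is simple to compute directly (the p and N values here are made up for illustration):

```r
p    <- 0.3                      # observed proportion (hypothetical)
N    <- 200                      # total count (hypothetical)
se.p <- sqrt(p * (1 - p) / N)    # standard error of the proportion
se.p                             # about 0.032
```
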
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although this is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars = TRUE,
+   labels = c("Male", "Female"), ylab = "SAT score", xlab = "")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way = "both", ylab = "Proportion of Education Level", xlab = "Level of Education",
+   main = "Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
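The idea behind such a display can be sketched in a few lines of base R: round the correlations, then blank out the upper triangle before printing (a hypothetical re-implementation for illustration, not psych's own code):

```r
# Sketch of a lower-triangle correlation display in base R
# (illustrative only; lowerMat/lowerCor in psych do more than this)
lower_mat <- function(R, digits = 2) {
  R <- round(R, digits)
  R[upper.tri(R)] <- NA          # blank the upper triangle
  print(R, na.print = "")        # NAs print as empty space
  invisible(R)                   # return the matrix invisibly, as lowerCor does
}

set.seed(42)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
lower_mat(cor(X, use = "pairwise"))
```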
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age  ACT  SATV SATQ
education        NA  0.52 0.16  0.07 0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53 0.58
SATV           0.02 -0.06 0.61    NA 0.63
SATQ           0.08  0.04 0.60  0.68   NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
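The Holm correction applied above the diagonal is the same step-down procedure provided by p.adjust in base R, so the idea can be illustrated without psych (the raw p values below are made up for illustration):

```r
# Holm's step-down correction on a vector of raw p values.
# Each sorted p is multiplied by its step-down factor (n, n-1, ...),
# then the result is made monotone non-decreasing.
p.raw  <- c(0.001, 0.01, 0.02, 0.04, 0.30)   # illustrative raw p values
p.holm <- p.adjust(p.raw, method = "holm")
round(p.holm, 3)
```

With five tests, the smallest p is multiplied by 5, the next by 4, and so on, which is why adjusted values above the diagonal in Table 1 can be much larger than the raw values below it.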
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender     1.00      0.09 -0.02 -0.04 -0.02 -0.17
education  0.09      1.00  0.55  0.15  0.05  0.03
age       -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT       -0.04      0.15  0.11  1.00  0.56  0.59
SATV      -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ      -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender     0.00      0.17 1.00 1.00    1    0
education  0.02      0.00 0.00 0.00    1    1
age        0.58      0.00 0.00 0.03    1    1
ACT        0.33      0.00 0.00 0.00    0    0
SATV       0.62      0.22 0.26 0.00    0    0
SATQ       0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)

Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18  with probability < 0.034
 and confidence interval 0.02  0.53
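The numbers in this output follow from two standard formulas: t = r sqrt(n-2)/sqrt(1-r^2), and a confidence interval from the Fisher z transform. A base R sketch (not r.test itself) reproduces them for n = 50, r = .3:

```r
# t test and Fisher-z confidence interval for a single correlation
n <- 50
r <- 0.3
t.val <- r * sqrt(n - 2) / sqrt(1 - r^2)        # about 2.18
p.val <- 2 * pt(-abs(t.val), df = n - 2)        # about 0.034
# 95% CI: transform to z, add +/- 1.96 standard errors, transform back
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))
round(c(t = t.val, p = p.val, lower = ci[1], upper = ci[2]), 3)
```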
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)

Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99  with probability  0.32
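That z is just the difference of the Fisher-z transformed correlations divided by its standard error, sqrt(1/(n1-3) + 1/(n2-3)). A base R sketch (not r.test itself):

```r
# z test for the difference of two independent correlations
n1 <- 30; n2 <- 30
r1 <- 0.4; r2 <- 0.6
z <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
p <- 2 * pnorm(-abs(z))
round(c(z = z, p = p), 2)      # z about -0.99, p about 0.32
```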
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)

Correlation tests 
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations 
 t value -0.89  with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
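The statistic is, up to the choice of r versus Fisher z, (n-3) times the sum of the squared off-diagonal elements, with p(p-1)/2 degrees of freedom. A base R sketch of that idea (an illustrative approximation of this test, not cortest's exact code):

```r
# Steiger (1980) style test that all correlations in R are zero:
# (n - 3) * sum of squared Fisher-z off-diagonal elements,
# referred to a chi square with p(p-1)/2 degrees of freedom.
cortest_sketch <- function(R, n) {
  z    <- atanh(R[upper.tri(R)])   # Fisher z of the unique correlations
  chi2 <- (n - 3) * sum(z^2)
  df   <- length(z)
  c(chisq = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

cortest_sketch(diag(4), n = 100)   # an identity matrix gives chi square 0
```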
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
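For comparison, the φ coefficient itself needs no distributional assumption: from a 2 x 2 table with cells a, b, c, d it is (ad - bc)/sqrt((a+b)(c+d)(a+c)(b+d)), and it is identical to a Pearson r computed on the 0/1 data. A base R sketch (the table is made up for illustration):

```r
# phi coefficient from a 2 x 2 table of counts
phi_from_table <- function(tab) {
  a <- tab[1, 1]; b <- tab[1, 2]
  c <- tab[2, 1]; d <- tab[2, 2]
  (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
}

tab <- matrix(c(40, 10, 10, 40), 2)   # hypothetical cell counts
phi_from_table(tab)                   # 0.6 for this table
```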
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
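The smoothing step can be sketched in base R: floor the negative eigenvalues at a small positive value, rescale so the eigenvalues again sum to the number of variables, rebuild the matrix, and restore a unit diagonal (a simplified version of this idea, not cor.smooth's exact code):

```r
# Simplified eigenvalue smoothing of an improper "correlation" matrix
smooth_cor <- function(R, eps = 1e-6) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)                # floor the negative eigenvalues
  vals <- vals * length(vals) / sum(vals)    # rescale to sum to p
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  cov2cor(S)                                 # restore exact 1s on the diagonal
}

bad <- matrix(c(1.00, 0.95, 0.30,
                0.95, 1.00, 0.90,
                0.30, 0.90, 1.00), 3)        # not positive semi-definite
min(eigen(bad)$values)                       # negative eigenvalue
min(eigen(smooth_cor(bad))$values)           # positive after smoothing
```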
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_x.wg * η_y.wg * r_xy.wg + η_x.bg * η_y.bg * r_xy.bg        (1)
where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy.wg and r_xy.bg, and η (eta) is the correlation of the data with the within group values, or with the group means.
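The decomposition in Equation 1 can be verified numerically in base R by splitting each variable into its group means and within-group deviations (a sketch with simulated data; statsBy does this bookkeeping for you):

```r
# Verify r_xy = eta_x.wg*eta_y.wg*r_xy.wg + eta_x.bg*eta_y.bg*r_xy.bg
set.seed(17)
g <- rep(1:5, each = 20)                  # 5 simulated groups of 20 cases
x <- rnorm(100) + g                       # group effects induce a
y <- rnorm(100) + g                       #   between-group correlation
x.bg <- ave(x, g); y.bg <- ave(y, g)      # group means, one per case
x.wg <- x - x.bg;  y.wg <- y - y.bg       # within-group deviations
eta  <- function(part, whole) sd(part) / sd(whole)
r.decomposed <- eta(x.wg, x) * eta(y.wg, y) * cor(x.wg, y.wg) +
                eta(x.bg, x) * eta(y.bg, y) * cor(x.bg, y.bg)
all.equal(cor(x, y), r.decomposed)        # the two pieces add up exactly
```

The identity is exact because the within-group deviations are orthogonal to the group means.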
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
 DV = FourLetterWords 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

 DV = Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV = LetterSeries 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

 DV = Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV = LetterGroup 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2
39
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
 DV = FourLetterWords 
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

 DV = Suffixes 
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV = LetterSeries 
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.17 0.03

 DV = Pedigrees 
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV = LetterGroup 
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
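The a, b, c and c' paths are just coefficients from three regressions, and for OLS with a single mediator the identity c = c' + ab holds exactly. A base R sketch with simulated data (mediate adds the bootstrapped confidence interval for ab and the path diagram):

```r
# Single-mediator path coefficients via three regressions (simulated data)
set.seed(1)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)            # a path: x -> m
y <- 0.4 * m + 0.3 * x + rnorm(100)  # b and c' paths: m -> y, x -> y
a        <- coef(lm(m ~ x))["x"]
b        <- coef(lm(y ~ x + m))["m"]
c.direct <- coef(lm(y ~ x + m))["x"] # c': direct effect of x removing m
c.total  <- coef(lm(y ~ x))["x"]     # c : total effect of x
ab <- a * b                          # indirect effect through m
all.equal(unname(c.total), unname(c.direct + ab))   # c = c' + ab, exactly
```

Bootstrapping ab (as mediate does) simply repeats these regressions on resampled rows and inspects the quantiles of the resulting ab values.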
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df =  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df =  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33 
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03    Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB,data=sobel,std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act,std=TRUE,n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R² = 1 − ∏ᵢ₌₁ⁿ (1 − λᵢ)

where λᵢ is the ith eigen value of the eigen value decomposition of the matrix

R = R_xx⁻¹ R_xy R_yy⁻¹ R_yx.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
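Both statistics can be computed from the squared canonical correlations, which base R's cancor gives directly; swapping the two sets leaves them unchanged, which is why the set correlation is symmetric. A sketch with simulated data:

```r
# Set correlation and average squared canonical correlation via cancor()
set.seed(7)
X <- matrix(rnorm(300), ncol = 3)
Y <- 0.5 * X + matrix(rnorm(300), ncol = 3)   # the two sets share variance
rho2 <- cancor(X, Y)$cor^2        # squared canonical correlations
R2.set  <- 1 - prod(1 - rho2)     # Cohen's set correlation R2
R2.mean <- mean(rho2)             # average squared canonical correlation
c(R2.set = R2.set, R2.mean = R2.mean)
all.equal(cancor(X, Y)$cor, cancor(Y, X)$cor)   # symmetric in the two sets
```

Note how one large canonical correlation can drive R2.set toward 1 while leaving R2.mean modest, which is the inflation problem described above.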
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis 
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, 
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df =  125   with p =  4.5e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df =  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism   =  0 
Mean bootstrapped indirect effect =  0.01  with standard error =  0.02  Lower CI =  -0.01    Upper CI =  0.07
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df =  125   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  NA  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df =  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism   =  0 
Mean bootstrapped indirect effect =  0.01  with standard error =  0.04  Lower CI =  -0.05    Upper CI =  0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.46e-09
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
 DV = ACT 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.050 0.033 0.008
Chisq of canonical correlations 
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohens Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2 fa2latexA factor analysis table from the psych package in R
Variable MR1 MR2 MR3 h2 u2 com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
SentCompletion   0.83  0.04  0.00 0.73 0.27 1.00
FirstLetters     0.00  0.86  0.00 0.73 0.27 1.00
4LetterWords    -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
LetterSeries     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
LetterGroup     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings  2.64 1.86 1.5

MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
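A short hedged sketch of a few of these helpers in use (function names are those listed above; the input values are made up for illustration):

```r
library(psych)
r <- 0.30
fisherz(r)            # Fisher r-to-z transformation: 0.5 * log((1 + r)/(1 - r))

x <- c(1, 2, 4, 8)
geometric.mean(x)     # exp(mean(log(x))), appropriate for ratio-scaled data
harmonic.mean(x)      # length(x) / sum(1/x), appropriate for rates

headTail(attitude)    # first and last lines of a built-in data frame
```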
8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
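As a sketch of the multidimensional scaling use mentioned above (assuming the cities distance matrix supplied with psych carries city names as dimnames), base R's cmdscale can recover a two dimensional configuration:

```r
library(psych)
data(cities)                              # airline distances between 11 US cities
city.location <- cmdscale(cities, k = 2)  # classical (metric) MDS in two dimensions
plot(city.location, type = "n",
     xlab = "Dimension 1", ylab = "Dimension 2")
text(city.location, labels = rownames(cities))  # label the points with city names
```

Note that classical MDS solutions are determined only up to rotation and reflection, so the recovered map may need to be flipped to match geography.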
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0  foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
1
Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
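A small sketch of the dummy coding described above (dummy.code is the psych function just named; the education vector here is a made-up categorical variable for illustration):

```r
library(psych)
# a hypothetical categorical variable
education <- c("HS", "college", "HS", "grad", "college")
# dummy.code returns one 0/1 column per observed category
dummy.code(education)
```

Each resulting column can then be correlated with other variables as an ordinary (point biserial) Pearson r.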
3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+   main="Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+   main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
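The standard error of a proportion used by error.bars.tab can be sketched directly (a hand computation with made-up values, not a call into psych):

```r
# standard error of a proportion: sigma_p = sqrt(p * q / N)
p <- 0.30          # observed proportion (illustrative value)
q <- 1 - p
N <- 200           # sample size (illustrative value)
se.p <- sqrt(p * q / N)
se.p               # roughly 0.032 for these values
```

A 95% confidence interval for the proportion is then approximately p ± 1.96 * se.p.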
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the observed scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
344 Error bars for tabular data
However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab
function
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+   main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with the densityBy function shown in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+    main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103 r12 = 0.4 r23 = 0.1 r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
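As a small sketch of the tetrachoric case (the 2 x 2 table of cell frequencies here is made up for illustration, not data from this vignette), the correlation can be estimated directly from a table of counts:

```
> taps <- matrix(c(40, 10, 20, 30), 2, 2)   # hypothetical cell frequencies
> tetrachoric(taps)    # returns the estimated correlation and the thresholds (tau)
```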
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
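A minimal sketch of that repair (assuming the burt correlation matrix supplied with psych):

```
> data(burt)
> burt.smooth <- cor.smooth(burt)   # adjusts the offending eigenvalue and rescales
```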
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

    r_xy = η_x.wg * η_y.wg * r_xy.wg + η_x.bg * η_y.bg * r_xy.bg        (1)

where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy.wg and r_xy.bg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
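A sketch of the first case (the group and cors arguments follow the statsBy help page; rwg and rbg are the within and between group correlation matrices in the returned object):

```
> sb <- statsBy(sat.act, group="education", cors=TRUE)
> sb$rwg   # pooled within group correlations
> sb$rbg   # correlations of the group means (weighted by sample size)
```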
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the cors option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
               slope  VIF
Sentences       0.09 3.69
Vocabulary      0.09 3.88
SentCompletion  0.02 3.00
FirstLetters    0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
               slope  VIF
Sentences       0.07 3.69
Vocabulary      0.17 3.88
SentCompletion  0.05 3.00
FirstLetters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
               slope  VIF
Sentences       0.25 3.69
Vocabulary      0.09 3.88
SentCompletion  0.04 3.00
FirstLetters    0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
               slope  VIF
Sentences       0.21 3.69
Vocabulary      0.16 3.88
SentCompletion  0.21 3.00
FirstLetters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
               slope  VIF
Sentences       0.20 3.69
Vocabulary     -0.02 3.88
SentCompletion  0.08 3.00
FirstLetters    0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = FourLetterWords
               slope  VIF
SentCompletion  0.02 1.02
FirstLetters    0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

DV = Suffixes
               slope  VIF
SentCompletion  0.05 1.02
FirstLetters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = LetterSeries
               slope  VIF
SentCompletion  0.04 1.02
FirstLetters    0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.17 0.03

DV = Pedigrees
               slope  VIF
SentCompletion  0.21 1.02
FirstLetters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = LetterGroup
               slope  VIF
SentCompletion  0.08 1.02
FirstLetters    0.31 1.02
Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes, 2004 data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
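The moderated mediation output for the Garcia data comes from a call of this form (mirroring its Call: line):

```
> mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia, n.iter=50,
+    main="Moderated mediation (mean centered)")
```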
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
    R² = 1 − ∏_{i=1}^{n} (1 − λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

    R = R_xx⁻¹ R_xy R_yy⁻¹ R_yx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor:
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
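A sketch of the kind of call that produces such a table (the heading argument is optional):

```
> f3 <- fa(Thurstone, 3)
> fa2latex(f3, heading="A factor analysis table from the psych package in R")
```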
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1    MR2    MR3    h2    u2   com
Sentences       0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary      0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion  0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters    0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes        0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees       0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings     2.64   1.86   1.50

MR1             1.00   0.59   0.54
MR2             0.59   1.00   0.52
MR3             0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-
dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
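A few of these helpers in action (a minimal sketch; the input values are only for illustration):

```r
library(psych)
fisherz(0.5)                   # Fisher r-to-z transformation of a correlation
geometric.mean(c(1, 2, 4, 8))  # geometric mean, appropriate for ratio data
harmonic.mean(c(10, 20, 40))   # harmonic mean, e.g., for averaging rates
headTail(sat.act)              # first and last lines of the sat.act data set
```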
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) Condon & Revelle (2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example
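The Dwyer example may be run roughly as in the fa.extension help page (a sketch; see ?fa.extension for the authoritative version):

```r
library(psych)
# The first 7 variables form the original set; the 8th is the extension variable
Ro  <- Dwyer[1:7, 1:7]   # correlations among the original variables
Roe <- Dwyer[1:7, 8]     # correlations of the originals with the new variable
fo  <- fa(Ro, nfactors = 2, rotate = "none")  # factor the original set
fa.extension(Roe, fo)    # loadings of the extended variable on those factors
```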
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS 10.14

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:

[1] C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] psych_1.8.10

loaded via a namespace (and not attached):

[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5

[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49
affect, 15, 25
alpha, 5, 7
alpha factoring, 7
Bechtoldt.1, 49
Bechtoldt.2, 49
bfi, 26, 49
bi.bars, 7, 26, 27
bifactor, 7
biserial, 15, 35
block.random, 48
burt, 35
char2numeric, 15
circadian.cor, 48
circadian.linear.cor, 48
circadian.mean, 48
circular statistics, 48
cities, 51
cohen.kappa, 49
cor, 28
cor.smooth, 35
cor.test, 29
cor2latex, 47, 48
corPlot, 8
corr.p, 29, 33
corr.test, 29, 33
cortest, 34
cosinor, 48
ctv, 9
cubits, 49
densityBy, 15, 26
densitynBy, 27
describe, 7, 11, 48
describeBy, 3, 7, 11, 12
df2latex, 47, 48
diagram, 9
draw.cor, 35
draw.tetra, 35
dummy.code, 15
dynamite plot, 20
edit, 3
epi.bfi, 49
error bars, 20
error.bars, 7, 15, 20
error.bars.by, 12, 15, 20, 21
error.bars.tab, 20
error.crosses, 20
errorCircles, 25
errorCrosses, 24
fa, 7, 9, 47
fa.diagram, 8
fa.extension, 50
fa.multi, 7
fa.parallel, 5, 7
fa2latex, 47, 48
faBy, 38
factor analysis, 7
factor.minres, 9
factor.pa, 9
factor.wls, 9
file.choose, 10
fisherz, 48
galton, 50
generalized least squares, 7
geometric.mean, 49
GPArotation, 9
guttman, 7
harmonic.mean, 49
head, 49
headtail, 49
heights, 49
het.diagram, 8
Hmisc, 29
Holzinger, 49
ICC, 7, 49
iclust, 7
iclust.diagram, 8
Index, 48
introduction to psychometric theory with applications in R, 8
iqitems, 49
irt.fa, 7, 47
irt2latex, 47, 48
KnitR, 47
lavaan, 38
library, 9
lm, 38
lowerCor, 4, 28
lowerMat, 28
lowerUpper, 28
lowess, 15
make.keys, 17
MAP, 7
mardia, 49
maximum likelihood, 7
mediate, 4, 8, 41, 42, 44
mediate.diagram, 41
minimum residual, 7
mixed.cor, 35
mlArrange, 8
mlPlot, 8
mlr, 7
msq, 17
multi.hist, 7
multilevel, 37
multilevel.reliability, 7
multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9
outlier, 3, 13, 14
p.adjust, 29
p.rep, 49
pairs, 15
pairs.panels, 3, 7, 8, 14–18
partial.r, 49
pca, 7
peas, 49, 50
plot.irt, 8
plot.poly, 8
polychoric, 7, 35
polyserial, 35
principal, 5, 7, 9
principal axis, 7
psych, 3, 5–9, 29, 47–49, 51
r.test, 29
rangeCorrection, 49
rcorr, 29
read.clipboard, 3, 7, 10
read.clipboard.csv, 10
read.clipboard.fwf, 11
read.clipboard.lower, 11
read.clipboard.tab, 3, 11
read.clipboard.upper, 11
read.file, 3, 7, 10
read.table, 10
Reise, 49
reverse.code, 49
sat.act, 11, 34, 44
scatter.hist, 7
schmid, 7, 9
score.multiple.choice, 7
scoreItems, 5, 7, 8, 17
scrub, 3, 13
sector, 43
set.cor, 49
setCor, 4, 8, 38, 41, 44, 46, 47
sim.multilevel, 38
spi, 49
spider, 15
stars, 15
StatsBy, 7
statsBy, 7, 37, 38
statsBy.boot, 38
statsBy.boot.summary, 38
structure.diagram, 8
superMatrix, 49
table, 20
tail, 49
tetrachoric, 7, 35
Thurstone, 29, 49
Thurstone.33, 49
topBottom, 49
vegetables, 49, 51
violinBy, 15, 17, 19
vss, 5, 7
withinBetween, 37, 38
R package:
ctv, 9
GPArotation, 9
Hmisc, 29
KnitR, 47
lavaan, 38
multilevel, 37
nlme, 37
psych, 3, 5–9, 29, 47–49, 51
Rgraphviz, 9
sem, 8, 49
stats, 29
Sweave, 47
xtable, 47
SAPA, 26, 50
set correlation, 44
weighted least squares, 7
merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) to show effect sizes and may be plotted in, e.g., spider plots.
Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
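As a quick sketch of both steps (the film and demographic vectors here are made up for illustration):

```r
library(psych)
# dummy.code expands a categorical vector into 0/1 indicator columns,
# one column per category
film <- c("Sad", "Horror", "Neutral", "Happy", "Sad")
dummy.code(film)

# char2numeric converts character/factor columns of a data frame to numeric codes
df <- data.frame(gender = c("Male", "Female", "Female"),
                 education = c("HS", "some College", "in college"))
char2numeric(df)
```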
34 Simple descriptive graphics
Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)
341 Scatter Plot Matrices
Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)
pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+   main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
343 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot, which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
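As a minimal sketch of the first two functions (using the base R attitude data and the package's sat.act data):

```r
library(psych)
error.bars(attitude)                         # "cats eyes" 95% CIs for each variable
error.bars.by(sat.act[5:6], sat.act$gender)  # the same statistics, grouped by gender
```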
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflect any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although this is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
344 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
[Figure: 95% confidence limits for bfagree, bfcon, bfext, bfneur, and bfopen; x axis: Independent Variable; y axis: Dependent Variable]

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male","Female"), ylab="SAT score", xlab="")
[Figure: bar graph with 95% confidence limits of SAT score (200-800) for Male and Female]

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+   main="Proportion of sample by education level")
[Figure: "Proportion of sample by education level" - proportions (0.00 to 0.30) for males and females at education levels 0 through 5; x axis: Level of Education; y axis: Proportion of Education Level]

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+   cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+   ylab="Negative Affect", pch=16,cex=2,colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this.
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV SATQ
education        NA  0.52 0.16  0.07 0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53 0.58
SATV           0.02 -0.06 0.61    NA 0.63
SATQ           0.08  0.04 0.60  0.68   NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal.
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18  with probability < 0.034
 and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99  with probability  0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89  with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
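As a small sketch of the process (the 2 x 2 cell frequencies below are made up for illustration; the tetrachoric and phi functions are in psych):

```r
library(psych)
# Hypothetical 2 x 2 table of cell frequencies (illustrative values only)
tet <- matrix(c(44, 9, 23, 12), 2, 2)
tetrachoric(tet)   # estimated latent correlation (rho) and thresholds (tau)
phi(tet)           # the phi coefficient on the same observed table, for comparison
```

The tetrachoric estimate will typically be larger in absolute value than the φ coefficient on the same table.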
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
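A sketch of such a call on the bfi data set (the column assignments here are illustrative: the 25 items treated as polytomous, gender as dichotomous, age as continuous):

```r
library(psych)
# p = polytomous, d = dichotomous, c = continuous (columns chosen for illustration)
r.mixed <- mixedCor(data = bfi, p = 1:25, d = 26, c = 28)
r.mixed$rho   # the resulting mixed correlation matrix
```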
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
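A sketch of the smoothing step on the burt correlation matrix mentioned above:

```r
library(psych)
eigen(burt)$values            # note the problematic (negative) smallest eigenvalue
burt.smoothed <- cor.smooth(burt)
eigen(burt.smoothed)$values   # after smoothing, all eigenvalues are positive
```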
> draw.tetra()
[Figure 14 here: a bivariate normal distribution with rho = 0.5, cut at thresholds tau (for X) and Tau (for Y) into four quadrants, yielding phi = 0.28]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)
where r_{xy} is the normal correlation, which may be decomposed into a within group and between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.
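A sketch of this decomposition for the sat.act data, grouped by education (the rwg and rbg components follow the statsBy help page):

```r
library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (correlations of the group means)
```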
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
 Average squared canonical correlation =  0.2
 Cohens Set Correlation R2  =  0.69
 Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
 Average squared canonical correlation =  0.21
 Cohens Set Correlation R2  =  0.42
 Unweighted correlation between the two sets =  0.48
> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)
The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.
Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5   df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure 15 here: mediation path diagram with THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, c = 0.76, c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB,data=sobel,std=FALSE)
> setCor.diagram(preacher)
[Figure 16 here: regression path diagram with THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, and a THERAPY-ATTRIB covariance of 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")
The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.
Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure 17 here: moderated mediation path diagram for prot2, prot2*sexism, sexism, and respappr]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,	Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV  0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
         R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ  0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
 Average squared canonical correlation =  0.03
 Cohens Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
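This symmetry can be checked directly (a sketch, using the same covariance matrix as above; the Cohen's Set Correlation R2 reported by the two calls should be identical):

```r
library(psych)
# Covariance matrix of the sat.act variables, as in the example above
C <- cov(sat.act, use = "pairwise")
sc1 <- setCor(c(4:6), c(1:3), C, n.obs = 700)  # ability predicted from demographics
sc2 <- setCor(c(1:3), c(4:6), C, n.obs = 700)  # demographics predicted from ability
```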
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
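Table 2 was presumably produced by a call along these lines (a sketch; the heading text here is illustrative):

```r
library(psych)
# Three factor solution for the 9 Thurstone cognitive variables
f3 <- fa(Thurstone, 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```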
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

MR1  1.00
MR2  0.59  1.00
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-
dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data
fisherz Convert a correlation to the corresponding Fisher z score
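The Fisher z transform is just the inverse hyperbolic tangent of r, so what fisherz computes can be sketched in base R (the helper name here is illustrative, not the package's API):

```r
# Fisher r-to-z: z = 0.5 * log((1 + r) / (1 - r)), equivalently atanh(r)
fisherz_sketch <- function(r) 0.5 * log((1 + r) / (1 - r))
fisherz_sketch(0.3)  # same value as atanh(0.3), about 0.31
```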
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)
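These two means are simple to define; a base R sketch (illustrative helpers, not the package's implementations):

```r
# Geometric mean: exp of the mean log; appropriate for positive, ratio-scaled data
geometric_mean_sketch <- function(x) exp(mean(log(x)))
# Harmonic mean: reciprocal of the mean reciprocal; appropriate for rates
harmonic_mean_sketch <- function(x) 1 / mean(1 / x)

geometric_mean_sketch(c(1, 4, 16))  # 4
harmonic_mean_sketch(c(2, 6, 3))    # 3
```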
rangeCorrection will correct correlations for restriction of range
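One classic form of such a correction is Thorndike's Case 2 formula, which adjusts r given the unrestricted and restricted standard deviations of the selection variable. The sketch below illustrates that formula in base R; the function and argument names are illustrative, not the package's API, and the formula assumes linearity and homoscedasticity:

```r
# Thorndike Case 2 correction for restriction of range:
# k = sd(unrestricted) / sd(restricted)
range_correct_sketch <- function(r, sd_unrestricted, sd_restricted) {
  k <- sd_unrestricted / sd_restricted
  (r * k) / sqrt(1 - r^2 + r^2 * k^2)
}
range_correct_sketch(0.5, 2, 1)  # about 0.76
```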
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
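The block structure described above is easy to sketch in base R (an illustrative stand-in for psych's superMatrix, not its implementation):

```r
# Combine A (top left) and B (lower right) with 0s in the off-diagonal blocks
super_matrix_sketch <- function(A, B) {
  top    <- cbind(A, matrix(0, nrow(A), ncol(B)))
  bottom <- cbind(matrix(0, nrow(B), ncol(A)), B)
  rbind(top, bottom)
}
super_matrix_sketch(diag(2), matrix(1, 2, 2))  # a 4 x 4 block matrix
```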
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iq.items, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq.items 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web-based assessment project (https://sapa-project.org; Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iq.items, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14-18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5-9, 29, 47-49, 51
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
1
Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
3.4.2 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+   main="Affect varies by movies")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
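The standard error of a proportion used by error.bars.tab is simple to compute directly. A base R sketch (an illustrative helper, not the package's internal code):

```r
# Standard error of a proportion: sqrt(p * q / N), with q = 1 - p
se_prop <- function(p, N) sqrt(p * (1 - p) / N)
se_prop(0.3, 100)  # about 0.046
```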
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+   main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+   cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+   ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2,800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with the densityBy function in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 r12 = 0.4 r23 = 0.1 r13 = 0.5 )"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
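As a concrete sketch of the estimation just described (the cell frequencies below are made up for illustration, not taken from the text), the tetrachoric function will accept a 2 x 2 table of frequencies, and phi gives the Pearson r on the same observed values:

```r
library(psych)

# hypothetical 2 x 2 table of cell frequencies (illustrative values only)
tab <- matrix(c(40, 10, 10, 40), 2, 2)
phi(tab)         # the phi coefficient: Pearson r on the observed dichotomies
tetrachoric(tab) # the estimated latent correlation and the thresholds (tau)
```

Because dichotomizing attenuates the correlation, the tetrachoric estimate is larger than φ for the same table.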
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
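As a brief sketch of that smoothing step applied to the burt matrix just mentioned:

```r
library(psych)

data(burt)
min(eigen(burt)$values)          # one eigenvalue is negative: not positive semi-definite
burt.smoothed <- cor.smooth(burt)
min(eigen(burt.smoothed)$values) # after smoothing, all eigenvalues are non-negative
```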
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x_wg * η_y_wg * r_xy_wg + η_x_bg * η_y_bg * r_xy_bg        (1)

where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.
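The pieces of equation (1) can be pulled from a statsBy analysis; a minimal sketch, grouping the sat.act data by education level (the element names rwg, rbg, etawg, and etabg are assumed from the statsBy output object):

```r
library(psych)

# decompose the sat.act correlations by education level
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg    # pooled within group correlations (r_xy_wg)
sb$rbg    # weighted between group correlations (r_xy_bg)
sb$etawg  # correlations of the data with the within group values
sb$etabg  # correlations of the data with the group means
```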
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)    #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = 4.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = 4.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual,2)
               4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
4.Letter.Words           0.52     0.11          0.09      0.06         0.13
Suffixes                 0.11     0.60         -0.01      0.01         0.03
Letter.Series            0.09    -0.01          0.75      0.28         0.37
Pedigrees                0.06     0.01          0.28      0.66         0.20
Letter.Group             0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df= 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df= 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏_{i=1}^{n} (1 − λi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx^{-1} Rxy Ryy^{-1} Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df= 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df= 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df= 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df= 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
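A quick sketch of that symmetry: swapping the x and y sets should print the same set correlation R2 as shown above (the plot argument is assumed from the setCor help page; this simply suppresses the path diagram):

```r
library(psych)

R <- cov(sat.act, use = "pairwise")
s1 <- setCor(c(4:6), c(1:3), data = R, n.obs = 700, plot = FALSE)
s2 <- setCor(c(1:3), c(4:6), data = R, n.obs = 700, plot = FALSE)
# both calls report the same "Cohens Set Correlation R2" line
```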
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

MR1              1.00  0.59  0.54
MR2              0.59  1.00  0.52
MR3              0.54  0.52  1.00
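A sketch of the calls that produce such tables (the heading text is illustrative; the LaTeX source is written to the console, from which it can be pasted into a document):

```r
library(psych)

f3 <- fa(Thurstone, 3)            # the 3 factor solution shown in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(Thurstone)              # lower diagonal correlation table
df2latex(describe(sat.act))       # any data frame or tabular output
```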
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
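A few of these helpers in action (the values in the comments follow directly from the definitions of the functions):

```r
library(psych)

fisherz(0.5)                        # Fisher r to z: atanh(0.5), about 0.55
geometric.mean(c(1, 2, 4))          # (1*2*4)^(1/3) = 2
headTail(sat.act)                   # first and last four rows, with ... between
m <- superMatrix(diag(2), diag(3))  # a 5 x 5 block "Super matrix" of two identity matrices
```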
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
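Any of these data sets can be loaded and examined directly; a minimal sketch using the bfi data (the choice of the first five items is arbitrary):

```r
library(psych)
data(bfi)
dim(bfi)            # 2800 participants, 25 items plus demographics
describe(bfi[1:5])  # descriptive statistics for the first five items
```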
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14-18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 3, 5-9, 29, 47-49, 51
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sem, 8, 49; set correlation, 44; set.cor, 43, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
and negative affect, and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.
Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother = TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()
Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
342 Density or violin plots
Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).
> png("affect.png")
> pairs.panels(affect[14:17], bg = c("red", "black", "white", "blue")[affect$Film], pch = 21,
+   main = "Affect varies by movies ")
> dev.off()
null device
1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name = c("M", "F"), main = "Density Plot by gender for SAT V and Q")
> dev.off()
null device
1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
343 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cat's eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
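A minimal sketch of the simplest of these calls, error.bars, applied to the sat.act data set (the plot title is an arbitrary choice for illustration):

```r
library(psych)
data(sat.act)
# 95% confidence intervals, drawn with "cat's eyes", for the SAT scores
error.bars(sat.act[5:6], main = "95% confidence intervals for SAT V and Q")
```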
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading, in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
344 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cat's eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars = TRUE,
+   labels = c("Male", "Female"), ylab = "SAT score", xlab = "")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way = "both", ylab = "Proportion of Education Level", xlab = "Level of Education",
+   main = "Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
345 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow = c(1, 2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data = affect[-c(1, 20)], group = "Film", labels = films,
+   xlab = "Energetic Arousal", ylab = "Tense Arousal", ylim = c(10, 22), xlim = c(8, 20), pch = 16,
+   cex = 2, colors = colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data = affect.stats, labels = films, xlab = "Positive Affect",
+   ylab = "Negative Affect", pch = 16, cex = 2, colors = colors, main = "Movies effect on affect")
> op <- par(mfrow = c(1, 1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
346 Back to back histograms
The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab = "Age", main = "Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp = "gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
347 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender == 2)
> male <- subset(sat.act, sat.act$gender == 1)
> lower <- lowerCor(male[-1])
          edctn  age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn  age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff = TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
348 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
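A hedged sketch of a spider plot, using a simulated circumplex as in Figure 12 (the choice of variable 1 as the focal variable is arbitrary):

```r
library(psych)
# correlations among 24 simulated circumplex variables
r.circ <- cor(sim.circ(24))
# radial plot of the correlations of variable 1 with variables 2 to 24
spider(y = 1, x = 2:24, data = r.circ, main = "Spider plot of variable 1")
```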
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
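As a hedged illustration (this call is not shown in the vignette output above), the confidence intervals may be requested like this:

```r
# print the corr.test result with short=FALSE to display the
# confidence interval for each correlation
library(psych)
ct <- corr.test(sat.act)
print(ct, short=FALSE)
```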
depending upon the input):

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
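A minimal sketch of such a mixed call, assuming the bfi data set with its 25 polytomous personality items and the continuous age variable; the column indices are illustrative:

```r
library(psych)
# after subsetting, columns 1:25 are the polytomous items and
# column 26 is age, treated as continuous
mixed <- mixedCor(bfi[c(1:25,28)], p=1:25, c=26)
mixed$rho   # the resulting mixed correlation matrix
```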
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
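A hedged sketch using the burt example just mentioned:

```r
library(psych)
# burt: a historic correlation matrix that is not quite positive
# definite, probably due to a typo in the original
round(eigen(burt)$values, 3)   # note the smallest eigen value
burt.s <- cor.smooth(burt)     # smoothed to be positive (semi-)definite
round(eigen(burt.s)$values, 3)
```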
> draw.tetra()
[Figure: a bivariate normal distribution (rho = 0.5, φ = 0.28) cut at τ for X and Τ for Y, dividing the plane into the four cells X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ, with the corresponding univariate densities shown in the margins.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg    (1)
where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
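A minimal sketch of the first of these, grouping the sat.act data by education:

```r
library(psych)
# basic two-level descriptives, with within and between correlations
sb <- statsBy(sat.act, group="education", cors=TRUE)
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (of the group means)
```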
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sentences         0.09 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.02 3.00
First.Letters     0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                 slope  VIF
Sentences         0.07 3.69
Vocabulary        0.17 3.88
Sent.Completion   0.05 3.00
First.Letters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                 slope  VIF
Sentences         0.25 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.04 3.00
First.Letters     0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                 slope  VIF
Sentences         0.21 3.69
Vocabulary        0.16 3.88
Sent.Completion   0.21 3.00
First.Letters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                 slope  VIF
Sentences         0.20 3.69
Vocabulary       -0.02 3.88
Sent.Completion   0.08 3.00
First.Letters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sent.Completion   0.02 1.02
First.Letters     0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                 slope  VIF
Sent.Completion   0.05 1.02
First.Letters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                 slope  VIF
Sent.Completion   0.04 1.02
First.Letters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                 slope  VIF
Sent.Completion   0.21 1.02
First.Letters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                 slope  VIF
Sent.Completion   0.08 1.02
First.Letters     0.31 1.02
Multiple Regression
R R2 Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)
The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.
Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation model diagram. THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), with total effect c = 0.76 and direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
[Figure: regression model diagram. THERAPY → SATIS (0.43) and ATTRIB → SATIS (0.4), with THERAPY and ATTRIB correlated 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where λ_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")
The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.
Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation (mean centered) path diagram, showing prot2 and prot2*sexism predicting respappr directly (c = 1.46, c' = 1.46; c = 0.81, c' = 0.81) and indirectly through sexism.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohens Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohens Set Correlation   7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
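A hedged sketch of the kind of call that produces such a table (a 3 factor solution of the Thurstone correlations; the heading argument is illustrative):

```r
library(psych)
# factor the 9 Thurstone ability variables and emit a LaTeX table
f3 <- fa(Thurstone, nfactors=3)
fa2latex(f3, heading="A factor analysis table from the psych package in R")
```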
Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

    MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
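A hedged demonstration of a few of these helpers:

```r
library(psych)
fisherz(.5)                 # Fisher r to z transformation (atanh)
fisherz2r(fisherz(.5))      # and back again: 0.5
geometric.mean(c(1,2,4))    # cube root of the product: 2
harmonic.mean(c(1,2,4))     # 3/(1 + 1/2 + 1/4) = 12/7
headTail(sat.act)           # first and last lines of the data frame
```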
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies ")
> dev.off()
null device
          1
Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and the 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
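The normal-theory interval that error.bars draws is simply the mean plus or minus a critical value times the standard error of the mean. As a language-neutral illustration of that arithmetic, here is a minimal Python sketch (not part of the psych package; the data values are made up):

```python
import math

def mean_ci(x, z=1.96):
    """95% normal-theory confidence interval for the mean of x."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # sample sd
    se = sd / math.sqrt(n)                                  # standard error of the mean
    return m - z * se, m + z * se

lo, hi = mean_ci([2, 4, 4, 4, 5, 5, 7, 9])
print(round(lo, 2), round(hi, 2))  # 3.52 6.48
```

With repeated measures or skewed data the symmetric normal-theory interval is only an approximation, which is why error.bars also offers standard deviation and standard error options.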
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. (Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the observed scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
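The standard error of proportion used for these bars, σp = √(pq/N), is easy to verify by hand. A hedged Python sketch (illustrative values only, not psych code):

```python
import math

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p*q/N) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# e.g., a cell containing 30% of 100 cases
print(round(se_proportion(0.3, 100), 4))  # 0.0458
```

Note that the error is largest at p = 0.5 and shrinks toward the extremes, so cells near 0% or 100% get visibly shorter bars.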
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot shown in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
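The Holm step-down correction is straightforward to compute by hand: order the p values from smallest to largest, multiply the i-th smallest by (m - i + 1), and enforce monotonicity. A small stand-alone sketch (in Python, to show the arithmetic rather than p.adjust's own code):

```python
def holm_adjust(pvals):
    """Holm (1979) step-down adjustment of a list of p values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        p = min(1.0, (m - rank) * pvals[i])  # multiply by m, m-1, ..., 1
        running_max = max(running_max, p)    # keep the adjusted values monotone
        adjusted[i] = running_max
    return adjusted

print([round(p, 6) for p in holm_adjust([0.01, 0.04, 0.02])])  # [0.03, 0.04, 0.04]
```

Holm is uniformly more powerful than the simple Bonferroni correction (which would multiply every p value by m) while still controlling the familywise error rate.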
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+    main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53
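The printed t value and confidence interval follow from the standard formulas t = r√(n-2)/√(1-r²) and the Fisher z interval, atanh(r) ± 1.96/√(n-3), transformed back with tanh. A quick arithmetic check in Python (an illustration, not psych's own code):

```python
import math

n, r = 50, 0.3
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t test of a single correlation
z = math.atanh(r)                                 # Fisher z transform of r
half = 1.96 / math.sqrt(n - 3)                    # half width of the 95% CI in z units
lo, hi = math.tanh(z - half), math.tanh(z + half) # back-transform to the r metric
print(round(t, 2), round(lo, 2), round(hi, 2))    # 2.18 0.02 0.53
```

These match the r.test output above; note how the back-transformed interval is asymmetric about r = 0.3, unlike a naive normal interval on r itself.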
2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32
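That z value can be reproduced by hand (up to sign; the printed value is the magnitude): transform each correlation with Fisher's z and divide the difference by √(1/(n1-3) + 1/(n2-3)). A minimal Python sketch (illustrative only):

```python
import math

def z_indep(r1, r2, n1, n2=None):
    """z test of the difference of two independent correlations."""
    n2 = n1 if n2 is None else n2
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z transforms
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the difference of z scores
    return (z1 - z2) / se

print(round(z_indep(0.4, 0.6, 30), 2))  # -0.99
```

With n = 30 per group, |z| = 0.99 as in the output above: a difference of .4 versus .6 is nowhere near significant at this sample size.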
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero
> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal, for various values of the correlation, to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
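Beyond these demonstration plots, the estimation itself may be sketched from a 2 x 2 table of cell frequencies; the counts here are made up for illustration:

```r
library(psych)
tab <- matrix(c(40, 10,
                10, 40), 2, 2, byrow = TRUE)  # hypothetical 2 x 2 frequency table
phi(tab)          # the Pearson (phi) correlation on the observed dichotomies: 0.6
tetrachoric(tab)  # the inferred latent correlation, noticeably larger than phi
```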
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
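A sketch of the smoothing process applied to the burt correlation matrix mentioned above:

```r
library(psych)
min(eigen(burt)$values)      # slightly negative: burt is not positive semi-definite
burt.s <- cor.smooth(burt)   # adjust and rescale the eigenvalues
min(eigen(burt.s)$values)    # now positive
max(abs(burt.s - burt))      # the smoothed values differ only trivially from the originals
```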
> draw.tetra()

[Figure 14: the bivariate normal density cut at τ (for X) and Τ (for Y), showing the four quadrants X < τ or X > τ crossed with Y < Τ or Y > Τ, with rho = 0.5 and phi = 0.28]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_{xy} = η_{x_{wg}} η_{y_{wg}} r_{xy_{wg}} + η_{x_{bg}} η_{y_{bg}} r_{xy_{bg}}    (1)
where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.
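Equation 1 can be seen in action with statsBy; a sketch using the sat.act data grouped by education (the output elements rwg, rbg, etawg, and etabg named here are assumptions about the statsBy object):

```r
library(psych)
sb <- statsBy(sat.act, group = "education")
round(sb$rwg, 2)   # the pooled within group correlations
round(sb$rbg, 2)   # the correlations of the group means
sb$etawg           # the within group etas of equation 1
sb$etabg           # the between group etas of equation 1
```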
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
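Because the a, b, and c paths are regression slopes, the effects can be sketched with three lm calls (assuming the sobel data frame from the mediate examples, with variables THERAPY, ATTRIB, and SATIS):

```r
a <- coef(lm(ATTRIB ~ THERAPY, data = sobel))["THERAPY"]         # x -> m
b <- coef(lm(SATIS ~ ATTRIB + THERAPY, data = sobel))["ATTRIB"]  # m -> y, holding x constant
c <- coef(lm(SATIS ~ THERAPY, data = sobel))["THERAPY"]          # total effect c
a * b      # the indirect effect ab
c - a * b  # the direct effect c'
```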
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB  =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03   Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure 15: "Mediation model" path diagram: THERAPY -> ATTRIB (0.82), ATTRIB -> SATIS (0.4), with c = 0.76 and c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16: "Regression Models" path diagram: THERAPY -> SATIS (0.43), ATTRIB -> SATIS (0.4), with a correlation of 0.21 between THERAPY and ATTRIB]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
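The product term of such a moderated regression can also be sketched by hand (assuming the Garcia data set with variables respappr, prot2, and sexism; mean centering first, as mediate does by default):

```r
library(psych)
prot2c  <- scale(Garcia$prot2,  scale = FALSE)  # mean center, do not standardize
sexismc <- scale(Garcia$sexism, scale = FALSE)
# prot2c:sexismc is the moderation (interaction) term
summary(lm(Garcia$respappr ~ prot2c * sexismc))
```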
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 − ∏_{i=1}^{n} (1 − λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In such cases an alternative statistic, based upon the average canonical correlation, might be more appropriate.
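The eigenvalue computation may be sketched directly for the Thurstone example of section 5.1:

```r
library(psych)
Rxx <- Thurstone[1:4, 1:4]
Ryy <- Thurstone[5:9, 5:9]
Rxy <- Thurstone[1:4, 5:9]
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values
round(lambda, 4)      # the squared canonical correlations
1 - prod(1 - lambda)  # Cohen's set correlation R2
mean(lambda)          # the average squared canonical correlation
```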
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  125   with p =  4.5e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.02  Lower CI =  -0.01   Upper CI =  0.07
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  125   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  NA  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.04  Lower CI =  -0.05   Upper CI =  0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure 17: "Moderated mediation (mean centered)" path diagram: prot2, prot2*sexism, and sexism predicting respappr, with c = 1.46 and c' = 1.46 for prot2, and c = 0.81 and c' = 0.81 for prot2*sexism]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,	Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
 Average squared canonical correlation =  0.03
 Cohens Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
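Table 2 style output comes from a call such as the following sketch (the result is LaTeX source, not shown here; the three factor call is an assumption about how the table was generated):

```r
library(psych)
f3 <- fa(Thurstone, 3)       # a three factor solution like the one shown in Table 2
fa2latex(f3)                 # loadings, communalities, and factor correlations as LaTeX
df2latex(describe(sat.act))  # a generic data frame (here, descriptives) as LaTeX
```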
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
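For example, a minimal superMatrix sketch:

```r
library(psych)
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
superMatrix(A, B)   # 5 x 5: A in the top left, B in the lower right, 0s elsewhere
```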
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon and Revelle (2014), and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set of the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
Index

ability, 49
affect, 15, 25
alpha, 5, 7
alpha factoring, 7
Bechtoldt.1, 49
Bechtoldt.2, 49
bfi, 26, 49
bi.bars, 7, 26, 27
bifactor, 7
biserial, 15, 35
block.random, 48
burt, 35
char2numeric, 15
circadian.cor, 48
circadian.linear.cor, 48
circadian.mean, 48
circular statistics, 48
cities, 51
cohen.kappa, 49
cor, 28
cor.smooth, 35
cor.test, 29
cor2latex, 47, 48
corPlot, 8
corr.p, 29, 33
corr.test, 29, 33
cortest, 34
cosinor, 48
ctv, 9
cubits, 49
densityBy, 15, 26, 27
describe, 7, 11, 48
describeBy, 3, 7, 11, 12
df2latex, 47, 48
diagram, 9
draw.cor, 35
draw.tetra, 35
dummy.code, 15
dynamite plot, 20
edit, 3
epi.bfi, 49
error bars, 20
error.bars, 7, 15, 20
error.bars.by, 12, 15, 20, 21
error.bars.tab, 20
error.crosses, 20
errorCircles, 25
errorCrosses, 24
fa, 7, 9, 47
fa.diagram, 8
fa.extension, 50
fa.multi, 7
fa.parallel, 5, 7
fa2latex, 47, 48
faBy, 38
factor analysis, 7
factor.minres, 9
factor.pa, 9
factor.wls, 9
file.choose, 10
fisherz, 48
galton, 50
generalized least squares, 7
geometric.mean, 49
GPArotation, 9
guttman, 7
harmonic.mean, 49
head, 49
headtail, 49
heights, 49
het.diagram, 8
Hmisc, 29
Holzinger, 49
ICC, 7, 49
iclust, 7
iclust.diagram, 8
Index, 48
introduction to psychometric theory with applications in R, 8
iqitems, 49
irt.fa, 7, 47
irt2latex, 47, 48
KnitR, 47
lavaan, 38
library, 9
lm, 38
lowerCor, 4, 28
lowerMat, 28
lowerUpper, 28
lowess, 15
make.keys, 17
MAP, 7
mardia, 49
maximum likelihood, 7
mediate, 4, 8, 41, 42, 44
mediate.diagram, 41
minimum residual, 7
mixed.cor, 35
mlArrange, 8
mlPlot, 8
mlr, 7
msq, 17
multi.hist, 7
multilevel, 37
multilevel.reliability, 7
multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9
outlier, 3, 13, 14
p.adjust, 29
p.rep, 49
pairs, 15
pairs.panels, 3, 7, 8, 14–18
partial.r, 49
pca, 7
peas, 49, 50
plot.irt, 8
plot.poly, 8
polychoric, 7, 35
polyserial, 35
principal, 5, 7, 9
principal axis, 7
psych, 3, 5–9, 29, 47–49, 51
r.test, 29
rangeCorrection, 49
rcorr, 29
read.clipboard, 3, 7, 10
read.clipboard.csv, 10
read.clipboard.fwf, 11
read.clipboard.lower, 11
read.clipboard.tab, 3, 11
read.clipboard.upper, 11
read.file, 3, 7, 10
read.table, 10
Reise, 49
reverse.code, 49
Rgraphviz, 9
SAPA, 26, 50
sat.act, 11, 34, 44
scatter.hist, 7
schmid, 7, 9
score.multiple.choice, 7
scoreItems, 5, 7, 8, 17
scrub, 3, 13
sector, 43
sem, 8, 49
set correlation, 44
set.cor, 49
setCor, 4, 8, 38, 41, 44, 46, 47
sim.multilevel, 38
spi, 49
spider, 15
stars, 15
stats, 29
statsBy, 7, 37, 38
statsBy.boot, 38
statsBy.boot.summary, 38
structure.diagram, 8
superMatrix, 49
Sweave, 47
table, 20
tail, 49
tetrachoric, 7, 35
Thurstone, 29, 38, 49
Thurstone.33, 49
topBottom, 49
vegetables, 49, 51
violinBy, 15, 17, 19
vss, 5, 7
weighted least squares, 7
withinBetween, 37, 38
xtable, 47
> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1
Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, as well as the entire range and the density distribution.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include ± one standard deviation or ± one standard error. If the data are repeated measures, the error bars will reflect the between-variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
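A minimal sketch of the first two calls (assuming the psych package is installed and loaded; sat.act is one of its example data sets, and columns 5:6 hold SATV and SATQ):

```r
library(psych)

# 95% confidence intervals ("cats eyes") for SATV and SATQ
error.bars(sat.act[5:6], ylab = "SAT score")

# the same variables, with separate intervals for each gender
error.bars.by(sat.act[5:6], sat.act$gender)
```
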
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the observed scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self-reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column-wise percentages and way="rows" finds row-wise percentages. The data can be converted to percentages (as shown) or total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two-dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bi.bars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that males are slightly over-represented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off-diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off-diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
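The Holm correction itself is available in base R as p.adjust. A minimal sketch (the three raw p values are made up for illustration):

```r
# Base R sketch of the Holm correction that corr.test applies by default.
# Holm multiplies the ordered p values by (n - rank + 1) and then
# enforces monotonicity on the adjusted values.
p.raw <- c(0.01, 0.02, 0.03)          # hypothetical raw p values
p.adjust(p.raw, method = "holm")      # 0.03 0.04 0.04
```
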
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown; setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+    main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call: corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation, as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call: r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z-transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call: r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call: [1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call: r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call: cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal, for various values of the correlation, to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
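The underestimation by φ is easy to see by dichotomizing items and comparing the two estimates. A sketch, assuming psych is loaded; bfi is one of its data sets, and the cut point of 4 is an arbitrary choice for illustration:

```r
library(psych)

# artificially dichotomize the first five bfi items at an arbitrary cut point
dichot <- ifelse(bfi[1:5] >= 4, 1, 0)

# phi: an ordinary Pearson r on the 0/1 values
phi.r <- cor(dichot, use = "pairwise")

# tetrachoric: the estimated latent bivariate-normal correlation
tet.r <- tetrachoric(dichot)$rho

# the tetrachoric values are generally larger in absolute size
round(phi.r - tet.r, 2)
```
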
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
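A sketch of the smoothing step (assuming psych is loaded; burt is the correlation matrix distributed with the package):

```r
library(psych)

# the smallest eigenvalue of the original burt matrix is (slightly) negative
min(eigen(burt)$values)

# cor.smooth adjusts the offending eigenvalues and rescales them,
# returning a positive (semi-)definite correlation matrix
burt.s <- cor.smooth(burt)
min(eigen(burt.s)$values)   # no longer negative
```
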
> draw.tetra()
[Figure 14 shows a bivariate normal distribution with ρ = .5 cut into four quadrants (X < τ or X > τ crossed with Y < Τ or Y > Τ), yielding φ = .28.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function that gives some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)
where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0 and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
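A minimal sketch of such a call follows. The element names (rwg, rbg) are taken from the statsBy help page and should be treated as assumptions here:

```r
library(psych)
# group the ability/SAT data by level of education, asking for correlations
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (correlations of the group means)
```

The same two matrices are what equation (1) combines into the observed individual-level correlation.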
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                 slope  VIF
Sentences         0.09 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.02 3.00
First.Letters     0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                 slope  VIF
Sentences         0.07 3.69
Vocabulary        0.17 3.88
Sent.Completion   0.05 3.00
First.Letters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                 slope  VIF
Sentences         0.25 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.04 3.00
First.Letters     0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series  0.5 0.25 0.49 0.24

DV = Pedigrees
                 slope  VIF
Sentences         0.21 3.69
Vocabulary        0.16 3.88
Sent.Completion   0.21 3.00
First.Letters     0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                 slope  VIF
Sentences         0.20 3.69
Vocabulary       -0.02 3.88
Sent.Completion   0.08 3.00
First.Letters     0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                 slope  VIF
Sent.Completion   0.02 1.02
First.Letters     0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                 slope  VIF
Sent.Completion   0.05 1.02
First.Letters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                 slope  VIF
Sent.Completion   0.04 1.02
First.Letters     0.21 1.02

Multiple Regression
                  R   R2  Ruw R2uw
Letter.Series  0.21 0.04 0.17 0.03

DV = Pedigrees
                 slope  VIF
Sent.Completion   0.21 1.02
First.Letters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                 slope  VIF
Sent.Completion   0.08 1.02
First.Letters     0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004), as analyzed using the mediate function, and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
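The call that produces the output below can be sketched as follows (the formula and data set follow the mediate help page; treat the exact argument layout as an assumption):

```r
library(psych)
# x = THERAPY, mediator in parentheses = ATTRIB, criterion = SATIS
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
mediate.diagram(preacher)   # draw the mediation path diagram
```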
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation path diagram, "Mediation model": THERAPY -> ATTRIB (0.82), ATTRIB -> SATIS (0.4), c = 0.76, c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression path diagram, "Regression Models": THERAPY -> SATIS (0.43), ATTRIB -> SATIS (0.4), THERAPY-ATTRIB correlation (0.21)]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5,000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In such a case, an alternative statistic based upon the average canonical correlation might be more appropriate.
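The set correlation formula can be verified directly from the blocks of the full correlation matrix. The sketch below illustrates the matrix algebra (not the setCor internals) for the Thurstone example, where x = 1:4 and y = 5:9:

```r
library(psych)
R <- Thurstone                 # 9 x 9 correlation matrix supplied with psych
x <- 1:4; y <- 5:9
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- eigen(M)$values      # the squared canonical correlations
1 - prod(1 - lambda)           # Cohen's set correlation R2
```

The eigenvalues of M are the squared canonical correlations, so the product term shrinks toward 0 (and R2 toward 1) as any single canonical correlation approaches 1, which is exactly the over-statement problem described above.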
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" path diagram relating prot2, prot2*sexism, sexism, and respappr, with c = 1.46, c' = 1.46 and c = 0.81, c' = 0.81]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
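A sketch of the kind of call that produces such a table (the heading argument is an assumption based on the fa2latex help page):

```r
library(psych)
f3 <- fa(Thurstone, 3)   # three factor solution of the Thurstone correlations
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```

The LaTeX source is written to the console (or a file) and can be pasted into a manuscript.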
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

MR1               1.00  0.59  0.54
MR2               0.59  1.00  0.52
MR3               0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean (also harmonic.mean) find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
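A few of these helpers in action, as a minimal sketch (the results are easy to check by hand):

```r
library(psych)
fisherz(0.5)                   # Fisher r to z transform: atanh(0.5), about 0.55
geometric.mean(c(1, 4, 16))    # (1 * 4 * 16)^(1/3) = 4
superMatrix(diag(2), diag(3))  # a 5 x 5 block diagonal "super matrix"
```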
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0  foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0  lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: Explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: The principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
3.4.3 Means and error bars
Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.
error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These error bars are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or one standard error. If the data are repeated measures, the error bars will reflect the between-variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.
error.bars.by does the same, but grouping the data by some condition.
error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).
error.crosses draws the confidence intervals for an x set and a y set of the same size.
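The standard error of a proportion used by error.bars.tab is simple to compute directly. A minimal base R sketch (se.prop is a hypothetical helper for illustration, not a psych function):

```r
# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p.
se.prop <- function(p, N) sqrt(p * (1 - p) / N)

se.prop(0.3, 100)   # about 0.046 for p = .3 with N = 100
```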
The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, more conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.
Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
3.4.4 Error bars for tabular data
However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame, based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy, shown in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits, and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one set (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
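The Holm correction itself is available in base R via p.adjust; a quick sketch of what it does to a small set of p values (sorted p values are multiplied by m, m-1, ..., 1 and then forced to be monotone non-decreasing):

```r
# Holm's step-down adjustment of three raw p values.
p <- c(0.01, 0.02, 0.03)
p.adjust(p, method = "holm")   # 0.03 0.04 0.04
```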
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation, as well as the confidence interval.
> r.test(50, .3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18 with probability < 0.034 
 and confidence interval 0.02 0.53
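These values follow from the standard formulas t = r√(n-2)/√(1-r²), with the confidence interval computed on the Fisher z scale and back-transformed. A base R sketch that reproduces them (an illustration, not the psych source):

```r
# Significance test and Fisher-z confidence interval for a single r.
n <- 50; r <- 0.3
tval <- r * sqrt(n - 2) / sqrt(1 - r^2)          # t = 2.18 on n - 2 df
z    <- atanh(r)                                 # Fisher r-to-z transform
se   <- 1 / sqrt(n - 3)                          # standard error of z
ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform: 0.02 to 0.53
```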
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:
> r.test(30, .4, .6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99 with probability 0.32
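For the independent correlations case, this z is just the difference of the Fisher transformed correlations over its standard error; a base R sketch reproducing the value above (illustrative, not the psych source):

```r
# z test for the difference of two independent correlations.
n1 <- 30; n2 <- 30                   # n2 defaults to n when not specified
z  <- (atanh(0.4) - atanh(0.6)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
p  <- 2 * pnorm(-abs(z))             # z is about -0.99, p about 0.32
```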
3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A):
> r.test(103, .4, .5, .1)
Correlation tests 
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations 
 t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by:
draw.cor(expand=20, cuts=c(0,0))
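The claim that φ is just a Pearson r on the dichotomous values is easy to check with the classic 2 x 2 formula φ = (ad - bc)/√((a+b)(c+d)(a+c)(b+d)). A base R sketch using made-up cell counts:

```r
# Hypothetical 2 x 2 table of counts:
#        y=1  y=0
# x=1     40   10
# x=0     10   40
n11 <- 40; n10 <- 10; n01 <- 10; n00 <- 40
phi <- (n11*n00 - n10*n01) /
       sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# The same value as a Pearson r on the raw 0/1 vectors:
x <- rep(c(1, 1, 0, 0), c(n11, n10, n01, n00))
y <- rep(c(1, 0, 1, 0), c(n11, n10, n01, n00))
phi          # 0.6
cor(x, y)    # identical
```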
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be, given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg     (1)

where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values, or with the group means.
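Equation 1 can be verified numerically in base R (simulated data; this is a sketch, not the statsBy implementation): splitting x and y into group means and within-group deviations reproduces the overall correlation exactly.

```r
set.seed(17)
g <- rep(1:6, each = 25)             # grouping variable
x <- rnorm(150) + g / 2              # simulated data with group effects
y <- rnorm(150) + g / 2

xb <- ave(x, g); yb <- ave(y, g)     # between part: group means, expanded
xw <- x - xb;    yw <- y - yb        # within part: deviations from group means

eta <- function(part, whole) sd(part) / sd(whole)
rwg <- cor(xw, yw)                   # pooled within group correlation
rbg <- cor(xb, yb)                   # between group correlation

rxy <- eta(xw, x) * eta(yw, y) * rwg + eta(xb, x) * eta(yb, y) * rbg
stopifnot(abs(rxy - cor(x, y)) < 1e-10)   # matches the ordinary correlation
```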
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between
group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second, and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times, and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = FourLetterWords 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                    R   R2  Ruw R2uw
FourLetterWords  0.69 0.48 0.59 0.34

DV = Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
             R  R2  Ruw R2uw
Suffixes  0.63 0.4 0.58 0.34

DV = LetterSeries 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries  0.5 0.25 0.49 0.24

DV = Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
               R   R2  Ruw R2uw
Pedigrees   0.58 0.34 0.58 0.33

DV = LetterGroup 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2 
Cohen's Set Correlation R2 = 0.69 
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = FourLetterWords slope VIF
SentCompletion 002 102
FirstLetters 058 102
Multiple Regression
R R2 Ruw R2uw
FourLetterWords 058 033 044 019
DV = Suffixes slope VIF
SentCompletion 005 102
FirstLetters 045 102
Multiple Regression
R R2 Ruw R2uw
Suffixes 046 021 035 012
DV = LetterSeries slope VIF
SentCompletion 004 102
FirstLetters 021 102
Multiple Regression
R R2 Ruw R2uw
LetterSeries 021 004 017 003
DV = Pedigrees slope VIF
SentCompletion 021 102
FirstLetters 008 102
Multiple Regression
R R2 Ruw R2uw
Pedigrees 018 003 014 002
DV = LetterGroup slope VIF
SentCompletion 008 102
FirstLetters 031 102
40
Multiple Regression
R R2 Ruw R2uw
LetterGroup 03 009 026 007
Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

Average squared canonical correlation = 0.21 
Cohen's Set Correlation R2 = 0.42 
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
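The path algebra can be checked with two lm fits in base R: for OLS on the same sample, the total effect c equals the direct effect c' plus the indirect effect ab exactly. A sketch with simulated data (not the Preacher and Hayes example):

```r
set.seed(42)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)             # a path: x -> m
y <- 0.4 * m + 0.3 * x + rnorm(100)   # b and c' paths

a      <- coef(lm(m ~ x))["x"]        # a: effect of x on the mediator
fit    <- lm(y ~ x + m)
cprime <- coef(fit)["x"]              # c': direct effect of x on y
b      <- coef(fit)["m"]              # b: effect of m on y, controlling x
ctotal <- coef(lm(y ~ x))["x"]        # c: total effect of x on y

stopifnot(abs(ctotal - (cprime + a * b)) < 1e-12)   # c = c' + ab
```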
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
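Output like the above comes from a call of roughly this form (a sketch; the sobel data set ships with the psych examples, and n.iter, which sets the number of bootstrap resamples, is shown here at its default of 5000):

```r
library(psych)
# Mediation of THERAPY's effect on SATIS through ATTRIB; the
# parenthesized term in the formula marks the mediator.
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel, n.iter = 5000)
print(preacher)
```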
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Mediation model diagram: THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, c = 0.76, c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Regression model diagram: THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, THERAPY <-> ATTRIB = 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
\[ R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i) \]

where \(\lambda_i\) is the ith eigenvalue of the eigenvalue decomposition of the matrix

\[ R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}. \]
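This formula can be verified directly in base R (a sketch; the correlation matrix below is computed from random data purely for illustration, and the function name is made up):

```r
# Cohen's set correlation from a correlation matrix R, for variable
# sets indexed by x and y, following the product formula above.
set.cor.R2 <- function(R, x, y) {
  M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
  lambda <- Re(eigen(M)$values)   # the squared canonical correlations
  1 - prod(1 - lambda)
}

set.seed(17)
R6 <- cor(matrix(rnorm(600), ncol = 6))  # a toy 6 x 6 correlation matrix
set.cor.R2(R6, x = 1:3, y = 4:6)
```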
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Moderated mediation (mean centered) diagram: prot2 -> respappr c = 1.46, c' = 1.46; prot2*sexism -> respappr c = 0.81, c' = 0.81; paths through sexism = 0.07 and 0.09; ab paths = 0.02 and -0.01]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex — A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations:
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
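A table like Table 2 can be produced in one line; the number of factors below matches the three oblique factors shown, but the exact options used for the published table are an assumption:

```r
library(psych)
# Fit a 3 factor (oblimin rotated, by default) solution to the
# Thurstone correlation matrix and emit a LaTeX table.
f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```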
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
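Several of these helpers are one-liners in base R; the following are illustrative re-implementations sketched for clarity, not the psych source code:

```r
# Fisher r-to-z transform: atanh(r) is 0.5 * log((1 + r) / (1 - r))
fisherz <- function(r) atanh(r)

# Geometric mean via logs (assumes positive data)
geometric.mean <- function(x) exp(mean(log(x)))

# Two-matrix super matrix: A top left, B lower right, 0s elsewhere
superMatrix <- function(A, B) {
  rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
        cbind(matrix(0, nrow(B), ncol(A)), B))
}

fisherz(0.5)                  # 0.549
geometric.mean(c(1, 4, 16))   # 4
superMatrix(diag(2), diag(3)) # a 5 x 5 block diagonal matrix
```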
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples: Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie < 4)
Figure 5: Using the error.bars.by function shows that self-reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.
> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level",
+    xlab="Level of Education", main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film",
+    labels=films, xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22),
+    xlim=c(8,20), pch=16, cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy, shown in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly overrepresented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+    main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+    main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input
1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
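To make the tetrachoric/φ distinction concrete, here is a minimal sketch (the counts are hypothetical, not from any data set in this document): tetrachoric accepts a two by two table of frequencies, and psych's phi function gives the φ coefficient on the same counts for comparison.

```r
library(psych)
# a hypothetical 2 x 2 table of frequencies for two dichotomized variables
tt <- matrix(c(40, 10, 10, 40), 2, 2)
tetrachoric(tt)   # rho estimate plus the estimated thresholds (tau)
phi(tt)           # the phi coefficient on the observed counts is smaller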
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
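A minimal sketch of the smoothing step using the burt data set mentioned above (the exact eigenvalues printed will depend on the psych version):

```r
library(psych)
data(burt)
min(eigen(burt)$values)        # the smallest eigenvalue is negative: not positive definite
burt.s <- cor.smooth(burt)     # adjust eigenvalues and rescale to a proper correlation matrix
min(eigen(burt.s)$values)      # now non-negative
```

cor.smooth warns that the matrix was smoothed, which is a useful signal when tetrachoric or polychoric matrices are passed on to factor analysis.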
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler functionto give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)
This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package
$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
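This decomposition can be illustrated directly with statsBy, which returns the pooled within-group and between-group correlation matrices when cors=TRUE. A minimal sketch using the sat.act data with education as an illustrative grouping variable (the component names rwg and rbg follow the statsBy output):

```r
library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # correlations of the group means (between groups)
```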
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a between
group correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
51 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a functionof the first 4
> setCor(y = 5:9, x = 1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02
Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_1, x_2, \dots, x_i$) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function, and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, niter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
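The eigenvalue formula can be checked by hand against the Thurstone example from section 5.1. A minimal sketch (using base R matrix algebra, not a psych function, and taking x = variables 1:4, y = variables 5:9 as in that example):

```r
library(psych)
Rxx <- Thurstone[1:4, 1:4]
Ryy <- Thurstone[5:9, 5:9]
Rxy <- Thurstone[1:4, 5:9]
# eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values
round(1 - prod(1 - lambda), 2)   # Cohen's set correlation R2
```

The result should agree with the Cohen's Set Correlation R2 that setCor reports for the same variable sets.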
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2, prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV  0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
47
Table 2: fa2latex: A factor analysis table from the psych package in R

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

MR1  1.00
MR2  0.59  1.00
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
48
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
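The formulas behind fisherz and these two means are one-liners. The sketch below uses the psych function names but the bodies are base-R restatements of the standard formulas, not psych's actual implementations:

```r
# One-line sketches of three helpers (base R only; not psych's source).
fisherz <- function(rho) 0.5 * log((1 + rho) / (1 - rho))  # same as atanh(rho)
geometric.mean <- function(x) exp(mean(log(x)))            # for ratio-scale data
harmonic.mean <- function(x) length(x) / sum(1 / x)        # for rates

fisherz(0.5)             # about 0.549
geometric.mean(c(1, 4))  # 2
harmonic.mean(c(1, 3))   # 1.5
```

The Fisher z transform makes the sampling distribution of a correlation approximately normal, which is why it underlies the confidence intervals reported by several psych functions.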
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
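In the univariate case these are simple moment ratios. A sketch using one common definition (the multivariate Mardia test is omitted; kurtosis here is excess kurtosis, i.e. 3 is subtracted):

```r
# Univariate skew and excess kurtosis as moment ratios (a sketch of the
# univariate quantities; not psych's exact implementation).
skew <- function(x) {
  m <- mean(x); n <- length(x)
  (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^1.5
}
kurtosi <- function(x) {
  m <- mean(x); n <- length(x)
  (sum((x - m)^4) / n) / (sum((x - m)^2) / n)^2 - 3
}

skew(c(1, 2, 3, 4, 5))     # 0 for a symmetric sample
kurtosi(c(1, 2, 3, 4, 5))  # -1.3 (flatter than normal)
```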
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)
rangeCorrection will correct correlations for restriction of range.
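One standard formula behind such corrections is Thorndike's Case 2, sketched below (the helper name is hypothetical; psych's rangeCorrection implements the published corrections). Here k is the ratio of the unrestricted to the restricted standard deviation:

```r
# Thorndike's Case 2 correction for restriction of range (a sketch; the
# helper name range.correct is illustrative, not a psych function).
range.correct <- function(r, sd.unrestricted, sd.restricted) {
  k <- sd.unrestricted / sd.restricted
  (r * k) / sqrt(1 + r^2 * (k^2 - 1))
}

range.correct(0.5, 1, 1)  # no restriction: correlation unchanged (0.5)
range.correct(0.5, 2, 1)  # restriction halved the SD: corrected upward
```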
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
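The block layout can be sketched in a few lines of base R (psych's superMatrix also handles dimnames and keys lists; this minimal helper does not):

```r
# A sketch of the super matrix idea: A on the top left, B on the lower
# right, zeros in the off-diagonal quadrants (base R; illustrative only).
super.matrix <- function(A, B) {
  top    <- cbind(A, matrix(0, nrow(A), ncol(B)))
  bottom <- cbind(matrix(0, nrow(B), ncol(A)), B)
  rbind(top, bottom)
}

A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
super.matrix(A, B)  # a 5 x 5 block diagonal matrix
```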
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
49
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
50
This data set includes his example
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
51
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
52
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance
53
of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
54
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, and G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of
55
Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
56
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
57
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bibars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9
draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49
58
het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8
59
corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8
Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; psych package:
60
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8
fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14
61
p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49
tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20
62
tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7
schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
63
> error.bars.by(sat.act[5:6], sat.act$gender, bars = TRUE,
+    labels = c("Male", "Female"), ylab = "SAT score", xlab = "")
Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.
22
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way = "both", ylab = "Proportion of Education Level", xlab = "Level of Education",
+    main = "Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a dataframe.
23
3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
24
> op <- par(mfrow = c(1, 2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data = affect[-c(1, 20)], group = "Film", labels = films,
+    xlab = "Energetic Arousal", ylab = "Tense Arousal", ylim = c(10, 22), xlim = c(8, 20), pch = 16,
+    cex = 2, colors = colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data = affect.stats, labels = films, xlab = "Positive Affect",
+    ylab = "Negative Affect", pch = 16, cex = 2, colors = colors, main = "Movies effect on affect")
> op <- par(mfrow = c(1, 1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The sizes of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
25
3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, "age", "gender", ylab = "Age", main = "Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
26
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp = "gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
27
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender == 2)
> male <- subset(sat.act, sat.act$gender == 1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:
28
> diffs <- lowerUpper(lower, upper, diff = TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender     1.00  0.09 -0.02 -0.04 -0.02 -0.17
education  0.09  1.00  0.55  0.15  0.05  0.03
age       -0.02  0.55  1.00  0.11 -0.04 -0.03
ACT       -0.04  0.15  0.11  1.00  0.56  0.59
SATV      -0.02  0.05 -0.04  0.56  1.00  0.64
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender    0.00 0.17 1.00 1.00 1 0
education 0.02 0.00 0.00 0.00 1 1
age       0.58 0.00 0.00 0.03 1 1
ACT       0.33 0.00 0.00 0.00 0 0
SATV      0.62 0.22 0.26 0.00 0 0
SATQ      0.00 0.36 0.37 0.00 0 0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
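The independent-groups test can be reproduced by hand in base R: transform each r with Fisher's z (atanh) and divide the difference by its standard error (a sketch of the formula, not the psych implementation):

```r
# z test for the difference of two independent correlations,
# matching r.test(30,.4,.6) above (n1 = n2 = 30)
r1 <- 0.4; r2 <- 0.6; n1 <- 30; n2 <- 30
z1 <- atanh(r1)                       # Fisher z transform of each r
z2 <- atanh(r2)
se <- sqrt(1/(n1 - 3) + 1/(n2 - 3))   # standard error of the difference
z  <- (z2 - z1)/se                    # 0.99, as in the output above
p  <- 2 * pnorm(-abs(z))              # two-tailed probability, about 0.32
```
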
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
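Steiger's test is simple enough to sketch by hand in base R: sum the squared Fisher z equivalents of the unique correlations, scale by (n - 3), and compare to a chi square with p(p-1)/2 degrees of freedom (a hand version of the idea, shown on simulated uncorrelated data rather than sat.act):

```r
# Chi square test that all correlations in a matrix are zero (Steiger, 1980)
set.seed(42)
x <- matrix(rnorm(200 * 4), ncol = 4)   # 200 cases, 4 uncorrelated variables
R <- cor(x)
z <- atanh(R[lower.tri(R)])             # Fisher z of the 6 unique correlations
chi2 <- (nrow(x) - 3) * sum(z^2)        # scaled sum of squared z's
dof  <- 4 * 3 / 2                       # p*(p-1)/2 = 6 degrees of freedom
p    <- pchisq(chi2, dof, lower.tail = FALSE)
```
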
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))
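The underestimation of the latent correlation by φ is easy to demonstrate by simulation in base R (a sketch; for cuts at the means, φ = (2/π) arcsin ρ, which is about 1/3 when ρ = 0.5):

```r
# Dichotomizing bivariate normal data shrinks the correlation:
# the phi coefficient on the 0/1 scores is well below the latent r
set.seed(17)
n <- 10000
x <- rnorm(n)
y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)   # latent correlation 0.5
X <- as.numeric(x > 0)                      # cut both variables at their means
Y <- as.numeric(y > 0)
cor(x, y)   # close to 0.50 on the latent variables
cor(X, Y)   # phi, close to (2/pi) * asin(0.5) = 1/3
```
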
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
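The idea behind such smoothing can be sketched in a few lines of base R: floor the offending eigenvalues, rescale them to sum to the number of variables, and rebuild the matrix (this is the general recipe, not the exact cor.smooth algorithm):

```r
# An improper "correlation" matrix with one negative eigenvalue
R <- matrix(c( 1.0,  0.9,  0.5,
               0.9,  1.0, -0.4,
               0.5, -0.4,  1.0), 3, 3)
e <- eigen(R)
min(e$values)                                    # negative: R is not psd
v <- pmax(e$values, 100 * .Machine$double.eps)   # floor the eigenvalues
v <- v * ncol(R) / sum(v)                        # rescale to sum to n variables
Rs <- e$vectors %*% diag(v) %*% t(e$vectors)     # rebuild the matrix
Rs <- cov2cor(Rs)                                # restore the unit diagonal
min(eigen(Rs)$values)                            # now non-negative
```
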
> draw.tetra()
[Figure: a bivariate normal density with ρ = 0.5, cut at τ on X and Τ on Y; the four quadrants (X < τ or X > τ, crossed with Y < Τ or Y > Τ) form the two by two table, and the resulting φ = 0.28. Marginal panels show the normal densities with the cut points τ and Τ.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg        (1)

where r_xy is the normal correlation, which may be decomposed into within group and between group correlations r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
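The two pieces of equation 1 can be computed by hand in base R, which may help make the decomposition concrete (a sketch with simulated data where the two levels deliberately disagree; statsBy does this, and more, for real data):

```r
# Within- and between-group correlations by hand
set.seed(1)
g  <- rep(1:20, each = 30)               # 20 groups of 30 observations
gm <- rnorm(20)[g]                       # a group-level trait
x  <- gm + rnorm(600)
y  <- -gm + 0.3 * (x - gm) + rnorm(600)  # negative between, positive within
# pooled within-group correlation: correlate the group-mean centered scores
rwg <- cor(x - ave(x, g), y - ave(y, g))
# between-group correlation: correlate the group means
rbg <- cor(tapply(x, g, mean), tapply(y, g, mean))
c(rwg = rwg, rbg = rbg)   # opposite signs: the raw r is a misleading blend
```
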
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
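The core computation is simple enough to sketch with base R linear algebra: the standardized weights are beta = Rxx^-1 rxy, and R2 follows from them (a toy 3-variable matrix here, not the Thurstone data):

```r
# Multiple regression from a correlation matrix alone
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.4,
              0.5, 0.4, 1.0), 3, 3,
            dimnames = rep(list(c("x1", "x2", "y")), 2))
beta <- solve(R[1:2, 1:2], R[1:2, "y"])  # standardized slopes: Rxx^-1 rxy
R2   <- sum(beta * R[1:2, "y"])          # squared multiple correlation
round(c(beta, R2 = R2), 3)               # beta = 0.418, 0.275; R2 = 0.319
```
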
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
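With raw data, the a, b, c and c' paths are just regression slopes, so the arithmetic can be sketched with lm (simulated data; mediate adds the bootstrap test of ab):

```r
# Mediation paths by hand: a, b, total c, direct c', and indirect ab
set.seed(3)
x <- rnorm(200)
m <- 0.6 * x + rnorm(200)            # a path built into the simulation
y <- 0.4 * m + 0.2 * x + rnorm(200)  # b and direct paths built in
a    <- coef(lm(m ~ x))["x"]         # x -> m
fit  <- lm(y ~ x + m)
b    <- coef(fit)["m"]               # m -> y, controlling for x
cdir <- coef(fit)["x"]               # direct effect c'
ctot <- coef(lm(y ~ x))["x"]         # total effect c
ab   <- a * b                        # indirect effect; note c = c' + ab
```
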
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Mediation model diagram: THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, total effect c = 0.76, direct effect c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)
[Regression Models diagram: THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, THERAPY-ATTRIB correlation = 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 − ∏ (1 − λi)  (product over i = 1, ..., n)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx^(-1) Rxy Ryy^(-1) Ryx.
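The eigenvalue form can be computed directly with base R on a toy correlation matrix (a sketch of the formula, not the setCor code; the eigenvalues are the squared canonical correlations):

```r
# Cohen's set correlation from a correlation matrix
R <- matrix(c(1.0, 0.4, 0.5, 0.3,
              0.4, 1.0, 0.3, 0.5,
              0.5, 0.3, 1.0, 0.4,
              0.3, 0.5, 0.4, 1.0), 4, 4)
x <- 1:2; y <- 3:4                        # two sets of two variables
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)             # squared canonical correlations
R2set  <- 1 - prod(1 - lambda)            # set correlation, about 0.40
```
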
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Moderated mediation (mean centered) diagram: prot2 and prot2*sexism predicting respappr through sexism; a paths = 0.07 and 0.09, b = 0.02, total effects c = 1.46 and 0.81, direct effects c' = 1.46 and 0.81, predictor covariance = -0.01]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
gt summary(model1)
Call
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals
Min 1Q Median 3Q Max
-25.2458  -3.2133   0.7769   3.5921   9.2630
Coefficients
Estimate Std Error t value Pr(gt|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44
Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808
DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LATEX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos="https://personality-project.org/r", type="source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news
gt news(Version gt= 184package=psych)
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 10.14
Matrix products default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136   mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")
Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be plotted as percentages (as shown) or as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
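The Holm adjustment itself is easy to verify by hand with the base R p.adjust function; the raw p values below are invented purely for illustration. Sorted p values are multiplied by (m - i + 1) for m tests and then forced to be monotonically non-decreasing:

```r
# Hypothetical raw p values (not from any data set in this vignette)
p.raw <- c(.01, .02, .03)
# Holm: 3*.01 = .03, 2*.02 = .04, 1*.03 = .03, then cumulative max -> .03 .04 .04
p.adjust(p.raw, method = "holm")   # 0.03 0.04 0.04
```

This is the same adjustment applied (by default) above the diagonal in corr.test output.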
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,
+     main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,
+     main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input.
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18  with probability < 0.034
 and confidence interval 0.02  0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99  with probability  0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89  with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
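As a minimal sketch of the contrast between the observed and latent estimates (the cell counts below are invented for illustration), tetrachoric will accept a two by two table of frequencies directly, and phi gives the Pearson φ on the same cells:

```r
library(psych)

# Hypothetical 2 x 2 table of counts (not from any data set in this vignette)
tab <- matrix(c(40, 10,
                15, 35), nrow = 2)

# Estimated latent bivariate-normal correlation (with the estimated cut points, tau)
tetrachoric(tab)

# The phi coefficient on the observed cells, which should be smaller in magnitude
phi(tab)
```

For this table the tetrachoric estimate exceeds φ, illustrating the attenuation described above.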
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
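A short sketch of that repair on the burt data set (the exact eigenvalues will depend on the psych version, so none are shown here):

```r
library(psych)
data(burt)

# The smallest eigenvalue is (slightly) negative: burt is not positive semi-definite
min(eigen(burt)$values)

# cor.smooth nudges the offending eigenvalues up and rescales,
# returning a proper (smoothed) correlation matrix
burt.smoothed <- cor.smooth(burt)
min(eigen(burt.smoothed)$values)   # now non-negative
```

The smoothed matrix can then be factored or otherwise analyzed without the numerical problems a non-positive-definite matrix would cause.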
> draw.tetra()
Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (r_wg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.
\( r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \)   (1)
where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.
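Equation 1 can be verified numerically in base R on simulated two-level data (variable names here are made up for illustration):

```r
# Numerical check of equation (1): an observed correlation is an exact
# mixture of the pooled within-group and the between-group correlation.
set.seed(42)
g <- rep(1:3, each = 20)              # three groups of 20
x <- rnorm(60) + g                    # both variables shift with the group
y <- rnorm(60) + g
xb <- ave(x, g); yb <- ave(y, g)      # between part: expanded group means
xw <- x - xb;   yw <- y - yb          # within part: deviations from the means
r.wg <- cor(xw, yw)                   # pooled within-group correlation
r.bg <- cor(xb, yb)                   # (weighted) between-group correlation
eta.xw <- cor(x, xw); eta.yw <- cor(y, yw)
eta.xb <- cor(x, xb); eta.yb <- cor(y, yb)
r.xy <- eta.xw * eta.yw * r.wg + eta.xb * eta.yb * r.bg
all.equal(r.xy, cor(x, y))            # TRUE: the decomposition is exact
```

The identity holds exactly (not just asymptotically), because the within-group deviations are orthogonal to the group means.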
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
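Why a correlation matrix suffices can be seen in a minimal base-R sketch (simulated data, hypothetical variable names): the standardized regression weights are just β = R_xx^{-1} r_xy, which matches lm run on standardized raw data.

```r
# Standardized regression weights straight from a correlation matrix --
# the computation that makes matrix input possible: beta = Rxx^{-1} r_xy.
set.seed(3)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- x1 + 0.5 * x2 + rnorm(n)
R  <- cor(cbind(x1 = x1, x2 = x2, y = y))
beta <- solve(R[1:2, 1:2], R[1:2, "y"])     # betas from the matrix alone
# identical to lm() on standardized (z-scored) raw data
fit <- coef(lm(scale(y) ~ scale(x1) + scale(x2)))[-1]
all.equal(unname(beta), unname(fit))        # TRUE
```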
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
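The a, b, c notation can be made concrete with two ordinary regressions in base R (simulated data, not the sobel example used below); the identity total = direct + indirect (c = c' + ab) holds exactly for OLS:

```r
# Mediation paths a, b, c, c' from plain lm() fits, on simulated data.
set.seed(1)
n <- 100
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)               # an a path built into the simulation
y <- 0.4 * m + 0.3 * x + rnorm(n)
a  <- coef(lm(m ~ x))["x"]            # x -> m
fit2 <- lm(y ~ x + m)
b  <- coef(fit2)["m"]                 # m -> y, controlling for x
cp <- coef(fit2)["x"]                 # direct effect c'
c0 <- coef(lm(y ~ x))["x"]            # total effect c
ab <- a * b                           # indirect effect
all.equal(unname(c0), unname(cp + ab))   # TRUE: total = direct + indirect
```

The bootstrap in mediate addresses the sampling distribution of ab, which this algebraic identity says nothing about.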
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model path diagram -- THERAPY to SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (paths 0.82 and 0.4)]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model diagram -- THERAPY and ATTRIB predicting SATIS, with coefficients 0.43, 0.4, and 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
\( R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i) \)

where \( \lambda_i \) is the ith eigenvalue of the eigenvalue decomposition of the matrix

\( R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}. \)
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
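Cohen's R² above can be computed directly from a partitioned correlation matrix; here is a minimal base-R sketch (the helper name set.cor.R2 is made up, not psych's implementation):

```r
# Cohen's set correlation R2 from a partitioned correlation matrix:
# R2 = 1 - prod(1 - lambda_i), with lambda_i the squared canonical
# correlations, i.e. the eigenvalues of Rxx^{-1} Rxy Ryy^{-1} Ryx.
set.cor.R2 <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]; Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  lambda <- Re(eigen(M)$values)       # squared canonical correlations
  1 - prod(1 - lambda)
}

# Four uncorrelated variables except for one cross-set r of 0.6
R <- diag(4); R[1, 3] <- R[3, 1] <- 0.6
set.cor.R2(R, 1:2, 3:4)               # 0.36: one canonical r of 0.6
set.cor.R2(R, 3:4, 1:2)               # same value: the measure is symmetric
```

The second call illustrates the symmetry of set correlation that is noted at the end of this section.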
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" path diagram -- prot2 and prot2*sexism predicting respappr directly (c = 1.46, c' = 1.46; c = 0.81, c' = 0.81) and through sexism]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex: A factor analysis table from the psych package in R

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

MR1               1.00  0.59  0.54
MR2               0.59  1.00  0.52
MR3               0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
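Several of these helpers have close base-R analogues; for example, the Fisher z transform computed by fisherz is mathematically the inverse hyperbolic tangent:

```r
# fisherz is mathematically base R's atanh (and the inverse transform is
# tanh): z = 0.5 * log((1 + r) / (1 - r)).
r <- 0.5
z <- atanh(r)                # Fisher z of r = 0.5
tanh(z)                      # back-transforms to 0.5
```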
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136   mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components -- an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
3.4.5 Two dimensional displays of means and errors
Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The sizes of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bibars.png')
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device 
          1 
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with the densityBy function in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device 
          1 
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly overrepresented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix; lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
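The Holm adjustment that corr.test applies by default is the base R p.adjust procedure; a minimal illustration with three made-up p values:

```r
# Holm's step-down correction: sort the p values, multiply by (n - rank + 1),
# then enforce monotonicity with a cumulative maximum
p <- c(0.01, 0.02, 0.03)
p.adjust(p, method = "holm")   # 0.03 0.04 0.04
```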
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device 
          1 
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input):
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18  with probability < 0.034
 and confidence interval 0.02   0.53 
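The t, p, and confidence interval above follow from textbook formulas, so case 1 can be verified in base R (a sketch, not the r.test implementation):

```r
# t test of a single r, plus a Fisher-z confidence interval
n <- 50; r <- 0.3
t  <- r * sqrt(n - 2) / sqrt(1 - r^2)                        # about 2.18
p  <- 2 * pt(-abs(t), df = n - 2)                            # about 0.034
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(.975) / sqrt(n - 3))  # about (0.02, 0.53)
round(c(t = t, p = p), 3)
round(ci, 2)
```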
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99  with probability  0.32
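Case 2 is the familiar Fisher r-to-z test, so the value above can be reproduced directly (a base R sketch):

```r
# Difference of two independent correlations via Fisher's z transformation
n1 <- 30; n2 <- 30
z <- (atanh(0.4) - atanh(0.6)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
round(abs(z), 2)               # 0.99, as reported by r.test
round(2 * pnorm(-abs(z)), 2)   # 0.32
```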
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests 
Call:[1] "r.test(n = 103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated correlations 
 t value -0.89  with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)

Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273 
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
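The attenuation that motivates the tetrachoric correlation is easy to see by simulation (a base R sketch with made-up data):

```r
# Dichotomize a bivariate normal with latent rho = 0.5 and correlate the 0/1 scores
set.seed(1)
n   <- 1e5
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))
phi   # close to (2/pi) * asin(rho) = 1/3, well below the latent 0.5
```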
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
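The smoothing idea can be sketched in a few lines of base R (an illustration of the principle, not psych's exact algorithm):

```r
# Clip negative eigenvalues, rescale them to sum to the number of variables,
# rebuild the matrix, and restore the unit diagonal
smooth.sketch <- function(R, eps = 1e-12) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  vals <- vals * ncol(R) / sum(vals)
  cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
}
R <- matrix(c(1, .9, .2,
              .9, 1, .9,
              .2, .9, 1), 3, 3)     # an improper "correlation" matrix
min(eigen(R)$values)                # negative: not positive semi-definite
min(eigen(smooth.sketch(R))$values) # positive after smoothing
```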
> draw.tetra()
Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x.wg * η_y.wg * r_xy.wg + η_x.bg * η_y.bg * r_xy.bg        (1)

where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xy.wg and r_xy.bg, and η (eta) is the correlation of the data with the within group values or the group means.
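Equation 1 can be checked numerically: split simulated scores into expanded group means and within-group deviations, and the decomposition holds exactly (a base R sketch with hypothetical data):

```r
set.seed(42)
ng <- 10; npg <- 50                      # 10 groups of 50 hypothetical cases
g  <- rep(1:ng, each = npg)
x  <- rnorm(ng * npg) + rep(rnorm(ng), each = npg)
y  <- 0.5 * x + rnorm(ng * npg)
x.bg <- ave(x, g); y.bg <- ave(y, g)     # group means, one per case
x.wg <- x - x.bg;  y.wg <- y - y.bg      # within-group deviations
eta <- function(part, whole) sd(part) / sd(whole)
rxy <- eta(x.wg, x) * eta(y.wg, y) * cor(x.wg, y.wg) +
       eta(x.bg, x) * eta(y.bg, y) * cor(x.bg, y.bg)
all.equal(rxy, cor(x, y))                # TRUE: the decomposition is exact
```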
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
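Why a correlation matrix suffices: the standardized regression weights are just β = R_xx^{-1} r_xy, which can be checked against lm on standardized raw data (a base R sketch with hypothetical variables):

```r
set.seed(7)
X <- matrix(rnorm(300), 100, 3)
colnames(X) <- c("x1", "x2", "y")
R <- cor(X)
beta <- solve(R[1:2, 1:2], R[1:2, 3])            # weights from the correlations only
fit  <- lm(scale(X[, 3]) ~ scale(X[, 1:2]) - 1)  # same weights from the raw data
all.equal(unname(beta), unname(coef(fit)))       # TRUE
```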
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input 

 DV =  Four.Letter.Words 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2 
Cohens Set Correlation R2  =  0.69 
Unweighted correlation between the two sets =  0.73 
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input 

 DV =  Four.Letter.Words 
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes 
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series 
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees 
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

 DV =  Letter.Group 
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023
Average squared canonical correlation =  0.21 
Cohens Set Correlation R2  =  0.42 
Unweighted correlation between the two sets =  0.48 
> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
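For a single mediator fit by ordinary least squares, the total effect decomposes exactly into the direct effect plus ab, which a base R sketch with simulated (hypothetical) data makes concrete:

```r
set.seed(17)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)              # a path: x -> m
y <- 0.4 * m + 0.3 * x + rnorm(100)    # b path and direct path
a  <- coef(lm(m ~ x))["x"]
b  <- coef(lm(y ~ x + m))["m"]
cp <- coef(lm(y ~ x + m))["x"]         # direct effect (c')
c0 <- coef(lm(y ~ x))["x"]             # total effect (c)
all.equal(unname(c0), unname(cp + a * b))   # TRUE: total = direct + ab
```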
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)
The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .
Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33 
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03    Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067 
To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
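This formula is easy to verify by hand. The following sketch computes it directly from the sat.act correlations, treating the first three variables as the x set and the last three as the y set (an illustrative split, not one mandated by the package); setCor does essentially this internally.

```r
# Hand-compute Cohen's set correlation R2 from eigenvalues (a sketch)
library(psych)
R <- cor(sat.act, use = "pairwise")
x <- 1:3                        # gender, education, age
y <- 4:6                        # ACT, SATV, SATQ
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- eigen(M)$values       # the lambda_i of the formula above
R2.set <- 1 - prod(1 - lambda)  # Cohen's set correlation R2
```

Comparing R2.set to the value reported by setCor for the same two sets is a useful check on the formula.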
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.
Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance matrix or the raw data, and thus will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education   0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age         0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education   0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age        -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education   0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age        -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.
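The symmetry claim is easy to demonstrate by exchanging the two sets. A minimal sketch (the set-correlation field of the returned object is assumed to be $Rset; check str() of the result if the name differs in your psych version):

```r
# Symmetry of the set correlation: exchanging predictor and criterion
# sets leaves Cohen's R2 unchanged (a sketch using the sat.act data)
library(psych)
C  <- cor(sat.act, use = "pairwise")
m1 <- setCor(y = 4:6, x = 1:3, data = C, n.obs = 700, plot = FALSE)
m2 <- setCor(y = 1:3, x = 4:6, data = C, n.obs = 700, plot = FALSE)
m1$Rset  # Cohen's set correlation R2
m2$Rset  # the same value, with x and y exchanged
```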
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
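A short sketch of three of these converters on small examples (the heading text passed to fa2latex is illustrative):

```r
# Sketch: LaTeX conversion helpers from psych
library(psych)

f3 <- fa(Thurstone, 3)                      # factor the Thurstone correlations
fa2latex(f3, heading = "A factor analysis table")  # loadings as a LaTeX table

cor2latex(sat.act)                          # lower-diagonal correlation table
df2latex(describe(sat.act))                 # any data frame, e.g. describe() output
```

Each function prints LaTeX source that can be pasted into a document, or written to a file with the file argument.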
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
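A quick tour of a few of the helpers just described (a sketch; the toy matrices are illustrative):

```r
# Sampler of psych helper functions
library(psych)

fisherz(0.5)                   # Fisher r-to-z transformation
geometric.mean(c(1, 10, 100))  # cube root of the product
headTail(sat.act)              # first and last lines of the data frame

# superMatrix: combine two matrices into a block "super matrix"
A <- diag(2)
B <- matrix(1, 2, 2)
superMatrix(A, B)              # A top left, B lower right, 0s elsewhere
```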
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> op <- par(mfrow = c(1, 2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data = affect[-c(1, 20)], group = "Film", labels = films,
+    xlab = "Energetic Arousal", ylab = "Tense Arousal", ylim = c(10, 22), xlim = c(8, 20), pch = 16,
+    cex = 2, colors = colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data = affect.stats, labels = films, xlab = "Positive Affect",
+    ylab = "Negative Affect", pch = 16, cex = 2, colors = colors, main = "Movies effect on affect")
> op <- par(mfrow = c(1, 1))
[Figure: two panels, "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with circles for the Sad, Horror, Neutral, and Happy film groups]
Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
3.4.6 Back to back histograms
The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png("bibars.png")
> bibars(bfi, age, gender, ylab = "Age", main = "Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.
> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp = "gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix; lowerMat will round this to 2 digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input.
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
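The t and p values reported here follow the standard formula for testing a single correlation, which can be checked in base R (a sketch of the computation, not of how r.test is implemented internally):

```r
n <- 50
r <- 0.3
t.val <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic with n - 2 df
p.val <- 2 * pt(-abs(t.val), df = n - 2)   # two tailed probability
round(t.val, 2)   # 2.18
round(p.val, 3)   # 0.034
```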
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
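This z statistic can be reproduced directly with Fisher's r to z transformation (atanh) in base R:

```r
n1 <- n2 <- 30
r1 <- 0.4
r2 <- 0.6
# difference of the Fisher z values over the standard error of the difference
z <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
round(abs(z), 2)   # 0.99, as reported by r.test
```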
3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
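The idea behind this test can be sketched in a few lines of base R. This is a simplification (it assumes complete data and uses the Fisher z version with an n - 3 multiplier); psych's cortest handles the details:

```r
# chi square test that all off diagonal correlations are zero (Steiger, 1980)
cortest.sketch <- function(R, n) {
  z <- atanh(R[lower.tri(R)])     # Fisher z of each off diagonal correlation
  chisq <- (n - 3) * sum(z^2)     # chi square under the null hypothesis
  df <- length(z)                 # one df per correlation tested
  c(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}
R <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # a hypothetical 2 variable example
round(cortest.sketch(R, n = 100), 2)
```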
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
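The φ coefficient itself needs no such optimization: from a two by two table of cell frequencies it has a closed form, sketched here in base R (the table values are hypothetical):

```r
# phi = (n11*n22 - n12*n21) / sqrt((n11+n12)(n21+n22)(n11+n21)(n12+n22))
phi.sketch <- function(tab) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]; n21 <- tab[2, 1]; n22 <- tab[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}
tab <- matrix(c(40, 10, 10, 40), 2, 2)   # hypothetical 2 x 2 frequency table
phi.sketch(tab)                          # 0.6
```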
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
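The smoothing idea can be sketched in a few lines of base R (cor.smooth itself differs in its details; the example matrix is invented to be non positive definite):

```r
smooth.sketch <- function(R, eps = 1e-7) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)                     # raise negative eigen values
  vals <- vals * nrow(R) / sum(vals)              # rescale to sum to n variables
  Rs <- e$vectors %*% diag(vals) %*% t(e$vectors) # reconstruct the matrix
  cov2cor(Rs)                                     # restore a unit diagonal
}
# a "correlation" matrix that is not positive semi-definite
bad <- matrix(c(1, 0.9, 0.9,  0.9, 1, -0.9,  0.9, -0.9, 1), 3, 3)
min(eigen(bad)$values)                  # negative: needs smoothing
min(eigen(smooth.sketch(bad))$values)   # positive after smoothing
```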
> draw.tetra()
[Figure: a bivariate normal distribution with rho = 0.5, cut at tau on x and Tau on y (yielding phi = 0.28), with the four quadrants (X > tau/Y > Tau, etc.) labeled and the marginal normal densities and cut points shown below.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg        (1)
where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
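Equation (1) can be verified numerically in base R with simulated grouped data (the data and variable names here are arbitrary):

```r
set.seed(17)
g <- rep(1:10, each = 20)            # 10 groups of 20 observations
x <- rnorm(200) + g/5
y <- rnorm(200) + g/5
xm <- ave(x, g); ym <- ave(y, g)     # between group parts (group means)
xw <- x - xm;   yw <- y - ym         # within group deviations
# the etas are the correlations of the raw data with each component
lhs <- cor(xw, x) * cor(yw, y) * cor(xw, yw) +
       cor(xm, x) * cor(ym, y) * cor(xm, ym)
all.equal(lhs, cor(x, y))            # the two sides of equation (1) agree
```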
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
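The core computation is ordinary: standardized regression weights follow directly from a correlation matrix, as this base R sketch shows (setCor adds standard errors, diagrams, and the between set statistics on top of this; the example matrix is invented):

```r
# beta = Rxx^{-1} r_xy ; squared multiple correlation R^2 = r_xy' beta
beta.sketch <- function(R, x, y) {
  beta <- solve(R[x, x, drop = FALSE], R[x, y])
  list(beta = beta, R2 = sum(R[x, y] * beta))
}
R <- matrix(c(1,   0.3, 0.4,
              0.3, 1,   0.5,
              0.4, 0.5, 1), 3, 3)    # a small illustrative correlation matrix
beta.sketch(R, x = 1:2, y = 3)$R2    # squared multiple R of variable 3 on 1:2
```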
> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation diagram. THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, total effect c = 0.76, direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
[Figure: regression diagram. THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, with THERAPY and ATTRIB correlated 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏(i=1 to n) (1 − λi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx^-1 Rxy Ryy^-1 Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
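Given a full correlation matrix and index sets x and y, the set correlation formula can be sketched in base R (setCor handles the bookkeeping, significance tests, and the alternative statistics; the example matrix is invented):

```r
# Cohen's set correlation: R^2 = 1 - prod(1 - lambda_i), where lambda_i are
# the eigen values of Rxx^{-1} Rxy Ryy^{-1} Ryx
set.cor.sketch <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]; Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  lambda <- eigen(M, only.values = TRUE)$values
  1 - prod(1 - lambda)
}
R <- matrix(c(1,   0.3, 0.4,
              0.3, 1,   0.5,
              0.4, 0.5, 1), 3, 3)
# with a single y variable this reduces to the squared multiple correlation
set.cor.sketch(R, x = 1:2, y = 3)
```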
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" diagram. prot2 and prot2*sexism predict respappr directly (c = 1.46, c' = 1.46 and c = 0.81, c' = 0.81), with small paths through the mediator sexism.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

 Average squared canonical correlation = 0.03
 Cohens Set Correlation R2 = 0.09
 Shrunken Set Correlation R2 = 0.08
 F and df of Cohens Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
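As a quick check on the fisherz entry above: the transformation is just the inverse hyperbolic tangent, so it can be reproduced in base R:

```r
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))   # Fisher's r to z transformation
c(fisher = z, atanh = atanh(r))     # the two agree
```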
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples: Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology: Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
3.4.6 Back to back histograms
The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).
> data(bfi)
> png('bi.bars.png')
> bi.bars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()
null device
          1
Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device
          1
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103,.5,.6,.7,.5,.5,.8) # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
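A minimal sketch of these ideas, assuming the psych package is loaded; the 2 x 2 cell frequencies below are invented for illustration:

```r
library(psych)

# Invented 2 x 2 table of observed frequencies for two dichotomous items
tab <- matrix(c(40, 10, 20, 30), nrow = 2)

phi(tab)          # the Pearson (phi) correlation of the observed 0/1 values
tetrachoric(tab)  # the estimated latent correlation, assuming bivariate normality

# Smoothing a non-positive-definite matrix, e.g., the burt data set
data(burt)
smoothed <- cor.smooth(burt)
```

Because the tetrachoric estimate assumes bivariate normality with cuts, it will typically be larger in absolute value than the φ computed on the same table.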
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$  (1)
where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
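A minimal sketch of the first of these calls (the output element names follow the statsBy documentation):

```r
library(psych)
# within and between education-level statistics for the ability data
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$mean   # means for each education level
sb$rwg    # pooled within group correlations
sb$rbg    # between group correlations (correlations of the group means)
```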
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

 DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

 DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
               R   R2  Ruw R2uw
Pedigrees   0.58 0.34 0.58 0.33

 DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

 DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.17 0.03

 DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model diagram with paths THERAPY → ATTRIB (0.82) and ATTRIB → SATIS (0.4), total effect c = 0.76, direct effect c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model diagram with THERAPY and ATTRIB predicting SATIS; path coefficients 0.43 and 0.4, with a correlation of 0.21 between the predictors]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, niter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
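This formula can be checked directly in base R; a minimal sketch using an arbitrary built-in correlation matrix (the split of variables into two sets is purely illustrative):

```r
# Cohen's set correlation from the blocks of a correlation matrix
R <- cor(attitude)             # 7 x 7 correlation matrix from a base R data set
x <- 1:3; y <- 4:7             # two illustrative sets of variables
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)  # eigen values of the decomposition
R2set <- 1 - prod(1 - lambda)  # set correlation R^2
```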
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation diagram, "Moderated mediation (mean centered)", with predictors prot2 and prot2*sexism, mediator sexism, and outcome respappr; c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism]
Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
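A quick sketch of the last of these, with made-up matrices (not from the text):

```r
library(psych)
A <- matrix(1, 2, 2)   # illustrative matrices
B <- matrix(2, 3, 3)
superMatrix(A, B)      # 5 x 5: A top left, B lower right, 0s elsewhere
```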
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432
Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458
Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY
Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package
Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276
Cattell R B (1978) The scientific use of factor analysis Plenum Press New York
Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)
Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition
Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64
Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98
Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334
Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()
null device 
          1 
Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly overrepresented in the younger ranges.
3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off-diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix and displays the lower off-diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
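What lowerUpper is doing can be sketched in miniature with base R. This is a conceptual sketch, not the psych implementation, and the two toy matrices are made up:

```r
# Combine two correlation matrices: one below the diagonal, the other above
lower <- matrix(c(1, .6, .6, 1), 2, 2)    # e.g., correlations for the first group
upper <- matrix(c(1, .5, .5, 1), 2, 2)    # e.g., correlations for the second group
both <- lower
both[upper.tri(both)] <- upper[upper.tri(upper)]
diag(both) <- NA                          # lowerUpper blanks the diagonal
both                                      # .6 below the diagonal, .5 above
```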
It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
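The Holm procedure sorts the p values, multiplies the ith smallest by (n − i + 1), and enforces monotonicity. A minimal sketch using core R's p.adjust, with made-up p values:

```r
p.raw <- c(.002, .01, .02, .04)
p.adjust(p.raw, method = "holm")   # 0.008 0.030 0.040 0.040
```

Note that the last two adjusted values are tied: the Holm correction never lets an adjusted p value fall below the one for a smaller raw p.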
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),
> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device 
          1 
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:
> r.test(50,.3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18    with probability < 0.034 
 and confidence interval 0.02   0.53 
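The single-correlation test can be checked by hand, since t = r√(n − 2)/√(1 − r²):

```r
# By-hand check of the single-correlation t test
n <- 50; r <- .3
t <- r * sqrt(n - 2) / sqrt(1 - r^2)
round(t, 2)   # 2.18, matching the r.test output above
```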
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:
> r.test(30,.4,.6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99    with probability  0.32 
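This too can be verified by hand. The following is a sketch of the computation, not the r.test code itself:

```r
# Fisher z test of the difference between two independent correlations
fisherz <- function(r) 0.5 * log((1 + r) / (1 - r))   # Fisher r-to-z transform
n1 <- 30; n2 <- 30
se <- sqrt(1/(n1 - 3) + 1/(n2 - 3))    # s.e. of the difference of two z scores
z <- (fisherz(.6) - fisherz(.4)) / se
round(z, 2)   # 0.99, as reported above
```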
3) For sample size n and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A):
> r.test(103,.4,.5,.1)
Correlation tests 
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations 
 t value -0.89    with probability < 0.37 
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):
> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger case B
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2    with probability  0.23 
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273 
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20,cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
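Unlike the tetrachoric, the φ coefficient itself needs no iterative fitting: for a two by two table with cell frequencies a, b, c, and d it has a closed form. A sketch with made-up frequencies:

```r
# phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)) for a 2 x 2 frequency table
phi.sketch <- function(a, b, c, d)
  (a*d - b*c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
phi.sketch(40, 10, 10, 40)   # 0.6
```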
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pairwise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
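The idea behind such smoothing can be sketched as follows. This is a conceptual outline of eigenvalue smoothing, not the actual cor.smooth code:

```r
# Bump non-positive eigenvalues, rescale them, and rebuild the matrix
smooth.sketch <- function(R, eps = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  if (min(e$values) < eps) {                  # only act on an improper matrix
    v <- pmax(e$values, eps)                  # raise the offending eigenvalues
    v <- v * nrow(R) / sum(v)                 # rescale to sum to the number of variables
    R <- e$vectors %*% diag(v) %*% t(e$vectors)
    R <- cov2cor(R)                           # restore a unit diagonal
  }
  R
}
```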
> draw.tetra()

[Figure: a bivariate normal distribution cut at τ for X and Τ for Y, divided into the four quadrants X < τ or X > τ crossed with Y < Τ or Y > Τ; in this example ρ = 0.5 while φ = 0.28. Marginal density panels show the cut points.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
r_xy = η_x_wg ∗ η_y_wg ∗ r_xy_wg + η_x_bg ∗ η_y_bg ∗ r_xy_bg     (1)
where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.
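A numerical illustration of equation (1), with made-up values; note that the squared within and between etas sum to 1 for each variable:

```r
# Illustrating equation (1): the observed r is a blend of two components
eta.x.wg <- .9; eta.y.wg <- .9      # correlations with the within group deviations
eta.x.bg <- sqrt(1 - eta.x.wg^2)    # correlations with the group means
eta.y.bg <- sqrt(1 - eta.y.wg^2)
r.wg <- .3; r.bg <- -.6             # hypothetical within and between correlations
r.xy <- eta.x.wg * eta.y.wg * r.wg + eta.x.bg * eta.y.bg * r.bg
round(r.xy, 3)   # 0.129: the observed r need not resemble either component
```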
4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
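The matrix algebra involved in regression from a correlation matrix is standard: the standardized slopes are β = R_xx⁻¹ r_xy. A by-hand sketch with a made-up correlation matrix, illustrative rather than the setCor code:

```r
# Standardized regression from correlations alone
Rxx <- matrix(c(1, .5,
               .5,  1), 2, 2)   # correlations among two predictors
rxy <- c(.4, .3)                # their correlations with the criterion
beta <- solve(Rxx, rxy)         # standardized slopes: 0.33 and 0.13
R2 <- sum(beta * rxy)           # squared multiple correlation: about 0.17
```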
5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:
> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input 

 DV =  Four.Letter.Words 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2 
Cohen's Set Correlation R2  =  0.69 
Unweighted correlation between the two sets =  0.73 
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input 

 DV =  Four.Letter.Words 
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes 
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series 
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees 
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group 
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

Average squared canonical correlation =  0.21 
Cohen's Set Correlation R2  =  0.42 
Unweighted correlation between the two sets =  0.48 

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
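With raw data, the a and b paths are just slopes from two regressions, and ab is their product. A sketch with simulated data; the coefficients and sample size are arbitrary:

```r
# Simple mediation by hand: a from lm(m ~ x), b from lm(y ~ x + m)
set.seed(42)
x <- rnorm(100)
m <- .5 * x + rnorm(100)            # mediator depends on x (path a)
y <- .4 * m + .3 * x + rnorm(100)   # criterion depends on m (path b) and x (direct path)
a  <- coef(lm(m ~ x))["x"]
b  <- coef(lm(y ~ x + m))["m"]
ab <- a * b                          # indirect effect of x on y through m
cprime <- coef(lm(y ~ x + m))["x"]   # direct effect with m partialled out
```

The mediate function adds what this sketch lacks: a bootstrap confidence interval for ab.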
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33 
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03    Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067 

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram: THERAPY → SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4).]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB,data=sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram: THERAPY (0.43) and ATTRIB (0.4) both predict SATIS; the two predictors correlate 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏(i=1..n) (1 − λi)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_xx^−1 R_xy R_yy^−1 R_yx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
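As a check, plugging the squared canonical correlations reported for the Thurstone example in section 5.1 into this formula recovers the printed values:

```r
# Squared canonical correlations from the setCor output above
lambda <- c(.6280, .1478, .0076, .0049)
1 - prod(1 - lambda)   # about 0.69, the set correlation R2 reported by setCor
mean(lambda)           # about 0.20, the average squared canonical correlation
```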
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis 
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, 
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  125   with p =  4.5e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism   =  0 
Mean bootstrapped indirect effect =  0.01  with standard error =  0.02  Lower CI =  -0.01    Upper CI =  0.07
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  125   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  NA  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism   =  0 
Mean bootstrapped indirect effect =  0.01  with standard error =  0.04  Lower CI =  -0.05    Upper CI =  0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.46e-09 

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: path diagram titled "Moderated mediation (mean centered)" showing prot2 and prot2*sexism predicting respappr through the mediator sexism, with total effects c = 1.46 and c = 0.81 and direct effects c' = 1.46 and c' = 0.81]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: that is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2   com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
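The core of what these converters do can be sketched for a generic data frame in base R. The function below is a toy illustration only (the name df2latex_sketch is made up); the psych functions handle rounding, alignment, and captions properly, and xtable is the general-purpose tool:

```r
# Toy data-frame-to-LaTeX converter in the spirit of df2latex
# (illustration only; use psych::df2latex or xtable in practice).
df2latex_sketch <- function(df, digits = 2) {
  num  <- format(round(df, digits), nsmall = digits)  # fixed-width numbers
  rows <- paste0(rownames(df), " & ",
                 apply(num, 1, paste, collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    paste0(" & ", paste(colnames(df), collapse = " & "), " \\\\"),
    rows,
    "\\end{tabular}")
}
cat(df2latex_sketch(as.data.frame(cor(attitude[, 1:3]))), sep = "\n")
```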
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
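Several of these helpers have one-line base-R analogues, which is a useful way to check one's understanding. The psych versions add NA handling and other conveniences; the equivalences below are illustrative only:

```r
# Illustrative base-R analogues of a few psych helper functions.
x <- c(1, 2, 4, 8)
r <- 0.30

geo  <- exp(mean(log(x)))  # geometric.mean: nth root of the product
harm <- 1 / mean(1 / x)    # harmonic.mean: reciprocal of the mean reciprocal
z    <- atanh(r)           # fisherz: 0.5 * log((1 + r) / (1 - r))

c(geometric = geo, harmonic = harm, fisher.z = z)
```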
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos="https://personality-project.org/r", type="source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
Index
ability 49affect 15 25alpha 5 7alpha factoring 7
Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 15 35blockrandom 48burt 35
char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49
densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9
drawcor 35drawtetra 35dummycode 15dynamite plot 20
edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24
fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factor analysis 7factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48
galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7
harmonicmean 49head 49headtail 49heights 49
hetdiagram 8Hmisc 29Holzinger 49
ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-
plications in R 8iqitems 49irtfa 7 47irt2latex 47 48
KnitR 47
lavaan 38library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15
makekeys 17MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42 44mediatediagram 41minimum residual 7mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevel 37multilevelreliability 7multiple regression 38
nfactors 7
nlme 37
omega 7 9outlier 3 13 14
padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51
R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8
corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49hetdiagram 8
Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51psych package
ability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 35cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8
faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14
prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
3.4.7 Correlational structure
There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
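A minimal base-R version of this kind of display, rounding the matrix and blanking the upper triangle, can be written in a few lines. This is a sketch only; lowerCor itself also abbreviates the variable names and handles the use and method arguments:

```r
# Sketch of a lower-triangle correlation display in base R
# (psych::lowerCor adds pairwise deletion and name abbreviation).
lower_mat <- function(R, digits = 2) {
  out <- format(round(R, digits), nsmall = digits)
  out[upper.tri(R)] <- ""      # blank the upper triangle
  print(out, quote = FALSE)
  invisible(R)
}
lower_mat(cor(attitude[, 1:4]))
```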
When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:
> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
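Combining two matrices in this below/above-the-diagonal fashion is easy to do directly, which clarifies what lowerUpper returns. The sketch below splits a base R data set into two hypothetical halves; it illustrates the idea, not the psych code:

```r
# Sketch of lowerUpper: one group's correlations below the diagonal,
# the other's (or their difference) above it. Illustration only.
lower_upper <- function(lower, upper, diff = FALSE) {
  if (diff) upper <- lower - upper
  both <- lower
  both[upper.tri(both)] <- upper[upper.tri(upper)]
  diag(both) <- NA   # the diagonal is undefined for two groups
  both
}
A <- cor(attitude[1:15, 1:3])   # first half, a hypothetical "group 1"
B <- cor(attitude[16:30, 1:3])  # second half, a hypothetical "group 2"
round(lower_upper(A, B), 2)
```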
3.4.8 Heatmap displays of correlational structure
Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations
Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
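The Holm adjustment that corr.test applies above the diagonal is the same one provided by base R's p.adjust, e.g.:

```r
# The Holm step-down correction on a vector of raw p values
# (the same adjustment corr.test reports above the diagonal).
p_raw  <- c(0.01, 0.02, 0.04, 0.20)   # hypothetical raw probabilities
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 3)
# -> 0.04 0.06 0.08 0.20
```

Holm multiplies the smallest p by k, the next by k-1, and so on, taking a running maximum so the adjusted values stay monotone; it is uniformly more powerful than a plain Bonferroni correction.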
Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations print with the short=FALSE option
depending upon the input:
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)  # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
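The difference between the φ and tetrachoric estimates can be seen directly in a small sketch (the cell frequencies here are hypothetical, chosen only for illustration):

```r
library(psych)
# a hypothetical 2 x 2 table of cell frequencies from two dichotomized variables
tab <- matrix(c(40, 10, 10, 40), 2, 2)
phi(tab)          # phi: a Pearson r on the observed dichotomies
tetrachoric(tab)  # estimate of the latent bivariate-normal correlation (larger)
```

For a table this symmetric, the tetrachoric estimate will exceed the φ coefficient, illustrating the underestimation discussed above.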
If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
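A minimal sketch of this repair, using the burt data set mentioned above (assuming, as described, that its smallest eigenvalue is slightly negative):

```r
library(psych)
round(eigen(burt)$values, 3)      # the smallest eigenvalue is (slightly) negative
burt.s <- cor.smooth(burt)        # rescale the eigenvalues to be positive
min(eigen(burt.s)$values) >= 0    # the smoothed matrix is positive semi-definite
```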
> draw.tetra()
[Figure: a bivariate normal distribution with ρ = 0.5 (φ = 0.28), cut at τ for X and Τ for Y, dividing the plane into the four cells X < τ / X > τ by Y < Τ / Y > Τ, together with the marginal density dnorm(x) showing the cut points.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)

where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
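A hedged sketch of the first of these analyses, grouping the sat.act data by education:

```r
library(psych)
# within- and between-group statistics, grouping sat.act by education
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg  # pooled within-group correlations
sb$rbg  # between-group correlations (correlations of the group means)
```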
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = 4.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63  0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series  0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = 4.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                  R   R2  Ruw R2uw
4.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual, 2)
               4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
4.Letter.Words           0.52     0.11          0.09      0.06         0.13
Suffixes                 0.11     0.60         -0.01      0.01         0.03
Letter.Series            0.09    -0.01          0.75      0.28         0.37
Pedigrees                0.06     0.01          0.28      0.66         0.20
Letter.Group             0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation model with paths THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, total effect c = 0.76, and direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure: regression model with paths THERAPY → SATIS = 0.43 and ATTRIB → SATIS = 0.4, and a value of 0.21 between THERAPY and ATTRIB.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, niter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R² = 1 − ∏_{i=1}^{n} (1 − λi)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.
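This formula can be checked directly in base R. The sketch below (variable split and names are illustrative) uses the first three and last three variables of the sat.act data:

```r
library(psych)
# set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
R   <- cor(sat.act, use = "pairwise")
Rxx <- R[1:3, 1:3]
Ryy <- R[4:6, 4:6]
Rxy <- R[1:3, 4:6]
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)   # eigenvalues are real; Re() guards against numerical noise
R2.set <- 1 - prod(1 - lambda)  # should be close to the R2 that setCor reports for these sets
```

The eigenvalues λi are the squared canonical correlations, so this is also a check on the canonical-correlation output of setCor.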
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation (mean centered) path diagram for prot2 and prot2*sexism predicting respappr through sexism, with c = 1.46 (c' = 1.46) and c = 0.81 (c' = 0.81).]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV  0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohens Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
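A table like Table 2 can be produced along these lines (a sketch; the heading text is illustrative):

```r
library(psych)
f3 <- fa(Thurstone, 3)  # 3-factor solution of the 9 Thurstone variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```

By default the LaTeX source is written to the console; the file argument of fa2latex can be used to write it to a file instead.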
Table 2: fa2latex: A factor analysis table from the psych package in R
Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

     MR1   MR2   MR3
MR1 1.00  0.59  0.54
MR2 0.59  1.00  0.52
MR3 0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
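A couple of these helpers in action (a minimal sketch):

```r
library(psych)
fisherz(0.5)              # 0.5 * log((1 + r)/(1 - r)), about 0.55
fisherz2r(fisherz(0.5))   # the inverse transform recovers 0.5
headTail(sat.act)         # first and last lines of the sat.act data frame
```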
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS LibraryFrameworksRframeworkVersions36ResourcesliblibRblas0dylib
LAPACK LibraryFrameworksRframeworkVersions36ResourcesliblibRlapackdylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),
> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input).
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103,.4,.5,.1)
Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
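As noted above, cortest can also compare two correlation matrices. A minimal sketch, splitting sat.act into two arbitrary halves purely for illustration (the split itself is not from the text):

```r
library(psych)
# two correlation matrices from arbitrary halves of the data
R1 <- cor(sat.act[1:350, ], use = "pairwise")
R2 <- cor(sat.act[351:700, ], use = "pairwise")
# chi square test of the matrix of differences between R1 and R2
cortest(R1, R2, n1 = 350, n2 = 350)
```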
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
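A minimal sketch of that smoothing step, applied to the burt data set shipped with psych (the exact eigenvalues are not asserted here; see ?cor.smooth):

```r
library(psych)
round(eigen(burt)$values, 3)       # the smallest eigenvalue is slightly negative
burt.s <- cor.smooth(burt)         # adjust the offending eigenvalues and rescale
round(max(abs(burt.s - burt)), 3)  # the smoothed matrix differs only trivially
```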
> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution (the demonstration draws the case ρ = 0.5, for which φ = 0.28 with cuts at τ and Τ). The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg        (1)

where r_xy is the normal correlation which may be decomposed into a within group and between group correlations r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
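The pieces of this decomposition can be inspected directly with statsBy; a sketch using the withinBetween example data described in the next section (the grouping column name, Group, and the output component names rwg, rbg, and etawg follow the help files):

```r
library(psych)
sb <- statsBy(withinBetween, group = "Group", cors = TRUE)
round(sb$rwg, 2)    # pooled within group correlations (r_xywg)
round(sb$rbg, 2)    # between group correlations (r_xybg)
round(sb$etawg, 2)  # correlations with the within group deviations (eta_wg)
```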
4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
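A sketch of the first of these analyses, spelling out the grouping variable (output component names as in ?statsBy):

```r
library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rbg, 2)  # correlations of the group means (between education levels)
round(sb$rwg, 2)  # pooled within group correlations
```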
4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone
correlation matrix and find the multiple correlation of the last five variables as a function of the first four.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02
40
Multiple Regression
R R2 Ruw R2uw
LetterGroup 03 009 026 007
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0405 0023
Average squared canonical correlation = 021
Cohens Set Correlation R2 = 042
Unweighted correlation between the two sets = 048
gt round(sc$residual2)
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords 052 011 009 006 013
Suffixes 011 060 -001 001 003
LetterSeries 009 -001 075 028 037
Pedigrees 006 001 028 066 020
LetterGroup 013 003 037 020 077
5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function, and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5   df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
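The output above comes from a one-line call; the sketch below assumes the sobel data frame has first been created as in the mediate help example:

```r
library(psych)
# Mediation model from Preacher and Hayes (2004): does ATTRIB mediate
# the effect of THERAPY on SATIS? The parenthesized term is the mediator.
# (Assumes sobel has been built as in the mediate() help example.)
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
print(preacher, short = FALSE)  # longer output, including the summary
```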
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram. THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), with total effect c = 0.76 and direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram. THERAPY → SATIS (0.43), ATTRIB → SATIS (0.4), with a correlation of 0.21 between THERAPY and ATTRIB.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. Here an alternative statistic, based upon the average canonical correlation, might be more appropriate.
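The definition above can be checked in a few lines of base R. This is a toy illustration, not psych code; the 4 x 4 correlation matrix here is made up:

```r
# Toy check of Cohen's set correlation using only base R.
# Variables 1:2 form the x set, variables 3:4 the y set.
R.mat <- matrix(c(1.0, 0.3, 0.4, 0.2,
                  0.3, 1.0, 0.3, 0.4,
                  0.4, 0.3, 1.0, 0.3,
                  0.2, 0.4, 0.3, 1.0), 4, 4)
x <- 1:2; y <- 3:4
# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
lambda <- eigen(solve(R.mat[x, x]) %*% R.mat[x, y] %*%
                solve(R.mat[y, y]) %*% R.mat[y, x])$values
R2.set <- 1 - prod(1 - lambda)  # Cohen's set correlation R2
```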
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
[Figure: "Moderated mediation (mean centered)" path diagram. prot2 → respappr (c = 1.46, c' = 1.46) and prot2*sexism → respappr (c = 0.81, c' = 0.81), with paths to the mediator sexism of 0.07 and 0.09, a mediator path of 0.02, and a correlation of -0.01 between the predictors.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance matrix or the raw data, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2  SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02      0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2  SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01      0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2  SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03      0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohen's Set Correlation R2 =  0.09
Shrunken Set Correlation R2 =  0.08
F and df of Cohen's Set Correlation:  7.26  9  1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
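That symmetry is easy to check by reversing the roles of the two sets; a quick sketch using the same covariance matrix as in the example above:

```r
library(psych)
# Predicting the test scores (columns 4:6) from the demographics (columns 1:3)
# gives the same set correlation R2 as the reverse direction.
C <- cov(sat.act, use = "pairwise")
forward  <- setCor(c(4:6), c(1:3), C, n.obs = 700)
backward <- setCor(c(1:3), c(4:6), C, n.obs = 700)
# Both printed outputs report Cohen's Set Correlation R2 = 0.09
```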
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
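Table 2 was produced by a call of roughly the following form; the heading matches the table shown, while the label value is an illustrative assumption:

```r
library(psych)
# Three factors for the 9 Thurstone variables, written as an APA-style LaTeX table
f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex", label = "tab:fa2latex")
```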
7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
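A few of these helpers in action (a small sketch; fisherz is just the inverse hyperbolic tangent of r, and the geometric mean is the nth root of the product of n values):

```r
library(psych)
fisherz(0.5)                 # 0.5 * log((1 + 0.5)/(1 - 0.5)), about 0.55
geometric.mean(c(1, 4, 16))  # (1 * 4 * 16)^(1/3) = 4
headTail(sat.act)            # first and last few rows, separated by ellipses
```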
8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941), and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0  foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package, and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. DOI: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, and G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability 49affect 15 25alpha 5 7alpha factoring 7
Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 15 35blockrandom 48burt 35
char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49
densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9
drawcor 35drawtetra 35dummycode 15dynamite plot 20
edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24
fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factor analysis 7factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48
galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7
harmonicmean 49head 49headtail 49heights 49
hetdiagram 8Hmisc 29Holzinger 49
ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-
plications in R 8iqitems 49irtfa 7 47irt2latex 47 48
KnitR 47
lavaan 38library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15
makekeys 17MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42 44mediatediagram 41minimum residual 7mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevel 37multilevelreliability 7multiple regression 38
nfactors 7
nlme 37
omega 7 9outlier 3 13 14
padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51
R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8
corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49hetdiagram 8
Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51psych package
ability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 35cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8
faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14
prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
62
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63
> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+        main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1
Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and the correlations are then shown by color. By using the 'numbers' option the values are displayed as well. By default the complete matrix is shown; setting upper=FALSE and diag=FALSE shows a cleaner figure.
> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and increase again toward the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+        main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0
To see confidence intervals of the correlations, print with the short=FALSE option.
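The adjusted values above the diagonal in Table 1 use the Holm correction, which base R also provides; a minimal sketch with hypothetical raw p values (not those of sat.act):

```r
# Hypothetical raw p values adjusted as in base R; corr.p applies
# the same Holm logic to a whole correlation matrix.
raw.p <- c(0.02, 0.58, 0.33, 0.62, 0.001)
holm.p <- p.adjust(raw.p, method = "holm")
# Holm multiplies the ith smallest p by (k - i + 1), caps the result
# at 1, and enforces monotonicity across the ordered p values
round(holm.p, 3)   # 0.080 1.000 0.990 1.000 0.005
```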
33
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
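What r.test reports for this first case can be sketched with base R alone (the t statistic and a Fisher-z confidence interval); the values mirror the output above:

```r
# Sketch of case 1: t test and confidence interval for a single r,
# using only base R (r.test wraps this up with nicer printing).
n <- 50; r <- 0.3
t.value <- r * sqrt(n - 2) / sqrt(1 - r^2)     # 2.18
p.value <- 2 * pt(-abs(t.value), df = n - 2)   # about 0.034
z  <- atanh(r)                                 # Fisher r-to-z transform
se <- 1 / sqrt(n - 3)                          # standard error of z
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transformed CI: 0.02 0.53
```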
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z-transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
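The computation behind case 2 is the classic Fisher-z comparison of independent correlations, sketched here in base R:

```r
# Sketch of case 2: two independent correlations compared via the
# difference of their Fisher z transforms (base R only).
n1 <- 30; n2 <- 30
r1 <- 0.4; r2 <- 0.6
z.diff <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
p <- 2 * pnorm(-abs(z.diff))   # |z| = 0.99, p = 0.32; the sign of
                               # z.diff depends on the order of r1, r2
```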
3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., the elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
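The φ coefficient itself is easy to compute directly; a sketch with a hypothetical 2 × 2 table (the cell counts are invented for illustration):

```r
# phi from a hypothetical 2 x 2 table of counts; it equals a Pearson r
# computed on the underlying 0/1 data.
tab <- matrix(c(40, 10,
                20, 30), 2, 2, byrow = TRUE)   # rows: X = 0/1, cols: Y = 0/1
p  <- tab / sum(tab)                           # cell proportions
px <- sum(p[2, ])                              # P(X = 1)
py <- sum(p[, 2])                              # P(Y = 1)
phi <- (p[2, 2] - px * py) / sqrt(px * (1 - px) * py * (1 - py))
# check against cor() on the expanded 0/1 vectors
x <- rep(c(0, 0, 1, 1), times = c(40, 10, 20, 30))
y <- rep(c(0, 1, 0, 1), times = c(40, 10, 20, 30))
all.equal(phi, cor(x, y))    # TRUE
```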
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
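The smoothing idea can be sketched in a few lines of base R (the actual cor.smooth code differs in its details, so treat this as an illustration of the eigenvalue adjustment, not the implementation):

```r
# Floor the negative eigenvalues, rescale so the eigenvalues again
# sum to the number of variables, rebuild, and restore unit diagonal.
smooth.R <- function(R, eps = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  vals <- vals * ncol(R) / sum(vals)
  cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
}
# an impossible "correlation" matrix (it has a negative eigenvalue)
R <- matrix(c(1, .9, .1,
              .9, 1, .9,
              .1, .9, 1), 3, 3)
min(eigen(R)$values)             # negative: not positive semi-definite
min(eigen(smooth.R(R))$values)   # positive after smoothing
```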
> draw.tetra()
[Figure 14 diagram: a bivariate normal distribution with cut points τ on X and Τ on Y dividing the plane into four cells; for ρ = 0.5 the resulting φ = 0.28.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg        (1)
where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
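Equation 1 can be verified numerically with base R alone; the grouped data below are simulated purely for illustration:

```r
# Numerical check of equation (1) on simulated grouped data.
set.seed(42)
g <- rep(1:10, each = 20)          # 10 groups of 20 observations
x <- rnorm(200) + g / 2            # both variables share group structure
y <- rnorm(200) + g / 2
xm <- ave(x, g); ym <- ave(y, g)   # group means, one per observation
xw <- x - xm;    yw <- y - ym      # within-group deviations
r.wg <- cor(xw, yw)                # pooled within-group correlation
r.bg <- cor(xm, ym)                # between-group (means) correlation
eta.xw <- cor(x, xw); eta.yw <- cor(y, yw)
eta.xb <- cor(x, xm); eta.yb <- cor(y, ym)
r.xy <- eta.xw * eta.yw * r.wg + eta.xb * eta.yb * r.bg
all.equal(r.xy, cor(x, y))         # TRUE: the two parts recompose r
```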
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
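At its core, regression from a correlation matrix needs only the predictor intercorrelations Rxx and the predictor-criterion correlations Rxy; a sketch with a small hypothetical matrix (setCor adds significance tests, VIFs, and multiple DVs):

```r
# Standardized betas and R^2 from a correlation matrix alone,
# using a hypothetical 2-predictor, 1-criterion matrix.
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.4,
              0.5, 0.4, 1.0), 3, 3,
            dimnames = rep(list(c("x1", "x2", "y")), 2))
Rxx  <- R[1:2, 1:2]            # predictor intercorrelations
Rxy  <- R[1:2, 3]              # predictor-criterion correlations
beta <- solve(Rxx, Rxy)        # standardized regression weights
R2   <- sum(beta * Rxy)        # squared multiple correlation
round(c(beta, R2 = R2), 3)
```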
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = FourLetterWords
               slope  VIF
SentCompletion  0.02 1.02
FirstLetters    0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.44 0.19

DV = Suffixes
               slope  VIF
SentCompletion  0.05 1.02
FirstLetters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = LetterSeries
               slope  VIF
SentCompletion  0.04 1.02
FirstLetters    0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.17 0.03

DV = Pedigrees
               slope  VIF
SentCompletion  0.21 1.02
FirstLetters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = LetterGroup
               slope  VIF
SentCompletion  0.08 1.02
FirstLetters    0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
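The a, b, and c paths can be sketched with lm alone on simulated data (mediate adds the bootstrap test and the diagram); for OLS with a single mediator, the total effect equals the direct effect plus ab exactly:

```r
# a, b, c paths via lm on simulated data: total = direct + indirect.
set.seed(17)
x <- rnorm(100)
m <- 0.5 * x + rnorm(100)              # the a path generates m from x
y <- 0.3 * x + 0.4 * m + rnorm(100)    # the c' and b paths generate y
c.total  <- coef(lm(y ~ x))["x"]       # total effect (c)
a        <- coef(lm(m ~ x))["x"]       # x -> m
fit      <- lm(y ~ x + m)
c.direct <- coef(fit)["x"]             # direct effect (c')
b        <- coef(fit)["m"]             # m -> y, controlling x
ab       <- a * b                      # indirect effect
all.equal(unname(c.total), unname(c.direct + ab))   # TRUE
```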
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables
> mediate.diagram(preacher)
[Figure 15 diagram: THERAPY → ATTRIB path a = 0.82, ATTRIB → SATIS path b = 0.4, total effect c = 0.76, direct effect c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure 16 diagram: regression paths THERAPY → SATIS = 0.43 and ATTRIB → SATIS = 0.4, with a THERAPY-ATTRIB correlation of 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R2 = 1 − ∏_{i=1}^{n} (1 − λi)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx^−1 Rxy Ryy^−1 Ryx
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.
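The formula can be computed directly from a partitioned correlation matrix; a sketch with hypothetical x = {V1, V2} and y = {V3, V4} blocks (the matrix is invented for illustration):

```r
# Set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx,
# using a hypothetical 4 x 4 correlation matrix.
R <- matrix(c(1.0, 0.2, 0.5, 0.3,
              0.2, 1.0, 0.4, 0.4,
              0.5, 0.4, 1.0, 0.3,
              0.3, 0.4, 0.3, 1.0), 4, 4)
Rxx <- R[1:2, 1:2]; Ryy <- R[3:4, 3:4]; Rxy <- R[1:2, 3:4]
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)    # the squared canonical correlations
R2.set <- 1 - prod(1 - lambda)   # Cohen's set correlation R^2
round(c(lambda, R2.set), 3)
```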
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
[Figure 17 diagram, "Moderated mediation (mean centered)": prot2 and prot2*sexism predict respappr (c = 1.46 and 0.81; c' = 1.46 and 0.81), with small paths through sexism (0.07, 0.09, 0.02, -0.01).]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example the analysis is done on the correlation matrix rather than the rawdata
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,	Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: that is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
SentCompletion   0.83  0.04  0.00 0.73 0.27 1.00
FirstLetters     0.00  0.86  0.00 0.73 0.27 1.00
4LetterWords    -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
LetterSeries     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
LetterGroup     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
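The kind of conversion these functions perform can be sketched in base R; df2latex and friends add alignment, headers, and APA details. The two-row matrix below is a hypothetical excerpt:

```r
# Paste a numeric matrix into LaTeX tabular rows, df2latex-style.
m <- matrix(c(0.91, -0.04,
              0.89,  0.06), 2, 2, byrow = TRUE,
            dimnames = list(c("Sentences", "Vocabulary"),
                            c("MR1", "MR2")))
rows <- paste(rownames(m),
              apply(m, 1, paste, collapse = " & "),  # join cells with &
              sep = " & ")
body <- paste0(rows, " \\\\")                        # LaTeX row terminator
cat(body, sep = "\n")
```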
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
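Several of these helpers have one-line base-R counterparts, sketched here as an illustration (psych's versions add NA handling and methods for matrices):

```r
# Base-R equivalents of three of the helpers above.
x <- c(1, 2, 4, 8)
geometric <- exp(mean(log(x)))   # geometric.mean: about 2.83
harmonic  <- 1 / mean(1 / x)     # harmonic.mean: about 2.13
z <- atanh(0.5)                  # fisherz: the Fisher r-to-z, about 0.549
r <- tanh(z)                     # the inverse transform recovers 0.5
```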
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests
bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses
spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory
satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis
epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays
iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics
ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)
galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
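All of these data sets are lazy-loaded with the package and can be inspected directly. A minimal sketch (assuming psych is installed; describe is the package's own summary function):

```r
library(psych)
dim(bfi)            # 2800 subjects by 28 variables (25 items plus demographics)
describe(bfi[1:5])  # basic descriptive statistics for the first five items
describe(sat.act)   # SAT/ACT scores with age, gender, and education
```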
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source"
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9
draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headTail, 49; heights, 49
het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8
corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8
Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51
psych package:
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8
fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headTail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14
p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49
tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20
tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7
schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
1
Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.
> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
1
Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).
> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
gender education age ACT SATV SATQ
gender 1.00 0.09 -0.02 -0.04 -0.02 -0.17
education 0.09 1.00 0.55 0.15 0.05 0.03
age -0.02 0.55 1.00 0.11 -0.04 -0.03
ACT -0.04 0.15 0.11 1.00 0.56 0.59
SATV -0.02 0.05 -0.04 0.56 1.00 0.64
SATQ -0.17 0.03 -0.03 0.59 0.64 1.00
Sample Size
gender education age ACT SATV SATQ
gender 700 700 700 700 700 687
education 700 700 700 700 700 687
age 700 700 700 700 700 687
ACT 700 700 700 700 700 687
SATV 700 700 700 700 700 687
SATQ 687 687 687 687 687 687
Probability values (Entries above the diagonal are adjusted for multiple tests)
gender education age ACT SATV SATQ
gender 0.00 0.17 1.00 1.00 1 0
education 0.02 0.00 0.00 0.00 1 1
age 0.58 0.00 0.00 0.03 1 1
ACT 0.33 0.00 0.00 0.00 0 0
SATV 0.62 0.22 0.26 0.00 0 0
SATQ 0.00 0.36 0.37 0.00 0 0
To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
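In standard notation (this display is added here for clarity; the vignette states the test verbally), the statistic is the difference of the Fisher z transformed correlations divided by its standard error:

```latex
z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}},
\qquad z_i = \tfrac{1}{2}\ln\!\left(\frac{1 + r_i}{1 - r_i}\right)
```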
> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)  # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
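A minimal sketch of the corresponding function call (assuming psych is loaded; ability is the 0/1 scored item data set described in section 8):

```r
library(psych)
# tetrachoric correlations among four dichotomous ICAR items
tet <- tetrachoric(ability[, 1:4])
tet$rho  # estimated latent (bivariate normal) correlations
tet$tau  # estimated thresholds (cut points) for each item
```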
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
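A sketch of the smoothing step on the burt matrix (assuming psych is loaded; the negative eigenvalue is what makes the matrix non-positive-definite):

```r
library(psych)
min(eigen(burt)$values)       # negative: burt is not positive semi-definite
smoothed <- cor.smooth(burt)  # rescales the eigenvalues and rebuilds the matrix
min(eigen(smoothed)$values)   # now non-negative
```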
> draw.tetra()
[Figure: a bivariate normal scatter plot with ρ = .5 and φ = .28, cut at τ on X and Τ on Y, labeling the four quadrants (X < τ, Y < Τ; X > τ, Y < Τ; X < τ, Y > Τ; X > τ, Y > Τ), with the marginal density dnorm(x) and its cut point τ drawn below.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:
rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg (1)
where rxy is the normal correlation which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
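The first of these examples can be sketched as follows (assuming psych is loaded; rwg and rbg are the within and between group correlation matrices returned by statsBy):

```r
library(psych)
# pooled within-group and between-group correlations for the ability tests,
# grouped by level of education
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within-group correlations
round(sb$rbg, 2)  # correlations of the group means (between groups)
```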
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words slope VIF
Sentences 0.09 3.69
Vocabulary 0.09 3.88
Sent.Completion 0.02 3.00
First.Letters 0.58 1.35
Multiple Regression
R R2 Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34
DV = Suffixes slope VIF
Sentences 0.07 3.69
Vocabulary 0.17 3.88
Sent.Completion 0.05 3.00
First.Letters 0.45 1.35
Multiple Regression
R R2 Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34
DV = Letter.Series slope VIF
Sentences 0.25 3.69
Vocabulary 0.09 3.88
Sent.Completion 0.04 3.00
First.Letters 0.21 1.35
Multiple Regression
R R2 Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24
DV = Pedigrees slope VIF
Sentences 0.21 3.69
Vocabulary 0.16 3.88
Sent.Completion 0.21 3.00
First.Letters 0.08 1.35
Multiple Regression
R R2 Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33
DV = Letter.Group slope VIF
Sentences 0.20 3.69
Vocabulary -0.02 3.88
Sent.Completion 0.08 3.00
First.Letters 0.31 1.35
Multiple Regression
R R2 Ruw R2uw
Letter.Group 0.48 0.23 0.45 0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words slope VIF
Sent.Completion 0.02 1.02
First.Letters 0.58 1.02
Multiple Regression
R R2 Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19
DV = Suffixes slope VIF
Sent.Completion 0.05 1.02
First.Letters 0.45 1.02
Multiple Regression
R R2 Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12
DV = Letter.Series slope VIF
Sent.Completion 0.04 1.02
First.Letters 0.21 1.02
Multiple Regression
R R2 Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03
DV = Pedigrees slope VIF
Sent.Completion 0.21 1.02
First.Letters 0.08 1.02
Multiple Regression
R R2 Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02
DV = Letter.Group slope VIF
Sent.Completion 0.08 1.02
First.Letters 0.31 1.02
Multiple Regression
R R2 Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words 0.52 0.11 0.09 0.06 0.13
Suffixes 0.11 0.60 -0.01 0.01 0.03
Letter.Series 0.09 -0.01 0.75 0.28 0.37
Pedigrees 0.06 0.01 0.28 0.66 0.20
Letter.Group 0.13 0.03 0.37 0.20 0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
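The call that produces output of this kind can be sketched as follows (a hedged sketch: the sobel data set and the parenthesized mediator syntax are those documented on the mediate help page; n.iter defaults to 5000 bootstrap iterations):

```r
# Sketch of the mediate call behind the output shown below
library(psych)
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
preacher                   # print the mediation/moderation summary
mediate.diagram(preacher)  # draw the path diagram
```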
Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33 
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067 

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation path diagram relating THERAPY, ATTRIB, and SATIS, with path values 0.82, 0.4, c = 0.76, and c' = 0.43]

Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model diagram with THERAPY and ATTRIB predicting SATIS; path values 0.43, 0.4, and 0.21]

Figure 16: The conventional regression model for the Preacher and Hayes, 2004, data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
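The set correlation formula can be checked directly in base R. The following sketch uses a small made-up correlation matrix (not one of the psych data sets): the eigenvalues of the matrix product are the squared canonical correlations, which combine into Cohen's set correlation R^2.

```r
# Illustrative 4 x 4 correlation matrix (made up for this sketch)
R <- matrix(c(1.0, 0.5, 0.3, 0.2,
              0.5, 1.0, 0.4, 0.3,
              0.3, 0.4, 1.0, 0.6,
              0.2, 0.3, 0.6, 1.0), 4, 4)
x <- 1:2  # the predictor set
y <- 3:4  # the criterion set
# Rstar = Rxx^-1 Rxy Ryy^-1 Ryx; its eigenvalues are the squared canonical correlations
Rstar  <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(Rstar)$values)
R2set  <- 1 - prod(1 - lambda)  # Cohen's set correlation R^2
```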
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis 
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50, 
    main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df= 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df= 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0 
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df= 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df= 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0 
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09 

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" path diagram relating prot2, prot2*sexism, sexism, and respappr, with c = 1.46, c' = 1.46 for prot2 and c = 0.81, c' = 0.81 for prot2*sexism]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input 

DV = ACT 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.050 0.033 0.008
Chisq of canonical correlations 
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
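A table of this kind can be produced along these lines (a sketch, not the exact call used for Table 2; the caption argument shown is an assumption about a reasonable label):

```r
# Sketch: factor the Thurstone correlation matrix, then emit the solution as LaTeX
library(psych)
f3 <- fa(Thurstone, nfactors = 3)   # a three-factor solution
fa2latex(f3, caption = "fa2latex")  # writes LaTeX table source to the console
```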
47
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3   h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
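A few of these helpers in action (a short sketch, assuming psych is loaded):

```r
library(psych)
fisherz(0.5)                   # Fisher r-to-z transform: atanh(0.5) = 0.549...
geometric.mean(c(1, 10, 100))  # exp(mean(log(x))) = 10
harmonic.mean(c(1, 2, 4))      # 3 / (1 + 1/2 + 1/4) = 12/7
A <- diag(2)
B <- diag(3)
superMatrix(A, B)              # 5 x 5 matrix: A top left, B lower right, 0s elsewhere
```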
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5   
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option
depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18  with probability < 0.034 
 and confidence interval 0.02  0.53
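The arithmetic behind this case can be checked in base R. This is a sketch, not psych's implementation: the t statistic is r * sqrt(n-2) / sqrt(1-r^2), and the confidence interval comes from Fisher's z transformation.

```r
# Check the numbers r.test(50, .3) reports, using base R only.
n <- 50
r <- 0.3

# t test of r against zero, with n - 2 df
t.val <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.val <- 2 * pt(-abs(t.val), df = n - 2)

# 95% CI: z = atanh(r) is approximately normal with se = 1/sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t.val, p = p.val, ci), 2)  # t = 2.18, p = 0.03, CI = (0.02, 0.53)
```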
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests 
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations 
 t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B 
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2  with probability  0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
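A back-of-the-envelope version of this test can be written in a few lines of base R. This is a sketch (cortest.sketch is an illustrative name, and psych's cortest offers several variants): it multiplies the sum of the squared Fisher-z transformed correlations by n - 3 and refers the result to chi square with p(p-1)/2 df.

```r
# Sketch of Steiger's (1980) test that all population correlations are zero.
cortest.sketch <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])  # Fisher z of each unique correlation
  chi2 <- (n - 3) * sum(z^2)      # approximately chi-square under the null
  df   <- length(z)
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

set.seed(42)
X <- matrix(rnorm(1000 * 4), ncol = 4)   # four uncorrelated variables
cortest.sketch(cor(X), n = 1000)         # chi2 small, p large (usually)

X[, 2] <- X[, 1] + rnorm(1000, sd = 0.5) # induce a strong correlation
cortest.sketch(cor(X), n = 1000)         # chi2 huge, p essentially zero
```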
3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
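The "laborious" optimization can be sketched in base R. This is a rough illustration only (bvn.upper and tetra.sketch are hypothetical names; psych's tetrachoric is far more careful): choose the bivariate-normal ρ whose cell probabilities, with cuts at the observed marginal thresholds, maximize the multinomial likelihood of the 2 x 2 table.

```r
# P(X > t1, Y > t2) for a standard bivariate normal with correlation rho
bvn.upper <- function(t1, t2, rho)
  integrate(function(x) dnorm(x) *
              pnorm((t2 - rho * x) / sqrt(1 - rho^2), lower.tail = FALSE),
            lower = t1, upper = Inf)$value

tetra.sketch <- function(tab) {          # rows = X (low, high), cols = Y
  n  <- sum(tab)
  t1 <- qnorm(sum(tab[1, ]) / n)         # threshold on X
  t2 <- qnorm(sum(tab[, 1]) / n)         # threshold on Y
  loglik <- function(rho) {
    p22 <- bvn.upper(t1, t2, rho)        # both above their cuts
    px  <- pnorm(t1, lower.tail = FALSE)
    py  <- pnorm(t2, lower.tail = FALSE)
    P   <- matrix(c(1 - px - py + p22, py - p22,
                    px - p22,          p22), 2, 2, byrow = TRUE)
    sum(tab * log(P))                    # multinomial log likelihood
  }
  optimize(loglik, c(-0.99, 0.99), maximum = TRUE)$maximum
}

# dichotomize correlated latent normals: phi underestimates, tetra recovers rho
set.seed(11)
x <- rnorm(10000)
y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(10000)
tab <- table(x > 0, y > 0)
cor(x > 0, y > 0)    # phi, near 1/3
tetra.sketch(tab)    # close to the latent 0.5
```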
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
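A minimal sketch of that smoothing idea in base R (smooth.sketch is an illustrative name; cor.smooth itself does more checking): floor the negative eigenvalues, rescale so they again sum to the number of variables, rebuild the matrix, and restore the unit diagonal.

```r
smooth.sketch <- function(R, eps = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  if (min(e$values) >= eps) return(R)        # already positive definite
  v <- pmax(e$values, eps)                   # floor the negative eigenvalues
  v <- v * nrow(R) / sum(v)                  # eigenvalues sum to p again
  S <- e$vectors %*% diag(v) %*% t(e$vectors)
  cov2cor(S)                                 # restore unit diagonal
}

# A 3 x 3 "correlation" matrix that is not positive semi-definite:
# A~B and B~C cannot both be .9 while A~C is -.9
R.bad <- matrix(c(   1, 0.9, -0.9,
                   0.9,   1,  0.9,
                  -0.9, 0.9,    1), 3, 3)
min(eigen(R.bad)$values)                 # negative
min(eigen(smooth.sketch(R.bad))$values)  # now non-negative
```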
> draw.tetra()

[Figure: bivariate normal density cut at thresholds τ and Τ, with ρ = .5 and observed φ = .28]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function that gives some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package
rxy = ηxwg · ηywg · rxywg + ηxbg · ηybg · rxybg      (1)
where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means
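Equation (1) can be verified numerically in base R with simulated data (an illustration, not statsBy itself): split x and y into expanded group means (between) and deviations from them (within); because the two parts are orthogonal, the overall r is exactly the weighted sum shown above.

```r
set.seed(17)
g <- rep(1:10, each = 30)           # 10 groups of 30
x <- rnorm(300) + g / 5             # some between-group signal
y <- rnorm(300) + g / 5 + 0.3 * x

x.bg <- ave(x, g); x.wg <- x - x.bg # expanded group means and deviations
y.bg <- ave(y, g); y.wg <- y - y.bg

r.wg <- cor(x.wg, y.wg)             # pooled within-group correlation
r.bg <- cor(x.bg, y.bg)             # between-group correlation
eta.xw <- sd(x.wg) / sd(x); eta.xb <- sd(x.bg) / sd(x)
eta.yw <- sd(y.wg) / sd(y); eta.yb <- sd(y.bg) / sd(y)

cor(x, y)
eta.xw * eta.yw * r.wg + eta.xb * eta.yb * r.bg   # identical, per equation (1)
```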
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  Four.Letter.Words 
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes 
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  Letter.Series 
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees 
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group 
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
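The standardized slopes and R2 that setCor reports come from simple matrix algebra on the correlation matrix. A base-R sketch on a toy 4-variable correlation matrix (made up for illustration, not the Thurstone data): the slopes are solve(Rxx, Rxy) and the squared multiple correlation is their inner product with Rxy.

```r
R <- matrix(c(1.0, 0.3, 0.2, 0.5,
              0.3, 1.0, 0.4, 0.4,
              0.2, 0.4, 1.0, 0.3,
              0.5, 0.4, 0.3, 1.0), 4, 4)
x <- 1:3                         # predictor set
y <- 4                           # criterion

beta <- solve(R[x, x], R[x, y])  # standardized regression weights
R2   <- sum(beta * R[x, y])      # squared multiple correlation
beta
R2
```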
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV =  Four.Letter.Words 
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes 
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series 
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees 
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group 
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
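Under the hood, the a, b, and c paths are just three regressions. A minimal base-R sketch with simulated data (mediate itself adds multiple IVs, moderation, and formatted reporting):

```r
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)             # a path built into the simulation
y <- 0.4 * m + 0.2 * x + rnorm(n)   # b and c' paths built in

a     <- coef(lm(m ~ x))["x"]       # x -> m
b     <- coef(lm(y ~ m + x))["m"]   # m -> y, controlling x
c.tot <- coef(lm(y ~ x))["x"]       # total effect c
c.dir <- coef(lm(y ~ m + x))["x"]   # direct effect c'
ab    <- a * b                      # indirect effect; note c = c' + ab

# naive percentile bootstrap of the indirect effect
boot.ab <- replicate(1000, {
  i <- sample(n, replace = TRUE)
  coef(lm(m[i] ~ x[i]))[2] * coef(lm(y[i] ~ m[i] + x[i]))[2]
})
quantile(boot.ab, c(0.025, 0.975))  # bootstrap CI for ab
```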
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33 
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067 

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram of) the regressions of multiple y variables on multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model with paths THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), and THERAPY → SATIS (c = 0.76, c' = 0.43)]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model with paths THERAPY → SATIS (0.43), ATTRIB → SATIS (0.4), and a THERAPY-ATTRIB correlation of 0.21]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "How To" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏(1 − λi),  i = 1, ..., n

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx^-1 Rxy Ryy^-1 Ryx.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
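The formula above can be checked in base R against stats::cancor, since the λi are the squared canonical correlations between the two sets (a sketch with simulated data, not setCor itself):

```r
set.seed(3)
Z <- matrix(rnorm(500 * 5), ncol = 5)
Z[, 4] <- Z[, 1] + rnorm(500)        # link the two sets through one variable
R <- cor(Z)
x <- 1:3; y <- 4:5                   # two sets of variables

M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)        # squared canonical correlations
R2.set <- 1 - prod(1 - lambda)       # Cohen's set correlation

# agrees with the canonical correlations from the raw (centered) data
cc <- cancor(scale(Z[, x]), scale(Z[, y]))$cor
R2.set
1 - prod(1 - cc^2)                   # the same value
```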
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis 
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, 
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0 
Mean bootstrapped indirect effect = 0.01  with standard error = 0.02  Lower CI = -0.01   Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0 
Mean bootstrapped indirect effect = 0.01  with standard error = 0.04  Lower CI = -0.05   Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09 

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure: moderated mediation model with IVs prot2 (c = 1.46, c' = 1.46) and prot2*sexism (c = 0.81, c' = 0.81), mediated through sexism]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> #compare with setCor 
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ 
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.050 0.033 0.008
Chisq of canonical correlations 
[1] 35.8 23.1  5.6
 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect sizes.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
depending upon the input
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53
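The arithmetic behind this first case is compact enough to check by hand: the test statistic is t = r√(n−2)/√(1−r²), and the confidence interval comes from the Fisher z transformation with standard error 1/√(n−3). A standalone numerical check (sketched here in Python rather than R; the 1.96 cutoff assumes a 95% interval):

```python
import math

def r_ci(n, r, z_crit=1.96):
    """t statistic and 95% CI for a single correlation r based on n cases."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # test of r = 0
    z = math.atanh(r)                                 # Fisher z transform
    se = 1 / math.sqrt(n - 3)                         # standard error of z
    lo, hi = math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
    return t, lo, hi

t, lo, hi = r_ci(50, 0.3)
print(round(t, 2), round(lo, 2), round(hi, 2))  # 2.18 0.02 0.53
```

These reproduce the t value and confidence limits that r.test prints above.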
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
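For a single matrix, Steiger's idea can be sketched directly: sum the squared Fisher z transforms of the p(p−1)/2 distinct off-diagonal correlations, scale by (n − 3), and compare to a chi square on p(p−1)/2 degrees of freedom. A rough standalone sketch in Python (this is the Fisher-z variant of the statistic; cortest's exact implementation may differ in detail):

```python
import math

def steiger_chi2(R, n):
    """Chi-square test that all off-diagonal correlations are zero (after Steiger, 1980)."""
    p = len(R)
    chi2, df = 0.0, p * (p - 1) // 2
    for i in range(p):
        for j in range(i + 1, p):
            z = math.atanh(R[i][j])   # Fisher z of each correlation
            chi2 += (n - 3) * z * z   # each (n-3)z^2 is ~ chi-square(1) under H0
    return chi2, df

# a made-up 2 x 2 correlation matrix with r = .5 and n = 103
chi2, df = steiger_chi2([[1.0, 0.5], [0.5, 1.0]], 103)
print(round(chi2, 2), df)  # 30.17 1
```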
3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))
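The φ coefficient that the tetrachoric is compared against has a closed form for a 2 × 2 table of counts [[a, b], [c, d]]: φ = (ad − bc)/√((a+b)(c+d)(a+c)(b+d)). A minimal standalone sketch in Python, with a made-up table:

```python
import math

def phi_coefficient(a, b, c, d):
    """Pearson r (phi) for a 2x2 table of counts [[a, b], [c, d]]."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

# made-up cell counts: rows split X at its cut point, columns split Y
print(round(phi_coefficient(30, 20, 20, 30), 2))  # 0.2
```

The tetrachoric estimate for the same table would be larger, since it corrects for the information lost by dichotomizing.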
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
35
> draw.tetra()

[Figure: a bivariate normal distribution cut at τ (for X) and Τ (for Y), dividing the plane into the four quadrants X < τ, Y < Τ; X > τ, Y < Τ; X < τ, Y > Τ; and X > τ, Y > Τ. Here rho = .5 while the resulting phi = .28. Marginal panels show the univariate densities and their cut points.]
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)

where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
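Equation 1 is an exact identity when the within pieces are group-mean deviations and the between pieces are the (expanded) group means, because those two parts are orthogonal. A standalone numerical check in Python, with a small made-up grouped data set:

```python
import numpy as np

rng = np.random.default_rng(42)
groups = np.repeat(np.arange(4), 25)        # 4 groups of 25 cases each
x = rng.normal(size=100) + groups * 0.5     # made-up data with group structure
y = rng.normal(size=100) + groups * 0.5

def split(v, g):
    """Split v into within-group deviations and expanded group means."""
    means = np.array([v[g == k].mean() for k in np.unique(g)])
    bg = means[g]                            # each case gets its group mean
    return v - bg, bg - v.mean()

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

xwg, xbg = split(x, groups)
ywg, ybg = split(y, groups)
lhs = corr(x, y)
rhs = (corr(x, xwg) * corr(y, ywg) * corr(xwg, ywg) +   # within part
       corr(x, xbg) * corr(y, ybg) * corr(xbg, ybg))    # between part
print(np.isclose(lhs, rhs))  # True
```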
4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
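The matrix algebra behind regression from a correlation matrix is short: the standardized slopes are β = Rxx⁻¹ rxy and the squared multiple correlation is R² = rxy′β. A standalone sketch in Python, with a made-up correlation matrix (two predictors correlated .3, each correlating .5 with the criterion):

```python
import numpy as np

Rxx = np.array([[1.0, 0.3],
                [0.3, 1.0]])      # correlations among predictors (made up)
rxy = np.array([0.5, 0.5])        # predictor-criterion correlations (made up)

beta = np.linalg.solve(Rxx, rxy)  # standardized regression weights
R2 = rxy @ beta                   # squared multiple correlation
print(np.round(beta, 3), round(R2, 3))  # [0.385 0.385] 0.385
```

This is the computation setCor performs for each y variable when given a correlation matrix instead of raw data.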
5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48
> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
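The bookkeeping behind these numbers, total effect c = direct effect c′ + indirect effect ab, is an exact identity for ordinary least squares with a single mediator, and is easy to verify on any data set. A standalone sketch in Python, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                      # made-up predictor
m = 0.6 * x + rng.normal(size=200)            # mediator, partly driven by x
y = 0.4 * m + 0.3 * x + rng.normal(size=200)  # criterion

def slope(cols, dv):
    """OLS coefficients of dv on the given columns (with an intercept)."""
    X = np.column_stack([np.ones(len(dv))] + list(cols))
    return np.linalg.lstsq(X, dv, rcond=None)[0][1:]

a = slope([x], m)[0]           # path a: x -> m
b, c_prime = slope([m, x], y)  # path b (m -> y) and direct effect c'
c = slope([x], y)[0]           # total effect c
print(np.isclose(c, c_prime + a * b))  # True
```

The bootstrap that mediate runs is about the sampling distribution of ab, not this decomposition, which holds exactly in-sample.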
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
41
> mediate.diagram(preacher)

[Figure: the mediation model diagram, with THERAPY predicting SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (paths 0.82 and 0.4).]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: the conventional regression model, with regression paths of 0.43 from THERAPY and 0.4 from ATTRIB to SATIS, and 0.21 between the two predictors.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
53 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
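The formula above can be computed directly from a partitioned correlation matrix. The sketch below uses base R only, with a made-up 4-variable matrix; setCor does this (plus the significance tests and the shrunken estimates) for you:

```r
# Illustrative only: a small correlation matrix, split into an
# x set (variables 1:2) and a y set (variables 3:4)
R <- matrix(c(1, .5, .3, .2,
              .5, 1, .4, .3,
              .3, .4, 1, .6,
              .2, .3, .6, 1), 4, 4)
x <- 1:2
y <- 3:4
Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- eigen(M)$values        # squared canonical correlations
R2set  <- 1 - prod(1 - lambda)   # Cohen's set correlation
mean(lambda)                     # average squared canonical correlation
```

Because the product runs over the same eigenvalues whichever set is labeled x, the resulting R2 is symmetric in the two sets.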
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46 S.E. = 0.22 t = 6.77 df = 125 with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46 S.E. = 0.22 t = 6.73 df = 125 with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.01 Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81 S.E. = 0.28 t = 2.89 df = 125 with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81 S.E. = 0.28 t = 2.87 df = 125 with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04 Lower CI = -0.05 Upper CI = 0.13
R = 0.54 R2 = 0.3 F = 17.53 on 3 and 125 DF p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Path diagram, "Moderated mediation (mean centered)": prot2 (c = 1.46, c' = 1.46) and prot2*sexism (c = 0.81, c' = 0.81) predicting respappr through the mediator sexism, with small paths through the mediator (0.07, 0.09, 0.02, -0.01)]
Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)
Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV =  ACT
            slope   se     t    p lower.ci upper.ci  VIF
gender      -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education    0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age          0.03 0.22  0.15 0.88    -0.39     0.46 1.44
Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT  0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248
DV =  SATV
            slope   se     t    p lower.ci upper.ci  VIF
gender      -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education    0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age         -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44
Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV  0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808
DV =  SATQ
            slope   se     t    p lower.ci upper.ci  VIF
gender      -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education    0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age         -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44
Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex
A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

        MR1    MR2    MR3
MR1    1.00   0.59   0.54
MR2    0.59   1.00   0.52
MR3    0.54   0.52   1.00
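Table 2 was produced by a call along these lines (a sketch: the Thurstone correlation matrix ships with psych, and fa2latex's default heading is the one shown above):

```r
library(psych)          # assumes psych is installed
f3 <- fa(Thurstone, 3)  # 3-factor solution of the Thurstone correlations
fa2latex(f3)            # writes the LaTeX table to the console
```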
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)
rangeCorrection will correct correlations for restriction of range
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
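For intuition, a few of these helpers reduce to one-liners in base R. These sketches (the `_demo` names are mine) omit the input checking and NA handling the psych versions provide:

```r
fisherz_demo        <- function(r) 0.5 * log((1 + r)/(1 - r))  # same as atanh(r)
geometric_mean_demo <- function(x) exp(mean(log(x)))
harmonic_mean_demo  <- function(x) 1/mean(1/x)

fisherz_demo(.5)                   # about 0.55
geometric_mean_demo(c(1, 4, 16))   # 4
```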
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability 49affect 15 25alpha 5 7alpha factoring 7
Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 15 35blockrandom 48burt 35
char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49
densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9
drawcor 35drawtetra 35dummycode 15dynamite plot 20
edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24
fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factor analysis 7factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48
galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7
harmonicmean 49head 49headtail 49heights 49
hetdiagram 8Hmisc 29Holzinger 49
ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-
plications in R 8iqitems 49irtfa 7 47irt2latex 47 48
KnitR 47
lavaan 38library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15
makekeys 17MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42 44mediatediagram 41minimum residual 7mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevel 37multilevelreliability 7multiple regression 38
nfactors 7
nlme 37
omega 7 9outlier 3 13 14
padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51
R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8
corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49hetdiagram 8
Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51psych package
ability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 35cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8
faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14
prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
depending upon the input
1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.
> r.test(50, .3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53
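Case 1 is the familiar t test of a single correlation. Sketched by hand (r.test wraps this up, including the Fisher z based confidence interval), the computation matching the output above is roughly:

```r
r <- .3; n <- 50
t.val <- r * sqrt(n - 2)/sqrt(1 - r^2)       # 2.18
p.val <- 2 * pt(-abs(t.val), df = n - 2)     # 0.034
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(.975)/sqrt(n - 3))  # 0.02 0.53
```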
2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32
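Case 2 is a z test on the difference of the Fisher z transformed correlations. A hand sketch of the computation behind the output above (r.test does this for you):

```r
r1 <- .4; r2 <- .6; n1 <- n2 <- 30
z.diff <- atanh(r1) - atanh(r2)           # difference of Fisher z's
se     <- sqrt(1/(n1 - 3) + 1/(n2 - 3))   # standard error of the difference
z.val  <- z.diff/se                       # -0.99
p.val  <- 2 * pnorm(-abs(z.val))          # 0.32
```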
3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).
> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37
4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).
> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial, and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by
drawcor(expand=20cuts=c(00))
Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
If the data are a mix of continuous polytomous and dichotomous variables the mixedcor
function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations
The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem
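The repair just described can be sketched directly (assuming the burt data set ships with psych and behaves as documented; the exact eigenvalues may vary by psych version):

```r
library(psych)

# The burt correlation matrix is not quite positive definite:
# its smallest eigenvalue is slightly negative.
min(eigen(burt)$values)

# cor.smooth raises the offending eigenvalues, rescales them to
# sum to the number of variables, and returns a "smoothed" matrix.
burt.smoothed <- cor.smooth(burt)
min(eigen(burt.smoothed)$values)  # now positive
```

Comparing the smoothed and original matrices shows that the adjustment is small and concentrated where the typo occurs.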
> draw.tetra()

[Figure: the bivariate normal density cut at τ on X and Τ on Y, dividing the plane into the four quadrants X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. Here rho = .5 while φ = .28.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x_wg * η_y_wg * r_xy_wg + η_x_bg * η_y_bg * r_xy_bg        (1)

where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.
4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
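The first of these cases can be sketched as follows (grouping by education; cors=TRUE requests the within and between group correlation matrices, as described in the statsBy help page — the output element names are assumptions from that page):

```r
library(psych)

# two-level descriptives for the sat.act data, grouped by education
sb <- statsBy(sat.act, group = "education", cors = TRUE)

round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # correlations of the group means
```

The rwg and rbg matrices are the two components of the decomposition in equation (1).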
4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  #find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                    R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63  0.4 0.58 0.34

 DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

 DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                    R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

 DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

 DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

 DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

 DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  27   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB  =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03  Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model diagram. THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, c = 0.76, c' = 0.43.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: regression model diagram. THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, THERAPY–ATTRIB correlation = 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_xx^{-1} R_xy R_xx^{-1} R_xy^{-1}.
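As a check on this formula, the squared canonical correlations reported earlier for the Thurstone example (0.6280, 0.1478, 0.0076, 0.0049) reproduce the reported set correlation; a base R sketch:

```r
# squared canonical correlations copied from the setCor output above
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)

# Cohen's set correlation: R2 = 1 - prod(1 - lambda_i)
R2.set <- 1 - prod(1 - lambda)
round(R2.set, 2)  # 0.69, matching Cohen's Set Correlation R2 above
```

Note that because each eigenvalue multiplies into the product, a single large canonical correlation can dominate R2, which motivates the caveat that follows.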
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  125   with p =  4.5e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.02  Lower CI =  -0.01  Upper CI =  0.07
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  125   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  NA  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0.01  with standard error =  0.04  Lower CI =  -0.05  Upper CI =  0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation diagram, titled "Moderated mediation (mean centered)". Predictors prot2 and prot2*sexism, mediator sexism, criterion respappr; c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism.]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,	Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.
> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 DV =  ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

 DV =  SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

 DV =  SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation   7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
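A table such as Table 2 can be generated along these lines (a sketch; the three-factor solution for the Thurstone matrix and fa2latex's default arguments are assumed from its help page):

```r
library(psych)

# three-factor solution of the Thurstone correlation matrix
f3 <- fa(Thurstone, nfactors = 3)

# emit the LaTeX source for an APA style factor table
fa2latex(f3, caption = "fa2latex")
```

The emitted LaTeX can be pasted into a document or written to a file for \input{}.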
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1   MR2   MR3    h2    u2   com
Sentences         0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary        0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion   0.83  0.04  0.00  0.73  0.27  1.00
First.Letters     0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes          0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees         0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings       2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
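The superMatrix trick can be sketched with two toy matrices (the block-diagonal behavior is as described above; the dimnames psych assigns are version dependent):

```r
library(psych)

A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 3)

# A on the top left, B on the lower right, 0s in the other quadrants
S <- superMatrix(A, B)
dim(S)  # 5 x 5
```

This is convenient for stitching together scoring keys built for separate item sets.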
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74
Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186
Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer
Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
Tests of correlation matrices
Call: cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
3.6 Polychoric, tetrachoric, polyserial and biserial correlations
The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation between these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by
draw.cor(expand=20, cuts=c(0,0))
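The attenuation of φ relative to the latent correlation can also be seen numerically by dichotomizing simulated continuous data and comparing the Pearson r of the 0/1 scores with the tetrachoric estimate. This is a sketch; the simulated latent structure and the split at zero are illustrative choices, not part of the original example.

```r
library(psych)
set.seed(17)
latent <- rnorm(1000)
x <- latent + rnorm(1000)       # two noisy indicators of the same latent trait
y <- latent + rnorm(1000)
X <- as.numeric(x > 0)          # dichotomize at the (population) median
Y <- as.numeric(y > 0)
cor(x, y)                       # the "latent" Pearson correlation
cor(X, Y)                       # phi: noticeably attenuated
tetrachoric(cbind(X, Y))$rho    # tetrachoric: close to cor(x, y)
```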
Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
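The burt example can be checked directly. A minimal sketch (the eigenvalue inspection is added here for illustration; only burt and cor.smooth come from the text):

```r
library(psych)
round(min(eigen(burt)$values), 3)   # the smallest eigenvalue is negative
burt.s <- cor.smooth(burt)          # adjust and rescale the eigenvalues
min(eigen(burt.s)$values) > 0       # the smoothed matrix is now proper
```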
> draw.tetra()

[Figure: a bivariate normal distribution cut at τ (on X) and Τ (on Y), showing the four quadrants X > τ/Y > Τ, etc., with marginal normal densities; rho = .5, phi = .28]
Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
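The two components of equation 1 can be pulled directly from the statsBy output. A sketch, assuming (as the statsBy help page describes) that the result list contains the within group (rwg) and between group (rbg) correlation matrices:

```r
library(psych)
# group the sat.act data by level of education and ask for correlations
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # correlations of the group means (between groups)
```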
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sentences         0.09 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.02 3.00
First.Letters     0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                 slope  VIF
Sentences         0.07 3.69
Vocabulary        0.17 3.88
Sent.Completion   0.05 3.00
First.Letters     0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63 0.40 0.58 0.34

DV = Letter.Series
                 slope  VIF
Sentences         0.25 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.04 3.00
First.Letters     0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.50 0.25 0.49 0.24

DV = Pedigrees
                 slope  VIF
Sentences         0.21 3.69
Vocabulary        0.16 3.88
Sent.Completion   0.21 3.00
First.Letters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                 slope  VIF
Sentences         0.20 3.69
Vocabulary       -0.02 3.88
Sent.Completion   0.08 3.00
First.Letters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45 0.20
Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sent.Completion   0.02 1.02
First.Letters     0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                 slope  VIF
Sent.Completion   0.05 1.02
First.Letters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                 slope  VIF
Sent.Completion   0.04 1.02
First.Letters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                 slope  VIF
Sent.Completion   0.21 1.02
First.Letters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                 slope  VIF
Sent.Completion   0.08 1.02
First.Letters     0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.30 0.09 0.26 0.07
Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
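A sketch of the call that produces this analysis (the sobel data set and the formula syntax with the mediator in parentheses follow the mediate help page; the object name preacher is an assumption):

```r
library(psych)
# mediator ATTRIB is placed in parentheses on the right hand side
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
preacher   # print the total, direct, and bootstrapped indirect effects
```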
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: mediation model with paths THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, c = 0.76, c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)
[Figure: regression model with paths THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, and a correlation between THERAPY and ATTRIB of 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education   0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age         0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education   0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age        -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
           slope   se     t    p lower.ci upper.ci  VIF
gender     -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education   0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age        -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohens Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
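A sketch of the calls that generate a table like Table 2 (the choice of a three-factor solution of the Thurstone matrix and the heading text are assumptions based on the table shown):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3)   # three factor solution of the Thurstone data
# write a LaTeX table of the loadings, communalities, and factor correlations
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```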
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
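A few of these helpers in action (a hedged sketch; the numeric results are easy to verify by hand):

```r
library(psych)
fisherz(0.5)                   # Fisher r-to-z, i.e. atanh(0.5), about 0.549
geometric.mean(c(1, 10, 100))  # (1 * 10 * 100)^(1/3) = 10
harmonic.mean(c(2, 6))         # 2 / (1/2 + 1/6) = 3
superMatrix(diag(2), diag(3))  # 5 x 5 block matrix with 0s off the diagonal blocks
```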
8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
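Because psych uses lazy data loading, these data sets are available as soon as the package is attached; a quick sketch:

```r
library(psych)
dim(bfi)             # 2800 x 28: 25 items plus gender, education, and age
describe(sat.act)    # descriptives for the SAT/ACT demonstration data
lowerMat(Thurstone)  # print the lower triangle of the 9 x 9 ability correlations
```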
9 Development version and a users guide
The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr
srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo
gt installpackages(psych repos=httpspersonality-projectorgr type=source)
Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf
News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news
gt news(Version gt= 184package=psych)
10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS LibraryFrameworksRframeworkVersions36ResourcesliblibRblas0dylib
LAPACK LibraryFrameworksRframeworkVersions36ResourcesliblibRlapackdylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components, an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
> draw.tetra()
Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.
4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x.wg * η_y.wg * r_xy.wg + η_x.bg * η_y.bg * r_xy.bg    (1)

where r_xy is the normal correlation, which may be decomposed into a within group and between group correlation, r_xy.wg and r_xy.bg, and η (eta) is the correlation of the data with the within group values or the group means.
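A sketch of the decomposition in practice, grouping the sat.act data by education (the rwg and rbg components named here are assumed to be the within and between group correlation matrices returned by statsBy; check str(sb) if the slot names differ in your version):

```r
library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)  # pooled within-group correlations
round(sb$rbg, 2)  # correlations of the group means
```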
42 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a between
37
group correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data
simmultilevel will generate simulated data with a multilevel structure
The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary
function specifying the variable of interest
Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )
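As a hedged illustration with the withinBetween example data (the grouping column is assumed here to be named "Group"; see the help file to confirm), statsBy should recover the crossed 1/0/-1 pattern described above:

```r
library(psych)
wb <- statsBy(withinBetween, group = "Group", cors = TRUE)
round(wb$rwg, 1)  # within-group correlations: blocks of 1, 0, and -1
round(wb$rbg, 1)  # between-group correlations
```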
4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63  0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series  0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                  R   R2  Ruw R2uw
Letter.Series  0.21 0.04 0.17 0.03

Multiple Regression
DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group   0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
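Supplying the sample size when the input is a correlation matrix also yields standard errors, t-values, and probabilities; a sketch (n.obs = 213 is an illustrative value, not the actual Thurstone sample size):

```r
library(psych)
setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)  # adds se, t, and p for each slope
```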
5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c'. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement, or ask for the summary.
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap
mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)
bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis
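The moderated mediation output shown below was produced by a call of roughly this form. A sketch, assuming the Garcia data set from Garcia et al. (2010) is available in psych; n.iter = 50 keeps the bootstrap fast for illustration:

```r
library(psych)
# moderated mediation: the product term prot2*sexism is formed from
# mean centered data by default
mod.med <- mediate(respappr ~ prot2 * sexism + (sexism),
                   data = Garcia, n.iter = 50,
                   main = "Moderated mediation (mean centered)")
summary(mod.med)
```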
53 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
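The set correlation formula can be sketched directly in base R. This illustrative helper (not part of psych; the name setCorR2 is an assumption for this sketch) computes Cohen's set correlation from a correlation matrix, given the column indices of the two sets:

```r
# illustrative sketch of Cohen's set correlation from a correlation matrix R
# x, y: indices of the predictor and criterion sets
setCorR2 <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  # eigenvalues of Rxx^-1 Rxy Ryy^-1 Rxy'
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  lambda <- Re(eigen(M, only.values = TRUE)$values)
  1 - prod(1 - lambda)
}
```

setCor itself reports this statistic (along with canonical correlations and significance tests), so the helper is only meant to make the formula concrete.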
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2, prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: "Moderated mediation (mean centered)" path diagram; c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism, with small paths (0.07, 0.09, 0.02, -0.01) through sexism to respappr]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
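Table 2 can be produced with a short sequence like the following. A sketch, assuming the Thurstone correlation matrix included in psych; the heading and caption arguments are optional:

```r
library(psych)
# factor the Thurstone correlations and convert the output to a LaTeX table
f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex")
```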
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2    u2    com
Sentences         0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary        0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion   0.83  0.04  0.00  0.73  0.27  1.00
First.Letters     0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes          0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees         0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings       2.64  1.86  1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
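A few of these helpers in action (a sketch, assuming psych is loaded and the sat.act data set is available):

```r
library(psych)
fisherz(0.5)                   # Fisher r-to-z transformation
geometric.mean(c(1, 2, 4, 8))  # mean appropriate for multiplicative data
harmonic.mean(c(30, 60))       # mean appropriate for rates
headTail(sat.act)              # first and last rows of the data set
```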
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), along with 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7

Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35

char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49

densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9

draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20

edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24

fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48

galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7

harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49

ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48

KnitR, 47

lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15

make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38

nfactors, 7; nlme, 37

omega, 7, 9; outlier, 3, 13, 14

p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51

R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51

psych package: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47

r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9

SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47

table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49

vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7

weighted least squares, 7; withinBetween, 37, 38

xtable, 47
4 Multilevel modeling
Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.
41 Decomposing data into within and between level correlations usingstatsBy
There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)
This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

    r_xy = eta_x.wg * eta_y.wg * r_xy.wg + eta_x.bg * eta_y.bg * r_xy.bg    (1)

where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_xy.wg and r_xy.bg, and eta is the correlation of the data with the within group values or the group means.
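Equation (1) can be verified numerically in a few lines of base R (no psych functions needed; the simulated data and names such as x.wg and r.bg below are ad hoc illustrations, not psych conventions):

```r
# Minimal sketch of equation (1): the observed correlation is the sum of the
# within group and between group correlations, each weighted by its etas.
set.seed(42)
ng <- 4; n <- 100                      # 4 groups of 100 cases each
g  <- rep(1:ng, each = n)
mu <- rnorm(ng, sd = 2)                # group means shared by x and y
x  <- mu[g] + rnorm(ng * n)
y  <- mu[g] + rnorm(ng * n)
x.bg <- ave(x, g); y.bg <- ave(y, g)   # between part: the group means
x.wg <- x - x.bg;  y.wg <- y - y.bg    # within part: deviations from the means
r.wg <- cor(x.wg, y.wg)                # pooled within group correlation
r.bg <- cor(x.bg, y.bg)                # between group correlation
eta.xw <- cor(x, x.wg); eta.yw <- cor(y, y.wg)
eta.xb <- cor(x, x.bg); eta.yb <- cor(y, y.bg)
# equation (1) recovers the raw correlation exactly
all.equal(cor(x, y), eta.xw * eta.yw * r.wg + eta.xb * eta.yb * r.bg)
```

The identity is exact because the within group deviations are orthogonal to the group means, so the covariance of x and y splits cleanly into a within and a between piece.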
4.2 Generating and displaying multilevel data
withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation, and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sentences         0.09 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.02 3.00
First.Letters     0.58 1.35

Multiple Regression
                    R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                 slope  VIF
Sentences         0.07 3.69
Vocabulary        0.17 3.88
Sent.Completion   0.05 3.00
First.Letters     0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63 0.40 0.58 0.34

DV = Letter.Series
                 slope  VIF
Sentences         0.25 3.69
Vocabulary        0.09 3.88
Sent.Completion   0.04 3.00
First.Letters     0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.50 0.25 0.49 0.24

DV = Pedigrees
                 slope  VIF
Sentences         0.21 3.69
Vocabulary        0.16 3.88
Sent.Completion   0.21 3.00
First.Letters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                 slope  VIF
Sentences         0.20 3.69
Vocabulary       -0.02 3.88
Sent.Completion   0.08 3.00
First.Letters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45 0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
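As a sketch of how such slopes arise, the standardized weights that setCor reports can be computed from a correlation matrix alone as beta = Rxx^-1 rxy. The base R illustration below (simulated data; the names x1, x2, y are made up, not from the Thurstone example) checks this against lm run on standardized data:

```r
# Standardized regression from a correlation matrix: beta = Rxx^-1 %*% rxy,
# and multiple R^2 = rxy' beta; both match lm() on standardized raw data.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 0.5 * d$x1 + 0.3 * d$x2 + rnorm(100)
d <- as.data.frame(scale(d))            # standardize so slopes are betas
R  <- cor(d)
beta <- solve(R[c("x1", "x2"), c("x1", "x2")], R[c("x1", "x2"), "y"])
R2   <- sum(beta * R[c("x1", "x2"), "y"])
fit  <- lm(y ~ x1 + x2, data = d)
all.equal(unname(beta), unname(coef(fit)[-1]))   # same standardized slopes
all.equal(R2, summary(fit)$r.squared)            # same multiple R^2
```

This is why setCor needs only a correlation (or covariance) matrix plus, for inference, the number of observations.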
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
DV = Four.Letter.Words
                 slope  VIF
Sent.Completion   0.02 1.02
First.Letters     0.58 1.02

Multiple Regression
                    R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                 slope  VIF
Sent.Completion   0.05 1.02
First.Letters     0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                 slope  VIF
Sent.Completion   0.04 1.02
First.Letters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                 slope  VIF
Sent.Completion   0.21 1.02
First.Letters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group
                 slope  VIF
Sent.Completion   0.08 1.02
First.Letters     0.31 1.02

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.30 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c'. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
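The path algebra can be illustrated with a few lines of base R (simulated data; the object names a, b, cp, and ct are ad hoc): for ordinary least squares, the total effect c from y ~ x decomposes exactly into the direct effect c' plus the indirect effect ab.

```r
# Sketch of the mediation decomposition c = c' + a*b with OLS regressions.
set.seed(17)
x <- rnorm(200)
m <- 0.5 * x + rnorm(200)                 # path a
y <- 0.4 * m + 0.3 * x + rnorm(200)       # paths b and c'
a  <- coef(lm(m ~ x))["x"]                # x -> m
b  <- coef(lm(y ~ x + m))["m"]            # m -> y, controlling for x
cp <- coef(lm(y ~ x + m))["x"]            # direct effect c'
ct <- coef(lm(y ~ x))["x"]                # total effect c
all.equal(unname(ct), unname(cp + a * b)) # c = c' + ab
```

The decomposition itself is algebraic; what the bootstrap in mediate adds is a confidence interval for the product term ab, whose sampling distribution is not normal.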
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model" with THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, c = 0.76, and c' = 0.43]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure: path diagram titled "Regression Models" with paths THERAPY -> SATIS = 0.43 and ATTRIB -> SATIS = 0.4, and a THERAPY-ATTRIB path of 0.21]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

    R2 = 1 - prod_{i=1..n} (1 - lambda_i)

where lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

    R = R_xx^-1 R_xy R_yy^-1 R_yx.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In such a case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
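As a numeric sketch of the formula above (base R only; the 4 x 4 matrix below is a made-up toy example, not a psych data set), the set correlation can be computed from the eigenvalues of R_xx^-1 R_xy R_yy^-1 R_yx, and agrees with the equivalent determinant form 1 - |R| / (|Rxx| |Ryy|):

```r
# Cohen's set correlation from eigenvalues, checked against the determinant form.
R <- matrix(c(1.0, 0.5, 0.3, 0.2,
              0.5, 1.0, 0.2, 0.3,
              0.3, 0.2, 1.0, 0.4,
              0.2, 0.3, 0.4, 1.0), 4, 4)
x <- 1:2; y <- 3:4                       # split the variables into two sets
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)            # eigenvalues are real for this product
R2.set <- 1 - prod(1 - lambda)
# determinant identity: prod(1 - lambda) = |R| / (|Rxx| |Ryy|)
all.equal(R2.set, 1 - det(R) / (det(R[x, x]) * det(R[y, y])))
```

The determinant form makes the symmetry of the statistic obvious: swapping the roles of the x set and the y set leaves R2 unchanged.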
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Figure: path diagram titled "Moderated mediation (mean centered)" with total effects c = 1.46 (prot2) and c = 0.81 (prot2*sexism), direct effects c' = 1.46 and c' = 0.81, and small paths through sexism (0.07, 0.09, 0.02, -0.01)]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input
DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86

Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

MR1               1.00
MR2               0.59   1.00
MR3               0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
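For reference, the Fisher z transform mentioned above has a simple closed form. This base R sketch (the .demo names are made up; psych's own functions are fisherz and fisherz2r) shows the transform and its inverse:

```r
# Fisher z: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r); its inverse is tanh(z).
fisherz.demo   <- function(r) 0.5 * log((1 + r) / (1 - r))
fisherz2r.demo <- function(z) (exp(2 * z) - 1) / (exp(2 * z) + 1)
r <- 0.6
z <- fisherz.demo(r)
all.equal(z, atanh(r))           # identical to base R's atanh
all.equal(fisherz2r.demo(z), r)  # round trip recovers r
```

The transform is useful because z is approximately normally distributed with standard error 1/sqrt(n - 3), which makes confidence intervals and tests on correlations straightforward.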
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and there are 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book: An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432
Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458
Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY
Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package
Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276
Cattell R B (1978) The scientific use of factor analysis Plenum Press New York
Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)
Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition
Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64
Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98
Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334
Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74
Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186
Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer
Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
sim.multilevel will generate simulated data with a multilevel structure.
The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
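A minimal sketch of this bootstrap workflow, using the sat.act data set and a small number of trials for speed (argument names follow the statsBy.boot and statsBy.boot.summary help pages; check those pages before relying on them):

```r
library(psych)
# randomize the grouping variable a few times and redo statsBy each time
sb.boot <- statsBy.boot(sat.act, group = "education", ntrials = 25)
# summarize one statistic of interest across the bootstrapped replications
statsBy.boot.summary(sb.boot)
```

With the default number of trials and a large data set this can take noticeably longer, which is why the text warns about run time.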
Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
4.3 Factor analysis by groups
Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.
sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)  # find the 5 factor solution for each education level
5 Multiple Regression, mediation, moderation and set correlations
The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
5.1 Multiple regression from data or correlation matrices
The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
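The a, b, and c paths can be illustrated with two ordinary regressions. This is only a sketch using base R's lm (it assumes the sobel data frame from the mediate help example has been created, and it omits the bootstrap test of ab that mediate provides):

```r
# x -> m : the a path
a.path  <- coef(lm(ATTRIB ~ THERAPY, data = sobel))["THERAPY"]
# m -> y and the direct x -> y path (c') from one regression
fit     <- lm(SATIS ~ THERAPY + ATTRIB, data = sobel)
b.path  <- coef(fit)["ATTRIB"]
c.prime <- coef(fit)["THERAPY"]   # direct effect, removing the mediator
ab      <- a.path * b.path        # indirect effect of x on y through m
```

The total effect c is the THERAPY coefficient from lm(SATIS ~ THERAPY, data = sobel), and c = c' + ab for ordinary least squares.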
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31   t = 2.5   df = 27   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32   t = 1.35   df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.03   Upper CI = 0.7
R = 0.56   R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Path diagram: "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; c = 0.76; c' = 0.43.]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Path diagram: "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY with ATTRIB = 0.21.]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22   t = 6.77   df = 125   with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22   t = 6.73   df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02   Lower CI = -0.01   Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28   t = 2.89   df = 125   with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81   S.E. = 0.28   t = 2.87   df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04   Lower CI = -0.05   Upper CI = 0.13
R = 0.54   R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Path diagram: "Moderated mediation (mean centered)". prot2: c = 1.46, c' = 1.46; prot2*sexism: c = 0.81, c' = 0.81; paths to and from the mediator sexism = 0.07, 0.09, 0.02 and -0.01.]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same, independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
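A few of these helpers in action. This is a sketch; the argument names follow the individual help pages (e.g., ?rangeCorrection), which should be consulted for the full signatures:

```r
library(psych)
fisherz(0.5)                              # Fisher r-to-z transformation
geometric.mean(c(1, 2, 4, 8))             # nth root of the product of n values
headTail(sat.act)                         # first and last lines of the data set
rangeCorrection(0.33, sdu = 12, sdr = 5)  # correct r for restriction of range
```

Each returns a small numeric result or data frame, so they are convenient both interactively and inside scripts.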
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory
sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays
iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid-parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74
Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186
Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer
Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.
> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
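Cohen's set correlation and the average squared canonical correlation reported above follow directly from the printed squared canonical correlations; a minimal plain-Python sketch using those printed values:

```python
from functools import reduce

# Squared canonical correlations printed by setCor above
lam = [0.6280, 0.1478, 0.0076, 0.0049]

# Cohen's set correlation: R^2 = 1 - prod(1 - lambda_i)
set_R2 = 1 - reduce(lambda p, l: p * (1 - l), lam, 1.0)

# Average squared canonical correlation, a more conservative summary
avg_r2c = sum(lam) / len(lam)

print(round(set_R2, 2))   # 0.69, matching the printed set correlation
print(round(avg_r2c, 2))  # 0.2, matching the printed average
```

Note how the first large canonical correlation dominates the product, which is why the average squared canonical correlation is often the more cautious summary of between-set overlap.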
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.14 0.02

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
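The residual matrix above holds what remains of the criterion correlations after the covariates are partialled out. For a single covariate this is just the first-order partial correlation formula; a plain-Python sketch with made-up correlations (illustrative values, not entries from the Thurstone matrix):

```python
import math

def partial_r(rxy, rxz, ryz):
    """First-order partial correlation of x and y with z removed."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

# Illustrative correlations: x-y, x-z, y-z
r = partial_r(rxy=0.5, rxz=0.6, ryz=0.4)
print(round(r, 3))  # 0.355
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the original correlation, as expected.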
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
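For least-squares estimates, the total effect decomposes exactly into the direct effect plus the indirect effect (c = c' + ab). A plain-Python sketch with made-up data (not the Sobel data used below) that verifies the identity:

```python
def slope(u, v):
    """Simple regression slope of v on u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    return suv / suu

def two_pred_slopes(x, m, y):
    """Partial slopes of y on x and m together (c' and b)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((a - mm) ** 2 for a in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    smy = sum((a - mm) * (b - my) for a, b in zip(m, y))
    d = sxx * smm - sxm ** 2
    return (sxy * smm - smy * sxm) / d, (smy * sxx - sxy * sxm) / d

# Made-up example data
x = [0, 1, 2, 3, 4, 5]
m = [1, 2, 2, 4, 4, 6]
y = [1, 3, 2, 5, 6, 8]

a = slope(x, m)                        # path a: m ~ x
c = slope(x, y)                        # total effect c: y ~ x
c_prime, b = two_pred_slopes(x, m, y)  # direct effect c' and path b
ab = a * b                             # indirect effect

print(abs(c - (c_prime + ab)) < 1e-9)  # True: c = c' + ab exactly for OLS
```

The identity is algebraic, so it holds for any data; what bootstrapping adds is a sampling distribution for the ab product, whose distribution is not normal in small samples.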
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76  SE = 0.31  t = 2.5  df = 27  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − Π(1 − λi), i = 1 ... n

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
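For the two-predictor case, the standardized β weights obtainable from a correlation matrix alone have a simple closed form; a plain-Python sketch (the correlations are made up for illustration; this is not the setCor implementation):

```python
def betas_from_r(ry1, ry2, r12):
    """Standardized betas and R^2 for y regressed on x1 and x2,
    computed from the three correlations alone."""
    d = 1 - r12 ** 2
    b1 = (ry1 - ry2 * r12) / d
    b2 = (ry2 - ry1 * r12) / d
    r2 = b1 * ry1 + b2 * ry2   # R^2 = sum of beta_i * r_yi
    return b1, b2, r2

b1, b2, r2 = betas_from_r(ry1=0.5, ry2=0.3, r12=0.4)  # illustrative values
print(round(r2, 3))  # 0.262
```

With uncorrelated predictors (r12 = 0), the betas reduce to the zero-order correlations, which is the familiar special case.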
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46  SE = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  SE = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81  SE = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  SE = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
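Several of these helpers reduce to one-line formulas; a plain-Python sketch of the ideas behind fisherz, geometric.mean, harmonic.mean, and the circular mean used in circular statistics (illustrative re-implementations, not the psych code):

```python
import math

def fisherz(r):
    """Fisher r-to-z transform: z = atanh(r)."""
    return math.atanh(r)

def geometric_mean(x):
    """exp of the mean log: appropriate for multiplicative data."""
    return math.exp(sum(math.log(v) for v in x) / len(x))

def harmonic_mean(x):
    """Reciprocal of the mean reciprocal: appropriate for rates."""
    return len(x) / sum(1.0 / v for v in x)

def circular_mean_hours(hours):
    """Mean of times of day on a 24-hour clock (the cosinor idea):
    average the sines and cosines, then convert back to hours."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24

print(round(geometric_mean([1, 4]), 6))  # 2.0
print(round(harmonic_mean([1, 3]), 6))   # 1.5
# 23:00 and 01:00 average to midnight, not noon:
h = circular_mean_hours([23, 1])
print(min(h, 24 - h) < 1e-6)             # True
```

The last example is why circular statistics matter for diurnal mood data: the ordinary arithmetic mean of 23:00 and 01:00 is 12:00, half a day off.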
8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
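Any of these data sets can be loaded and examined directly once psych is attached; a minimal sketch using the bfi data:

```r
library(psych)
data(bfi)            # 25 personality items plus gender, education, and age
dim(bfi)             # 2800 observations of 28 variables
describe(bfi[1:5])   # basic descriptive statistics for the first five items
```
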
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis (122 pp.). John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

DV = Four.Letter.Words slope VIF
Sent.Completion 0.02 1.02
First.Letters 0.58 1.02

Multiple Regression
R R2 Ruw R2uw
Four.Letter.Words 0.58 0.33 0.44 0.19

DV = Suffixes slope VIF
Sent.Completion 0.05 1.02
First.Letters 0.45 1.02

Multiple Regression
R R2 Ruw R2uw
Suffixes 0.46 0.21 0.35 0.12

DV = Letter.Series slope VIF
Sent.Completion 0.04 1.02
First.Letters 0.21 1.02

Multiple Regression
R R2 Ruw R2uw
Letter.Series 0.21 0.04 0.17 0.03

DV = Pedigrees slope VIF
Sent.Completion 0.21 1.02
First.Letters 0.08 1.02

Multiple Regression
R R2 Ruw R2uw
Pedigrees 0.18 0.03 0.14 0.02

DV = Letter.Group slope VIF
Sent.Completion 0.08 1.02
First.Letters 0.31 1.02

Multiple Regression
R R2 Ruw R2uw
Letter.Group 0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words 0.52 0.11 0.09 0.06 0.13
Suffixes 0.11 0.60 -0.01 0.01 0.03
Letter.Series 0.09 -0.01 0.75 0.28 0.37
Pedigrees 0.06 0.01 0.28 0.66 0.20
Letter.Group 0.13 0.03 0.37 0.20 0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t = 2.5 df = 27 with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t = 1.35 df = 27 with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.03 Upper CI = 0.7
R = 0.56 R2 = 0.31 F = 6.06 on 2 and 27 DF p-value: 0.0067

To see the longer output, specify short = FALSE in the print statement or ask for the summary
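Output of this form can be produced with a call of the following kind; this is a sketch which assumes the sobel data frame has first been created by running the example code on the mediate help page, and n.iter (reduced here for speed) sets the number of bootstrap resamples used to test the ab effect:

```r
library(psych)
# sobel is built in the example section of ?mediate; run that example first
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel, n.iter = 1000)
print(preacher, short = FALSE)   # longer output, including bootstrap CIs
mediate.diagram(preacher)        # path diagram showing the a, b, and c paths
```
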
• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
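The eigenvalue product formula above can be computed by hand as a check; a sketch using the Thurstone correlation matrix, with variables 1-4 as one set and 5-9 as the other (the eigenvalues of this matrix product are the squared canonical correlations):

```r
library(psych)
R   <- Thurstone                 # 9 x 9 correlation matrix of ability tests
x   <- 1:4
y   <- 5:9
Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]
# eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values
R2.set <- 1 - prod(1 - lambda)   # Cohen's set correlation R^2
```
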
setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46 SE = 0.22 t = 6.77 df = 125 with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46 SE = 0.22 t = 6.73 df = 125 with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.01 Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81 SE = 0.28 t = 2.89 df = 125 with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81 SE = 0.28 t = 2.87 df = 125 with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04 Lower CI = -0.05 Upper CI = 0.13
R = 0.54 R2 = 0.3 F = 17.53 on 3 and 125 DF p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
notice: setCor works on the correlation, covariance, or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor.
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT slope se t p lower.ci upper.ci VIF
gender -0.05 0.18 -0.27 0.79 -0.40 0.31 1.01
education 0.14 0.22 0.65 0.51 -0.28 0.57 1.45
age 0.03 0.22 0.15 0.88 -0.39 0.46 1.44

Multiple Regression
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
ACT 0.16 0.03 0.15 0.02 0.02 0.01 6.49 3 696 0.000248

DV = SATV slope se t p lower.ci upper.ci VIF
gender -0.03 4.29 -0.01 0.99 -8.45 8.39 1.01
education 0.10 5.13 0.02 0.98 -9.97 10.17 1.45
age -0.10 5.11 -0.02 0.98 -10.13 9.93 1.44

Multiple Regression
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATV 0.1 0.01 0.08 0.01 0.01 0.01 2.26 3 696 0.0808

DV = SATQ slope se t p lower.ci upper.ci VIF
gender -0.18 4.34 -0.04 0.97 -8.69 8.34 1.01
education 0.10 5.18 0.02 0.98 -10.07 10.28 1.45
age -0.09 5.16 -0.02 0.99 -10.23 10.05 1.44

Multiple Regression
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATQ 0.19 0.04 0.18 0.03 0.03 0.01 8.63 3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
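The symmetry is easy to verify by exchanging the roles of the two sets; a sketch (plot = FALSE, assumed here to suppress the path diagram, is an optional argument of setCor):

```r
library(psych)
C  <- cov(sat.act, use = "pairwise")
# same variables, with predictor and criterion sets exchanged
s1 <- setCor(y = 4:6, x = 1:3, data = C, n.obs = 700, plot = FALSE)
s2 <- setCor(y = 1:3, x = 4:6, data = C, n.obs = 700, plot = FALSE)
# both analyses report the same set correlation R2
```
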
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
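A table of this kind can be generated from a three factor solution of the Thurstone correlations; a sketch (the heading text is arbitrary, and n.obs = 213 is the sample size used in the psych examples for these data):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)   # three factor solution
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```
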
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings 2.64 1.86 1.50

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
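A few of these helpers in action (a sketch):

```r
library(psych)
fisherz(0.5)                    # Fisher r to z transformation of a correlation
geometric.mean(c(1, 2, 4, 8))   # appropriate mean for multiplicative data
superMatrix(diag(2), diag(3))   # 5 x 5 block "super matrix" from a 2x2 and a 3x3
headTail(sat.act)               # first and last rows, separated by ellipsis
```
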
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq.items 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon and Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Multiple Regression
                R   R2  Ruw R2uw
Letter.Group  0.3 0.09 0.26 0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
5.2 Mediation and Moderation analysis
Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
Consider the example from Preacher and Hayes (2004) as analyzed using the mediate
function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76 S.E. = 0.31 t = 2.5 df= 27 with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 S.E. = 0.32 t = 1.35 df= 27 with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.03 Upper CI = 0.7
R = 0.56 R2 = 0.31 F = 6.06 on 2 and 27 DF p-value: 0.0067
To see the longer output specify short = FALSE in the print statement or ask for the summary
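The call that produces this kind of output is a one-liner; sobel is an example data set included with psych, and n.iter sets the number of bootstrap samples (a minimal sketch):

```r
library(psych)

# mediation: THERAPY predicts SATIS, mediated through ATTRIB
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel, n.iter = 1000)
mediate.diagram(preacher)  # draw the path diagram
```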
• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.
> mediate.diagram(preacher)
[Figure: mediation model diagram, with paths THERAPY -> ATTRIB (0.82), ATTRIB -> SATIS (0.4), and THERAPY -> SATIS (c = 0.76, c' = 0.43)]
Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
[Figure: regression model diagram, with paths THERAPY -> SATIS (0.43), ATTRIB -> SATIS (0.4), and a correlation between THERAPY and ATTRIB (0.21)]
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate
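The formula above can be sketched directly in R for the sat.act example used below. This is purely illustrative (setCor does the same computation, with significance tests), and the column indices assume the standard order of the sat.act variables:

```r
library(psych)

R <- cor(sat.act, use = "pairwise")
x <- 1:3   # gender, education, age
y <- 4:6   # ACT, SATV, SATQ

Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]

# eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values

R2set <- 1 - prod(1 - lambda)   # Cohen's set correlation R^2
```

The resulting values should agree with the squared canonical correlations and the set correlation R2 reported by setCor later in this section.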
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46 S.E. = 0.22 t = 6.77 df= 125 with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46 S.E. = 0.22 t = 6.73 df= 125 with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.01 Upper CI = 0.07
Total effect(c) of prot2*sexism on respappr = 0.81 S.E. = 0.28 t = 2.89 df= 125 with p = 0.0045
Direct effect (c') of prot2*sexism on NA removing sexism = 0.81 S.E. = 0.28 t = 2.87 df= 125 with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04 Lower CI = -0.05 Upper CI = 0.13
R = 0.54 R2 = 0.3 F = 17.53 on 3 and 125 DF p-value: 1.46e-09
To see the longer output specify short = FALSE in the print statement or ask for the summary
[Figure: moderated mediation diagram ("Moderated mediation (mean centered)"), with paths prot2 -> respappr (c = 1.46, c' = 1.46), prot2*sexism -> respappr (c = 0.81, c' = 0.81), and small paths through the mediator sexism (0.07, 0.09, 0.02, -0.01)]
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance matrix or the raw data, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
For this example the analysis is done on the correlation matrix rather than the rawdata
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
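This symmetry can be checked by exchanging the roles of the two sets (a minimal sketch, reusing the covariance matrix C from the example above; the regression weights will of course differ, but the set correlation R2 will not):

```r
library(psych)

C <- cov(sat.act, use = "pairwise")
setCor(c(4:6), c(1:3), C, n.obs = 700)  # predict ACT, SATV, SATQ from demographics
setCor(c(1:3), c(4:6), C, n.obs = 700)  # exchange the two sets: same set R2
```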
6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
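A sketch of the kind of call that generates a table like Table 2 (Thurstone is a built-in correlation matrix; the n.obs and heading values here are illustrative assumptions):

```r
library(psych)

# a three factor solution of the Thurstone 9 variable problem
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)

# convert the loadings, communalities, and factor correlations to a LaTeX table
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```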
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3   h2   u2  com
Sentences        0.91  -0.04   0.04 0.82 0.18 1.01
Vocabulary       0.89   0.06  -0.03 0.84 0.16 1.01
Sent.Completion  0.83   0.04   0.00 0.73 0.27 1.00
First.Letters    0.00   0.86   0.00 0.73 0.27 1.00
4.Letter.Words  -0.01   0.74   0.10 0.63 0.37 1.04
Suffixes         0.18   0.63  -0.08 0.50 0.50 1.20
Letter.Series    0.03  -0.01   0.84 0.72 0.28 1.00
Pedigrees        0.37  -0.05   0.47 0.50 0.50 1.93
Letter.Group    -0.06   0.21   0.64 0.53 0.47 1.23

SS loadings      2.64   1.86   1.50

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
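Several of these helpers can be tried in a line or two each. This sketch assumes only that psych is installed, and uses the base R attitude data frame for illustration:

```r
library(psych)

fisherz(0.5)                   # Fisher r-to-z transform, i.e. atanh(0.5)
fisherz2r(fisherz(0.5))        # and back again to r = 0.5
geometric.mean(c(1, 10, 100))  # the appropriate mean for ratio-scaled data
harmonic.mean(c(2, 4, 4))      # e.g., for averaging rates
headTail(attitude)             # first and last rows, separated by ellipsis
```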
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941), and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
49
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
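Like the other data sets, sat.act is lazy-loaded with the package, so it can be examined as soon as psych is attached (a minimal sketch):

```r
library(psych)

dim(sat.act)       # 700 observations on 6 variables
describe(sat.act)  # means, sds, skew, kurtosis, etc., per variable
lowerCor(sat.act)  # lower-triangle correlation matrix, rounded
```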
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
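Galton's regression-to-the-mean result is easy to reproduce from these data (a sketch, using the parent and child columns of the galton data frame):

```r
library(psych)

data(galton)                      # mid-parent/child height pairs
cor(galton$parent, galton$child)  # the classic height correlation
pairs.panels(galton)              # scatter plot with histograms and r
```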
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables.
50
This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
51
11 SessionInfo
This document was prepared using the following settings:
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
52
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

53

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

54

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

55

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

56

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

57
Index
ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9
draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49
58
het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8
59
corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8
Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51
psych package:
60
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8
fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14
61
p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; setCor.diagram, 43; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49
tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; setCor.diagram, 43; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20
62
tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7
schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; setCor.diagram, 43; sem, 8, 49; set correlation, 44; setcor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
63
An example of converting the output from fa to LATEXappears in Table 2
47
Table 2 fa2latexA factor analysis table from the psych package in R
Variable MR1 MR2 MR3 h2 u2 com
Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123
SS loadings 264 186 15
MR1 100 059 054MR2 059 100 052MR3 054 052 100
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions
blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design
df2latex is useful for taking tabular output (such as a correlation matrix or that of de-
scribe and converting it to a LATEX table May be used when Sweave is not conve-nient
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-
dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data
fisherz Convert a correlation to the corresponding Fisher z score
48
geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
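A quick sketch of a few of these helpers follows (this example is not from the original text; it assumes the psych package is attached and uses the sat.act demonstration data described below):

```r
library(psych)

# Fisher r-to-z transformation of a correlation
fisherz(0.5)

# geometric and harmonic means of a positively skewed variable
x <- c(1, 2, 4, 8, 16)
geometric.mean(x)   # (1*2*4*8*16)^(1/5) = 4
harmonic.mean(x)

# first and last lines of a data set, with ellipsis between
headtail(sat.act)

# combine two keying matrices into one super matrix
A <- diag(2)
B <- diag(3)
superMatrix(A, B)   # 5 x 5, with A top left, B lower right, 0s elsewhere
```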
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples: Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology: Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941), and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height, and on the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
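As a minimal sketch (not from the original text) of how these bundled data sets might be explored, assuming the psych package is attached:

```r
library(psych)

# Galton's height data: correlation of mid-parent and child height
data(galton)
cor(galton$parent, galton$child)

# descriptive statistics for the SAT/ACT demonstration set
data(sat.act)
describe(sat.act)

# the 9 x 9 Thurstone ability correlation matrix
data(Thurstone)
round(Thurstone, 2)
```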
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type = "source".
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.
11 SessionInfo
This document was prepared using the following settings
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4), 405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4), 439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package, and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9, 78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3), 173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5), 733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1), 121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1), 41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3), 283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3), 231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4), 309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7), 1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2), 481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1), 57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4), 403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1), 145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2), 306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6), 801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27, 345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5), 425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2), 121–144.
> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)
Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)
• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.
mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)
• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5,000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
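The moderated mediation case described above corresponds to a call along these lines (a sketch based on the Call line in the output shown below; Garcia is a data set in psych, and n.iter is kept small here only for speed):

```r
library(psych)

# moderated mediation: prot2 and the prot2 * sexism product predict respappr,
# with sexism as the mediator; data are mean centered by default
mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                   n.iter = 50, main = "Moderated mediation (mean centered)")

# the longer output, as the print statement suggests
print(mod.med, short = FALSE)
```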
5.3 Set Correlation
An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
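The formula can be checked by hand. The following sketch (not from the original text) computes the set correlation between the demographic and ability sets of the sat.act data, assuming psych is attached; because it uses pairwise correlations rather than the covariance matrix used below, the value may differ slightly from the setCor output later in this section:

```r
library(psych)
R <- cor(sat.act, use = "pairwise")   # 6 x 6 correlation matrix
x <- 1:3                              # gender, education, age
y <- 4:6                              # ACT, SATV, SATQ

Rxx <- R[x, x]; Ryy <- R[y, y]
Rxy <- R[x, y]; Ryx <- R[y, x]

# eigenvalues of the product matrix are the squared canonical correlations
lambda <- eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% Ryx)$values

# Cohen's set correlation
R2.set <- 1 - prod(1 - lambda)
R2.set
```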
setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data
Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to
Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46  SE = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  SE = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07
Total effect (c) of prot2*sexism on respappr = 0.81  SE = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  SE = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13

R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary.
Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.
notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476
Compare this with the output from setCor
> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV  0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
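A quick way to see this symmetry (a sketch, not from the original text) is to run setCor in both directions, assuming the covariance matrix C from the example above; the "Cohen's Set Correlation R2" lines of the two outputs should agree:

```r
# predict the SAT/ACT set from the demographic set ...
m1 <- setCor(c(4:6), c(1:3), C, n.obs = 700)

# ... and the demographic set from the SAT/ACT set
m2 <- setCor(c(1:3), c(4:6), C, n.obs = 700)
```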
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
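A table like Table 2 can presumably be produced along these lines (a sketch; the exact call, options, and caption used to generate the table are not shown in this section):

```r
library(psych)

# three factor solution of the 9 x 9 Thurstone ability correlations
f3 <- fa(Thurstone, 3)

# convert the factor analysis output to a LaTeX table
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```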
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions
blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design
df2latex is useful for taking tabular output (such as a correlation matrix or that of de-
scribe and converting it to a LATEX table May be used when Sweave is not conve-nient
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-
dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data
fisherz Convert a correlation to the corresponding Fisher z score
48
geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples: Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology: Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set of the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
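Any of these data sets may be loaded with data() and explored with the descriptive functions discussed earlier; for example, a minimal look at the sat.act data (700 subjects, six variables):

```r
library(psych)
data(sat.act)
dim(sat.act)        # 700 subjects by 6 variables
describe(sat.act)   # means, sds, skew, and kurtosis for each variable
lowerCor(sat.act)   # lower-triangle correlation matrix, rounded
```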
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20; edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7; harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; KnitR, 47; lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15; make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38; nfactors, 7; nlme, 37; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

psych package: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headTail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47

r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9; SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; weighted least squares, 7; withinBetween, 37, 38; xtable, 47
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2,800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1,525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
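Any of these data sets may be examined directly once the package is attached; a minimal sketch (output omitted, and any subset of variables shown is illustrative):

> library(psych)
> data(sat.act)
> describe(sat.act)    # basic descriptive statistics for the demonstration data

Each data set also has its own help page (e.g., ?sat.act) describing the variables and their sources.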
9 Development version and a users guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")
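To confirm which version is actually installed and running (the development version number will typically be higher than the CRAN release), the standard utils function packageVersion may be used:

> packageVersion("psych")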
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/: An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136    mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.
Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 125  with p = 4.5e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.02  Lower CI = -0.01  Upper CI = 0.07

Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 125  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0.01 with standard error = 0.04  Lower CI = -0.05  Upper CI = 0.13

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.46e-09

To see the longer output, specify short = FALSE in the print statement or ask for the summary
[Path diagram omitted: "Moderated mediation (mean centered)", with prot2 and prot2*sexism predicting respappr through the mediator sexism; total effects c = 1.46 and c = 0.81, direct effects c' = 1.46 and c' = 0.81]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
Notice that setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)
Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,	Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from matrix input

DV = ACT
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.05 0.18 -0.27 0.79    -0.40     0.31 1.01
education  0.14 0.22  0.65 0.51    -0.28     0.57 1.45
age        0.03 0.22  0.15 0.88    -0.39     0.46 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
ACT 0.16 0.03 0.15 0.02        0.02     0.01      6.49   3 696 0.000248

DV = SATV
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.03 4.29 -0.01 0.99    -8.45     8.39 1.01
education  0.10 5.13  0.02 0.98    -9.97    10.17 1.45
age       -0.10 5.11 -0.02 0.98   -10.13     9.93 1.44

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.1 0.01 0.08 0.01        0.01     0.01      2.26   3 696 0.0808

DV = SATQ
          slope   se     t    p lower.ci upper.ci  VIF
gender    -0.18 4.34 -0.04 0.97    -8.69     8.34 1.01
education  0.10 5.18  0.02 0.98   -10.07    10.28 1.45
age       -0.09 5.16 -0.02 0.99   -10.23    10.05 1.44

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
SATQ 0.19 0.04 0.18 0.03        0.03     0.01      8.63   3 696 1.24e-05

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
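Because the set correlation is symmetric, exchanging the roles of the two sets (predicting the demographic variables from the SAT and ACT scores) should reproduce the same set correlation R2. A quick check, reusing the covariance matrix C found above:

> setCor(c(1:3), c(4:6), C, n.obs=700)   # exchange the x and y sets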
6 Converting output to APA style tables using LATEX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LATEX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LATEX output, and finally df2latex converts a generic data frame to LATEX.

An example of converting the output from fa to LATEX appears in Table 2.
Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

Factor correlations
     MR1   MR2   MR3
MR1 1.00  0.59  0.54
MR2 0.59  1.00  0.52
MR3 0.54  0.52  1.00
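A table such as Table 2 comes from factoring the Thurstone correlation matrix and passing the result to fa2latex. A sketch of the call (the heading text is illustrative):

> f3 <- fa(Thurstone, 3)
> fa2latex(f3, heading="A factor analysis table from the psych package in R")

The resulting LATEX source may be pasted directly into a manuscript.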
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LATEX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LATEX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
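A couple of these helpers in action (the correlations passed to fisherz are arbitrary illustrative values; output omitted):

> fisherz(c(.2, .5, .8))   # convert correlations to Fisher z scores
> headtail(sat.act)        # first and last lines of the sat.act data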
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74
Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186
Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer
Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
Index
ability 49affect 15 25alpha 5 7alpha factoring 7
Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 15 35blockrandom 48burt 35
char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49
densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9
drawcor 35drawtetra 35dummycode 15dynamite plot 20
edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24
fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factor analysis 7factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48
galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7
harmonicmean 49head 49headtail 49heights 49
58
hetdiagram 8Hmisc 29Holzinger 49
ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-
plications in R 8iqitems 49irtfa 7 47irt2latex 47 48
KnitR 47
lavaan 38library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15
makekeys 17MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42 44mediatediagram 41minimum residual 7mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevel 37multilevelreliability 7multiple regression 38
nfactors 7
nlme 37
omega 7 9outlier 3 13 14
padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51
R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8
59
corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49hetdiagram 8
Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51psych package
60
ability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 15 35blockrandom 48burt 35char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 35cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 15epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 9 47fadiagram 8
faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 38factorminres 9factorpa 9factorwls 9fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 17MAP 7mardia 49mediate 4 8 41 42 44mediatediagram 41mixedcor 35mlArrange 8mlPlot 8mlr 7msq 17multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 13 14
61
prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 35principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
62
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63
notice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied
For this example the analysis is done on the correlation matrix rather than the rawdata
gt C lt- cov(satactuse=pairwise)
gt model1 lt- lm(ACT~ gender + education + age data=satact)
gt summary(model1)
Call
lm(formula = ACT ~ gender + education + age data = satact)
Residuals
Min 1Q Median 3Q Max
-252458 -32133 07769 35921 92630
Coefficients
Estimate Std Error t value Pr(gt|t|)
(Intercept) 2741706 082140 33378 lt 2e-16
gender -048606 037984 -1280 020110
education 047890 015235 3143 000174
age 001623 002278 0712 047650
---
Signif codes 0 0001 001 005 01 1
Residual standard error 4768 on 696 degrees of freedom
Multiple R-squared 00272 Adjusted R-squared 002301
F-statistic 6487 on 3 and 696 DF p-value 00002476
Compare this with the output from setCor
gt compare with sector
gt setCor(c(46)c(13)C nobs=700)
Call setCor(y = c(46) x = c(13) data = C nobs = 700)
Multiple Regression from matrix input
DV = ACT slope se t p lowerci upperci VIF
gender -005 018 -027 079 -040 031 101
education 014 022 065 051 -028 057 145
age 003 022 015 088 -039 046 144
Multiple Regression
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
ACT 016 003 015 002 002 001 649 3 696 0000248
DV = SATV slope se t p lowerci upperci VIF
gender -003 429 -001 099 -845 839 101
education 010 513 002 098 -997 1017 145
age -010 511 -002 098 -1013 993 144
Multiple Regression
46
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATV 01 001 008 001 001 001 226 3 696 00808
DV = SATQ slope se t p lowerci upperci VIF
gender -018 434 -004 097 -869 834 101
education 010 518 002 098 -1007 1028 145
age -009 516 -002 099 -1023 1005 144
Multiple Regression
R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATQ 019 004 018 003 003 001 863 3 696 124e-05
Various estimates of between set correlations
Squared Canonical Correlations
[1] 0050 0033 0008
Chisq of canonical correlations
[1] 358 231 56
Average squared canonical correlation = 003
Cohens Set Correlation R2 = 009
Shrunken Set Correlation R2 = 008
F and df of Cohens Set Correlation 726 9 168186
Unweighted correlation between the two sets = 014
Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship
6 Converting output to APA style tables using LATEX
Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX
An example of converting the output from fa to LATEXappears in Table 2
47
Table 2 fa2latexA factor analysis table from the psych package in R
Variable MR1 MR2 MR3 h2 u2 com
Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123
SS loadings 264 186 15
MR1 100 059 054MR2 059 100 052MR3 054 052 100
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions
blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design
df2latex is useful for taking tabular output (such as a correlation matrix or that of de-
scribe and converting it to a LATEX table May be used when Sweave is not conve-nient
cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex
cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-
dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data
fisherz Convert a correlation to the corresponding Fisher z score
48
geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton
49
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests
bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses
spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory
satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis
epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays
iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics
ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)
galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas
Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variables
50
This data set includes his example
miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques
9 Development version and a users guide
The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr
srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo
gt installpackages(psych repos=httpspersonality-projectorgr type=source)
Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf
News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news
gt news(Version gt= 184package=psych)
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book
For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R
51
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS LibraryFrameworksRframeworkVersions36ResourcesliblibRblas0dylib
LAPACK LibraryFrameworksRframeworkVersions36ResourcesliblibRlapackdylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
52
References
Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432
Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458
Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY
Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package
Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276
Cattell R B (1978) The scientific use of factor analysis Plenum Press New York
Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)
Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition
Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64
Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98
Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334
Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
R    R2   Ruw  R2uw  Shrunken R2  SE of R2  overall F  df1  df2      p
SATV 0.1  0.01 0.08  0.01         0.01      0.01       2.26   3  696  0.0808

DV = SATQ
          slope    se     t    p  lower.ci  upper.ci  VIF
gender    -0.18  4.34 -0.04 0.97     -8.69      8.34 1.01
education  0.10  5.18  0.02 0.98    -10.07     10.28 1.45
age       -0.09  5.16 -0.02 0.99    -10.23     10.05 1.44

Multiple Regression

R    R2   Ruw  R2uw  Shrunken R2  SE of R2  overall F  df1  df2         p
SATQ 0.19 0.04 0.18  0.03         0.03      0.01       8.63   3  696  1.24e-05

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.14
Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: the R2 is the same regardless of the direction of the relationship.
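Output of the kind shown above can be produced by a call of roughly this form (a sketch; setCor also accepts column numbers in place of variable names):

```r
library(psych)
# Set correlation: relate the SATV/SATQ criterion set to three predictors
sc <- setCor(y = c("SATV", "SATQ"), x = c("gender", "education", "age"),
             data = sat.act)
```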
6 Converting output to APA style tables using LaTeX
Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
An example of converting the output from fa to LaTeX appears in Table 2.
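A table such as Table 2 can be generated by something like the following sketch (a three factor solution for the Thurstone correlation matrix; the heading and caption arguments are optional):

```r
library(psych)
f3 <- fa(Thurstone, nfactors = 3)   # minres factor analysis with 3 factors
fa2latex(f3, heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex")
```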
Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
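A few of these helper functions in use (a sketch; the rangeCorrection values are purely illustrative):

```r
library(psych)
fisherz(0.5)                          # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))         # appropriate mean for ratio data
headTail(sat.act)                     # first and last rows of a data frame
rangeCorrection(0.33, 100.32, 50.64)  # correct r for restriction of range
```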
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples: Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology: Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
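Any of these data sets may be examined with the usual descriptive tools; a minimal sketch:

```r
library(psych)
describe(sat.act)    # basic descriptive statistics for the SAT/ACT data
lowerMat(Thurstone)  # lower triangle of the 9 x 9 ability correlation matrix
dim(bfi)             # number of subjects and variables in the bfi data
```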
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS 10.14
Matrix products default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1.8.10
loaded via a namespace (and not attached)
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5

[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: an alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49
affect, 15, 25
alpha, 5, 7
alpha factoring, 7
Bechtoldt1, 49
Bechtoldt2, 49
bfi, 26, 49
bibars, 7, 26, 27
bifactor, 7
biserial, 15, 35
blockrandom, 48
burt, 35
char2numeric, 15
circadiancor, 48
circadianlinearcor, 48
circadianmean, 48
circular statistics, 48
cities, 51
cohenkappa, 49
cor, 28
corsmooth, 35
cortest, 29
cor2latex, 47, 48
corPlot, 8
corrp, 29, 33
corrtest, 29, 33
cortest, 34
cosinor, 48
ctv, 9
cubits, 49
densityBy, 15, 26
densitynBy, 27
describe, 7, 11, 48
describeBy, 3, 7, 11, 12
df2latex, 47, 48
diagram, 9
drawcor, 35
drawtetra, 35
dummycode, 15
dynamite plot, 20
edit, 3
epibfi, 49
error bars, 20
errorbars, 7, 15, 20
errorbarsby, 12, 15, 20, 21
errorbarstab, 20
errorcrosses, 20
errorCircles, 25
errorCrosses, 24
fa, 7, 9, 47
fadiagram, 8
faextension, 50
famulti, 7
faparallel, 5, 7
fa2latex, 47, 48
faBy, 38
factor analysis, 7
factorminres, 9
factorpa, 9
factorwls, 9
filechoose, 10
fisherz, 48
galton, 50
generalized least squares, 7
geometricmean, 49
GPArotation, 9
guttman, 7
harmonicmean, 49
head, 49
headtail, 49
heights, 49
hetdiagram, 8
Hmisc, 29
Holzinger, 49
ICC, 7, 49
iclust, 7
iclustdiagram, 8
Index, 48
introduction to psychometric theory with applications in R, 8
iqitems, 49
irtfa, 7, 47
irt2latex, 47, 48
KnitR, 47
lavaan, 38
library, 9
lm, 38
lowerCor, 4, 28
lowerMat, 28
lowerUpper, 28
lowess, 15
makekeys, 17
MAP, 7
mardia, 49
maximum likelihood, 7
mediate, 4, 8, 41, 42, 44
mediatediagram, 41
minimum residual, 7
mixedcor, 35
mlArrange, 8
mlPlot, 8
mlr, 7
msq, 17
multihist, 7
multilevel, 37
multilevelreliability, 7
multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9
outlier, 3, 13, 14
padjust, 29
prep, 49
pairs, 15
pairspanels, 3, 7, 8, 14–18
partialr, 49
pca, 7
peas, 49, 50
plotirt, 8
plotpoly, 8
polychoric, 7, 35
polyserial, 35
principal, 5, 7, 9
principal axis, 7
psych, 3, 5–9, 29, 47–49, 51
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
62
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63
Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

Factor intercorrelations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
7 Miscellaneous functions
A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.
block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.
cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
fisherz Convert a correlation to the corresponding Fisher z score
geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
ICC and cohen.kappa are typically used to find the reliability for raters.
headTail combines the head and tail functions to show the first and last lines of a data set or output.
topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.
mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
p.rep finds the probability of replication for an F, t, or r, and estimates effect size.
partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)
rangeCorrection will correct correlations for restriction of range.
reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
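A few of these helpers can be tried directly. The following is a minimal sketch (assuming the psych package is installed) calling fisherz, headTail, and superMatrix with toy inputs:

```r
library(psych)

# Fisher r-to-z transformation: fisherz(r) = atanh(r)
fisherz(0.5)       # about 0.55

# headTail shows the first and last few rows of a data set
headTail(sat.act)

# superMatrix combines two matrices into a block "super matrix",
# with 0s in the two off-diagonal quadrants
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
superMatrix(A, B)  # a 5 x 5 matrix
```

Because the off-diagonal quadrants are zeroed, superMatrix is convenient for building scoring keys that span several separately keyed item sets.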
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.
bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.
spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.
sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.
iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.
ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).
galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.
miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
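The data sets above are lazy-loaded with the package, so no data() call is needed. A quick sketch of accessing them:

```r
library(psych)

# bfi: 2800 rows; the 25 personality items plus gender, education, and age
dim(bfi)

# sat.act: SAT Verbal, SAT Quantitative, ACT, and demographics for 700 subjects
describe(sat.act)

# ability: the 16 ICAR items scored 0/1, for IRT examples
dim(ability)
```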
9 Development version and a user's guide
The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo
This document was prepared using the following settings:
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0  parallel_3.6.0  tools_3.6.0     foreign_0.8-70  nlme_3.1-136  mnormt_1.5-5
[7] grid_3.6.0      lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data
ICC and cohenkappa are typically used to find the reliability for raters
headtail combines the head and tail functions to show the first and last lines of a dataset or output
topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between
mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe
prep finds the probability of replication for an F t or r and estimate effect size
partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)
rangeCorrection will correct correlations for restriction of range
reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages
superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems
8 Data sets
A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton
49
Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests
bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses
spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory
satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis
epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays
iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics
ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)
galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas
Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variables
50
This data set includes his example
miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques
9 Development version and a users guide
The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr
srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo
> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:
> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory
The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.
For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
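As an illustrative sketch (not from the text itself; n.obs = 213 is the sample size given in the psych help pages for the Thurstone data), several of the book's core analyses take one line each:

```r
library(psych)

# Exploratory factor analysis of the 9 x 9 Thurstone correlation matrix
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)

# McDonald's omega hierarchical for the same matrix
om <- omega(Thurstone, n.obs = 213)
```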
11 SessionInfo
This document was prepared using the following settings:
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)

Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS 10.14

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:

[1] C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] psych_1.8.10

loaded via a namespace (and not attached):

[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5

[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 44scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49table 20
62
tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 17 19vss 5 7withinBetween 37 38
R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash9 29 47ndash49 51Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47
rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 11readclipboardtab 3 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63
This data set includes his example.

miscellaneous. cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
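As a quick, hypothetical illustration (not part of the vignette itself), the cities distance matrix can be passed to cmdscale from the base stats package to recover a two-dimensional configuration of the cities:

```r
library(psych)  # supplies the cities data set of airline distances

# Classical (metric) multidimensional scaling; k = 2 asks for a
# two-dimensional configuration of the 11 cities.
city.location <- cmdscale(cities, k = 2)

# The solution is unique only up to rotation and reflection, so the
# resulting "map" may need to be flipped to match actual geography.
plot(city.location, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(city.location, labels = rownames(city.location))
```

This sketch assumes the distance matrix carries the city codes as row names, which cmdscale then propagates to its result.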
9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")
10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.
11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.8.10

loaded via a namespace (and not attached):
[1] compiler_3.6.0 parallel_3.6.0 tools_3.6.0 foreign_0.8-70 nlme_3.1-136 mnormt_1.5-5
[7] grid_3.6.0 lattice_0.20-35
References
Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index
ability, 49
affect, 15, 25
alpha, 5, 7
alpha factoring, 7
Bechtoldt.1, 49
Bechtoldt.2, 49
bfi, 26, 49
bi.bars, 7, 26, 27
bifactor, 7
biserial, 15, 35
block.random, 48
burt, 35
char2numeric, 15
circadian.cor, 48
circadian.linear.cor, 48
circadian.mean, 48
circular statistics, 48
cities, 51
cohen.kappa, 49
cor, 28
cor.smooth, 35
cor.test, 29
cor2latex, 47, 48
corPlot, 8
corr.p, 29, 33
corr.test, 29, 33
cortest, 34
cosinor, 48
ctv, 9
cubits, 49
densityBy, 15, 26
densitynBy, 27
describe, 7, 11, 48
describeBy, 3, 7, 11, 12
df2latex, 47, 48
diagram, 9
draw.cor, 35
draw.tetra, 35
dummy.code, 15
dynamite plot, 20
edit, 3
epi.bfi, 49
error bars, 20
error.bars, 7, 15, 20
error.bars.by, 12, 15, 20, 21
error.bars.tab, 20
error.crosses, 20
errorCircles, 25
errorCrosses, 24
fa, 7, 9, 47
fa.diagram, 8
fa.extension, 50
fa.multi, 7
fa.parallel, 5, 7
fa2latex, 47, 48
faBy, 38
factor analysis, 7
factor.minres, 9
factor.pa, 9
factor.wls, 9
file.choose, 10
fisherz, 48
galton, 50
generalized least squares, 7
geometric.mean, 49
GPArotation, 9
guttman, 7
harmonic.mean, 49
head, 49
headtail, 49
heights, 49
het.diagram, 8
Hmisc, 29
Holzinger, 49
ICC, 7, 49
iclust, 7
iclust.diagram, 8
Index, 48
introduction to psychometric theory with applications in R, 8
iqitems, 49
irt.fa, 7, 47
irt2latex, 47, 48
KnitR, 47
lavaan, 38
library, 9
lm, 38
lowerCor, 4, 28
lowerMat, 28
lowerUpper, 28
lowess, 15
make.keys, 17
MAP, 7
mardia, 49
maximum likelihood, 7
mediate, 4, 8, 41, 42, 44
mediate.diagram, 41
minimum residual, 7
mixed.cor, 35
mlArrange, 8
mlPlot, 8
mlr, 7
msq, 17
multi.hist, 7
multilevel, 37
multilevel.reliability, 7
multiple regression, 38
nfactors, 7
nlme, 37
omega, 7, 9
outlier, 3, 13, 14
p.adjust, 29
p.rep, 49
pairs, 15
pairs.panels, 3, 7, 8, 14–18
partial.r, 49
pca, 7
peas, 49, 50
plot.irt, 8
plot.poly, 8
polychoric, 7, 35
polyserial, 35
principal, 5, 7, 9
principal axis, 7
psych, 3, 5–9, 29, 47–49, 51
R function
   ability, 49
   affect, 15
   alpha, 5, 7
   Bechtoldt.1, 49
   Bechtoldt.2, 49
   bfi, 26, 49
   bi.bars, 7, 26, 27
   biserial, 15, 35
   block.random, 48
   burt, 35
   char2numeric, 15
   circadian.cor, 48
   circadian.linear.cor, 48
   circadian.mean, 48
   cities, 51
   cohen.kappa, 49
   cor, 28
   cor.smooth, 35
   cor.test, 29
   cor2latex, 47, 48
   corPlot, 8
   corr.p, 29, 33
   corr.test, 29, 33
   cortest, 34
   cosinor, 48
   cubits, 49
   densityBy, 15, 26
   densitynBy, 27
   describe, 7, 11, 48
   describeBy, 3, 7, 11, 12
   df2latex, 47, 48
   draw.cor, 35
   draw.tetra, 35
   dummy.code, 15
   edit, 3
   epi.bfi, 49
   error.bars, 7, 15, 20
   error.bars.by, 12, 15, 20, 21
   error.bars.tab, 20
   error.crosses, 20
   errorCircles, 25
   errorCrosses, 24
   fa, 7, 9, 47
   fa.diagram, 8
   fa.extension, 50
   fa.multi, 7
   fa.parallel, 5, 7
   fa2latex, 47, 48
   faBy, 38
   factor.minres, 9
   factor.pa, 9
   factor.wls, 9
   file.choose, 10
   fisherz, 48
   galton, 50
   geometric.mean, 49
   guttman, 7
   harmonic.mean, 49
   head, 49
   headtail, 49
   heights, 49
   het.diagram, 8
   Holzinger, 49
   ICC, 7, 49
   iclust, 7
   iclust.diagram, 8
   iqitems, 49
   irt.fa, 7, 47
   irt2latex, 47, 48
   library, 9
   lm, 38
   lowerCor, 4, 28
   lowerMat, 28
   lowerUpper, 28
   make.keys, 17
   MAP, 7
   mardia, 49
   mediate, 4, 8, 41, 42, 44
   mediate.diagram, 41
   mixed.cor, 35
   mlArrange, 8
   mlPlot, 8
   mlr, 7
   msq, 17
   multi.hist, 7
   multilevel.reliability, 7
   nfactors, 7
   omega, 7, 9
   outlier, 3, 13, 14
   p.adjust, 29
   p.rep, 49
   pairs, 15
   pairs.panels, 3, 7, 8, 14–18
   partial.r, 49
   pca, 7
   peas, 49, 50
   plot.irt, 8
   plot.poly, 8
   polychoric, 7, 35
   polyserial, 35
   principal, 5, 7, 9
   psych, 51
   psych package
      ability, 49
      affect, 15
      alpha, 5, 7
      Bechtoldt.1, 49
      Bechtoldt.2, 49
      bfi, 26, 49
      bi.bars, 7, 26, 27
      biserial, 15, 35
      block.random, 48
      burt, 35
      char2numeric, 15
      circadian.cor, 48
      circadian.linear.cor, 48
      circadian.mean, 48
      cities, 51
      cohen.kappa, 49
      cor.smooth, 35
      cor2latex, 47, 48
      corPlot, 8
      corr.p, 29, 33
      corr.test, 29, 33
      cortest, 34
      cosinor, 48
      cubits, 49
      densityBy, 15, 26
      densitynBy, 27
      describe, 7, 11, 48
      describeBy, 3, 7, 11, 12
      df2latex, 47, 48
      draw.cor, 35
      draw.tetra, 35
      dummy.code, 15
      epi.bfi, 49
      error.bars, 7, 15, 20
      error.bars.by, 12, 15, 20, 21
      error.bars.tab, 20
      error.crosses, 20
      errorCircles, 25
      errorCrosses, 24
      fa, 7, 9, 47
      fa.diagram, 8
      fa.extension, 50
      fa.multi, 7
      fa.parallel, 5, 7
      fa2latex, 47, 48
      faBy, 38
      factor.minres, 9
      factor.pa, 9
      factor.wls, 9
      fisherz, 48
      galton, 50
      geometric.mean, 49
      guttman, 7
      harmonic.mean, 49
      headtail, 49
      heights, 49
      het.diagram, 8
      Holzinger, 49
      ICC, 7, 49
      iclust, 7
      iclust.diagram, 8
      iqitems, 49
      irt.fa, 7, 47
      irt2latex, 47, 48
      lowerCor, 4, 28
      lowerMat, 28
      lowerUpper, 28
      make.keys, 17
      MAP, 7
      mardia, 49
      mediate, 4, 8, 41, 42, 44
      mediate.diagram, 41
      mixed.cor, 35
      mlArrange, 8
      mlPlot, 8
      mlr, 7
      msq, 17
      multi.hist, 7
      multilevel.reliability, 7
      nfactors, 7
      omega, 7, 9
      outlier, 3, 13, 14
      p.rep, 49
      pairs.panels, 3, 7, 8, 14–18
      partial.r, 49
      pca, 7
      peas, 49, 50
      plot.irt, 8
      plot.poly, 8
      polychoric, 7, 35
      polyserial, 35
      principal, 5, 7, 9
      psych, 51
      r.test, 29
      rangeCorrection, 49
      read.clipboard, 3, 7, 10
      read.clipboard.csv, 10
      read.clipboard.fwf, 11
      read.clipboard.lower, 11
      read.clipboard.tab, 3, 11
      read.clipboard.upper, 11
      read.file, 3, 7, 10
      Reise, 49
      reverse.code, 49
      sat.act, 11, 34, 44
      scatter.hist, 7
      schmid, 7, 9
      score.multiple.choice, 7
      scoreItems, 5, 7, 8, 17
      scrub, 3, 13
      sector, 43
      set.cor, 49
      setCor, 4, 8, 38, 41, 44, 46, 47
      sim.multilevel, 38
      spi, 49
      spider, 15
      stars, 15
      StatsBy, 7
      statsBy, 7, 37, 38
      statsBy.boot, 38
      statsBy.boot.summary, 38
      structure.diagram, 8
      superMatrix, 49
      tetrachoric, 7, 35
      Thurstone, 29, 49
      Thurstone.33, 49
      topBottom, 49
      vegetables, 49, 51
      violinBy, 15, 17, 19
      vss, 5, 7
      withinBetween, 37, 38
   r.test, 29
   rangeCorrection, 49
   rcorr, 29
   read.clipboard, 3, 7, 10
   read.clipboard.csv, 10
   read.clipboard.fwf, 11
   read.clipboard.lower, 11
   read.clipboard.tab, 3, 11
   read.clipboard.upper, 11
   read.file, 3, 7, 10
   read.table, 10
   Reise, 49
   reverse.code, 49
   sat.act, 11, 34, 44
   scatter.hist, 7
   schmid, 7, 9
   score.multiple.choice, 7
   scoreItems, 5, 7, 8, 17
   scrub, 3, 13
   sector, 43
   set.cor, 49
   setCor, 4, 8, 38, 41, 44, 46, 47
   sim.multilevel, 38
   spi, 49
   spider, 15
   stars, 15
   StatsBy, 7
   statsBy, 7, 37, 38
   statsBy.boot, 38
   statsBy.boot.summary, 38
   structure.diagram, 8
   superMatrix, 49
   table, 20
   tail, 49
   tetrachoric, 7, 35
   Thurstone, 29, 49
   Thurstone.33, 49
   topBottom, 49
   vegetables, 49, 51
   violinBy, 15, 17, 19
   vss, 5, 7
   withinBetween, 37, 38
R package
   ctv, 9
   GPArotation, 9
   Hmisc, 29
   KnitR, 47
   lavaan, 38
   multilevel, 37
   nlme, 37
   psych, 3, 5–9, 29, 47–49, 51
   Rgraphviz, 9
   sem, 8, 49
   stats, 29
   Sweave, 47
   xtable, 47
r.test, 29
rangeCorrection, 49
rcorr, 29
read.clipboard, 3, 7, 10
read.clipboard.csv, 10
read.clipboard.fwf, 11
read.clipboard.lower, 11
read.clipboard.tab, 3, 11
read.clipboard.upper, 11
read.file, 3, 7, 10
read.table, 10
Reise, 49
reverse.code, 49
Rgraphviz, 9
SAPA, 26, 50
sat.act, 11, 34, 44
scatter.hist, 7
schmid, 7, 9
score.multiple.choice, 7
scoreItems, 5, 7, 8, 17
scrub, 3, 13
sector, 43
sem, 8, 49
set correlation, 44
set.cor, 49
setCor, 4, 8, 38, 41, 44, 46, 47
sim.multilevel, 38
spi, 49
spider, 15
stars, 15
stats, 29
StatsBy, 7
statsBy, 7, 37, 38
statsBy.boot, 38
statsBy.boot.summary, 38
structure.diagram, 8
superMatrix, 49
Sweave, 47
table, 20
tail, 49
tetrachoric, 7, 35
Thurstone, 29, 38, 49
Thurstone.33, 49
topBottom, 49
vegetables, 49, 51
violinBy, 15, 17, 19
vss, 5, 7
weighted least squares, 7
withinBetween, 37, 38
xtable, 47
11 SessionInfo
This document was prepared using the following settings
gt sessionInfo()
R Under development (unstable) (2018-04-05 r74542)
Platform x86_64-apple-darwin1560 (64-bit)
Running under macOS 1014
Matrix products default
BLAS LibraryFrameworksRframeworkVersions36ResourcesliblibRblas0dylib
LAPACK LibraryFrameworksRframeworkVersions36ResourcesliblibRlapackdylib
locale
[1] C
attached base packages
[1] stats graphics grDevices utils datasets methods base
other attached packages
[1] psych_1810
loaded via a namespace (and not attached)
[1] compiler_360 parallel_360 tools_360 foreign_08-70 nlme_31-136 mnormt_15-5
[7] grid_360 lattice_020-35
52
References
Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432
Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458
Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY
Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package
Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276
Cattell R B (1978) The scientific use of factor analysis Plenum Press New York
Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)
Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition
Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64
Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98
Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334
Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178
Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland
Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models
Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance
53
of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745
Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450
Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition
Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282
Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA
Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York
Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132
Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70
Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54
Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185
Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300
Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4
Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258
Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317
MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ
Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686
54
McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ
Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495
Nunnally J C (1967) Psychometric theory McGraw-Hill New York
Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo
3rd edition
Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers
Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731
Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74
Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186
Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer
Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)
Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell
Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc
Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414
Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of
55
Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer
Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154
Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90
Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428
Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326
Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco
Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco
Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101
Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251
Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824
Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353
Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill
Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454
Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan
Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327
Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133
56
Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144
57
Index
ability 49affect 15 25alpha 5 7alpha factoring 7
Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 15 35blockrandom 48burt 35
char2numeric 15circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 35cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49
References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package, and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35
char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9
draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 37
omega, 7, 9; outlier, 3, 13, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51
R function
  ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8
  corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8
  Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51
  psych package
    ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8
    fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headtail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14
    p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49
    tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
  r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20
  tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38
R package
  ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7
schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37, 38
xtable, 47
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods, 2nd edition, chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.
Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7

Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 15, 35; block.random, 48; burt, 35

char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49

densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 15; dynamite plot, 20

edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24

fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor analysis, 7; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48

galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7

harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49

ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48

KnitR, 47

lavaan, 38; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15

make.keys, 17; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; minimum residual, 7; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel, 37; multilevel.reliability, 7; multiple regression, 38

nfactors, 7; nlme, 37

omega, 7, 9; outlier, 3, 13, 14

p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51

R function:
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 35; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; edit, 3; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; file.choose, 10; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; head, 49; headTail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

psych package:
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; biserial, 15, 35; block.random, 48; burt, 35; char2numeric, 15; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; cities, 51; cohen.kappa, 49; cor.smooth, 35; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; draw.cor, 35; draw.tetra, 35; dummy.code, 15; epi.bfi, 49; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 9, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 38; factor.minres, 9; factor.pa, 9; factor.wls, 9; fisherz, 48; galton, 50; geometric.mean, 49; guttman, 7; harmonic.mean, 49; headTail, 49; heights, 49; het.diagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclust.diagram, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; make.keys, 17; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42, 44; mediate.diagram, 41; mixed.cor, 35; mlArrange, 8; mlPlot, 8; mlr, 7; msq, 17; multi.hist, 7; multilevel.reliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 13, 14; p.rep, 49; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 35; principal, 5, 7, 9; psych, 51; r.test, 29; rangeCorrection, 49; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; Reise, 49; reverse.code, 49; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone.33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7; withinBetween, 37, 38

R package:
ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–9, 29, 47–49, 51; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47

r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 11; read.clipboard.tab, 3, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9

SAPA, 26, 50; sat.act, 11, 34, 44; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 17; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 44; set.cor, 49; setCor, 4, 8, 38, 41, 44, 46, 47; sim.multilevel, 38; spi, 49; spider, 15; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 37, 38; statsBy.boot, 38; statsBy.boot.summary, 38; structure.diagram, 8; superMatrix, 49; Sweave, 47

table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49

vegetables, 49, 51; violinBy, 15, 17, 19; vss, 5, 7

weighted least squares, 7; withinBetween, 37, 38

xtable, 47
SAPA 26 50satact 11 34 44scatterhist 7
schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 17scrub 3 13sector 43sem 8 49set correlation 44setcor 49setCor 4 8 38 41 44 46 47simmultilevel 38spi 49spider 15stars 15stats 29StatsBy 7statsBy 7 37 38statsByboot 38statsBybootsummary 38structurediagram 8superMatrix 49Sweave 47
table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49
vegetables 49 51violinBy 15 17 19vss 5 7
weighted least squares 7withinBetween 37 38
xtable 47
63