The Counseling Psychologist

The Use of Structural Equation Modeling in Counseling Psychology Research
Matthew P. Martens
The Counseling Psychologist 2005; 33: 269-298
DOI: 10.1177/0011000004272260
The online version of this article can be found at: http://tcp.sagepub.com/content/33/3/269

Published by SAGE Publications (http://www.sagepublications.com) on behalf of the Division of Counseling Psychology of the American Psychological Association

Version of Record: Apr 5, 2005
Downloaded from tcp.sagepub.com at Jazan University on August 27, 2012
The Use of Structural Equation Modeling in Counseling Psychology Research

Matthew P. Martens
University at Albany, State University of New York
Structural equation modeling (SEM) has become increasingly popular for analyzing data in the social sciences, although several broad reviews of psychology journals suggest that many SEM researchers engage in questionable practices when using the technique. The purpose of this study is to review and critique the use of SEM in counseling psychology research regarding several of these questionable practices. One hundred five studies from 99 separate articles published in the Journal of Counseling Psychology between 1987 and 2003 were reviewed. Results of the review indicate that many counseling psychology studies do not engage in various best practices recommended by SEM experts (e.g., testing multiple a priori theoretical models or reporting all parameter estimates or effect sizes). Results also indicate that SEM practices in counseling psychology seem to be improving in some areas, whereas in other areas no improvements were noted over time. Implications of these results are discussed, and suggestions for SEM use within counseling psychology are provided.
Structural equation modeling (SEM) is a technique for analyzing data that is designed to assess relationships among both manifest (i.e., directly measured or observed) and latent (i.e., the underlying theoretical construct) variables. When using statistical techniques such as multiple regression or ANOVA, the researcher only conducts his or her analysis on variables that are directly measured, which can be somewhat limiting when the individual is interested in testing underlying theoretical constructs. For example, in an ANOVA design, a researcher interested in studying the construct of depression might include one self-report depression scale as the dependent variable. The researcher may interpret that scale as representative of the entire construct of depression, a dubious conclusion given the complexity of depression. In contrast, a researcher using SEM could explicitly model the latent construct of depression rather than relying on one variable as a proxy for the construct. SEM also provides advantages over other data analytic techniques in that complex theoretical models can be examined in one analysis.1
I thank Richard Haase, Tiffany Sanford, and Samuel Zizzi for their work on earlier drafts of this article and Kirsten Corbett, Amanda Ferrier, Melissa Sheehy, and Xuelin Weng for their help in coding the data. A previous version of this article was presented at the 2003 annual meeting of the American Psychological Association. Correspondence concerning this article should be addressed to Matthew P. Martens, Department of Educational and Counseling Psychology, University at Albany, State University of New York, ED220, 1400 Washington Ave, Albany, New York 12222; phone: (518) 442-5039; e-mail: mmartens@uamail.albany.edu.

THE COUNSELING PSYCHOLOGIST, Vol. 33 No. 3, May 2005 269-298
DOI: 10.1177/0011000004272260
© 2005 by the Society of Counseling Psychology
A hypothetical example of a structural equation model that illustrates some advantages of SEM is presented in Figure 1.2 This model includes five latent constructs that are represented by ovals: personality characteristics thought to be associated with alcohol use (personality), familial factors thought to be related to alcohol use (family risk), motivations for using alcohol (drinking motives), strategies that can be used to limit alcohol consumption and problems related to alcohol use (protective behaviors), and problems associated with alcohol consumption (alcohol problems). Each latent variable includes several measured indicator variables, represented by rectangles, that are thought to represent components of the underlying variable. Therefore, one can see how the researcher can explicitly model the underlying constructs of interest via SEM by directly incorporating the constructs into the model that is to be tested.

Figure 1 also demonstrates a relatively complex series of relationships that explain or predict problems associated with alcohol consumption, which would then be tested in a single analysis. In this model, both personality characteristics and family risk factors are thought to predict or cause motivation for drinking and use of protective behaviors, which are then in turn thought to predict or cause alcohol-related problems. These causal paths are indicated by single-headed arrows between the variables in question (note that such paths exist between each latent construct and its observed indicator variables, which occurs because the latent construct is thought to cause whatever responses occur in the observed variables that represent the construct). Personality characteristics and family risk factors are conceptualized as being correlated, but no causal or predictive relationship is specified. Therefore, a double-headed curved arrow indicates the relationship between these two constructs, which represents covariance between variables.
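In the compact model syntax used by SEM packages such as lavaan (R) or semopy (Python), a model like the one in Figure 1 could be sketched as follows. All variable names below are illustrative placeholders, not the measures of any actual study:

```python
# A sketch, not actual study code: the Figure 1 model written in
# lavaan/semopy-style model syntax. All variable names are hypothetical.
model_desc = """
# measurement model: each latent construct is defined by its
# observed indicators (the rectangles in Figure 1)
personality =~ social_anxiety + neuroticism + sensation_seeking + impulsivity
family_risk =~ connectedness + age_first_drink + paternal_alc + maternal_alc
motives     =~ tension_reduction + social_enjoyment + pleasant_feelings
protective  =~ peer_support + stopping_drinking + type_of_drinking
problems    =~ binge_drinking + drinks_per_week + social_prob + personal_prob

# structural (path) portion: single-headed causal arrows
motives    ~ personality + family_risk
protective ~ personality + family_risk
problems   ~ motives + protective

# double-headed curved arrow: covariance with no causal direction specified
personality ~~ family_risk
"""
```

Fitting such a model would then be a single call, such as semopy's Model(model_desc).fit(data), assuming the indicators exist as columns in the data set.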
As Figure 1 illustrates, SEM is well suited for model testing because the researcher can specify causal models that correspond to a theoretical perspective. Through SEM the researcher can then test the plausibility of the models on observed data. SEM has numerous applications within counseling psychology, as research in the field often involves testing or validating theoretical models. For example, SEM is appropriate in scale development research to confirm the factor structure of an instrument. A researcher may wish to test a hypothesized factor structure of an existing instrument with a new population or may have established a tentative factor structure of a new instrument (perhaps via exploratory factor analysis) and wish to confirm this factor structure on an independent sample. Counseling psychology researchers are also often interested in testing complex theoretical models in relevant areas (e.g., career development and multicultural development models), which can be accomplished effectively via SEM.
Perhaps because of the rapid expansion in SEM software in recent years, SEM is a popular technique for analyzing data in the social sciences (see Steiger, 2001). Unfortunately, this expansion in popularity coincides with the expression of many concerns in the SEM literature regarding practices of psychological researchers. Recent reviews of SEM research (MacCallum & Austin, 2000; McDonald & Ho, 2002) among various psychology journals reported many questionable practices related to the use of SEM at all stages of research, including conceptualization (e.g., not including and testing plausible alternative models), execution (e.g., modifying or generating models based on empirical rather than theoretical criteria), and interpretation (e.g., not reporting all parameter estimates within a model).
[Figure 1: path diagram with latent variables Personality (indicators: Social Anxiety, Neuroticism, Sensation Seeking, Impulsivity), Family Risk (Family Connectedness, Age at First Drink, Paternal Alcoholism, Maternal Alcoholism), Drinking Motives (Tension Reduction, Social Enjoyment, Pleasant Feelings), Protective Behaviors (Peer Support, Stopping Drinking, Type of Drinking), and Alcohol Problems (Binge Drinking, Drinks per Week, Social Problems, Personal Problems).]

FIGURE 1. Structural Equation Model Predicting Alcohol Problems
Studies from the Journal of Counseling Psychology were included in these previous reviews, but because findings were not categorized by journal, the extent to which the concerns applied specifically to counseling psychology research was impossible to determine. Furthermore, because these reviews cover a fairly limited time (1993 to 1997 for MacCallum & Austin; 1995 to 1997 for McDonald & Ho), the generalizability is questionable. Finally, these reviews were primarily narrative rather than empirical. A broad, empirical review and critique of SEM practices specific to counseling psychology could therefore serve several purposes. First, an empirical rather than narrative review lets findings be presented in a statistical format, which allows readers to generate their own conclusions from the findings. Second, an empirical review can provide counseling psychologists with some gauge of the quality of SEM research that has been published within the field. Besides the scientific importance of evaluating the methodology that was used in a portion of counseling psychology research, a practical consideration emerges when one realizes that counseling psychologists often use SEM to develop and refine psychological assessments. If, for example, a pattern of misusing or misinterpreting SEM exists within counseling psychology, then individuals in the field might need to reexamine instruments developed via these procedures before feeling confident regarding their use. Third, a review over a reasonably long time (e.g., at least 15 years) would allow one to determine if practices related to the use of SEM have improved over time. Finally, an empirical review can educate researchers, journal reviewers, and journal editors by highlighting salient concerns about the use of SEM within counseling psychology.
Some specific concerns related to the use of SEM in psychological research that have been highlighted include lack of identification of plausible alternative models, failure to assess for multivariate normality before conducting SEM analysis, failure to assess the fit of the path model separately from the measurement model, failure to provide a full report of parameter estimates, and either generation or modification of models on the basis of empirical, rather than theoretical, criteria (Breckler, 1990; MacCallum & Austin, 2000; McDonald & Ho, 2002). Additionally, researchers are concerned about the use of certain fit indices in assessing how well the theoretical model fits the data (e.g., Hu & Bentler, 1998, 1999). Each of these issues is addressed below.
Identifying Plausible Alternative Models
According to McDonald and Ho (2002), multiple models that might
explain the data are found in most multivariate data sets.3 Thus, a researcher
testing only one model may identify a well-fitting model but may be ignoring other plausible models that better account for the relationships among the data (or at least account for the relationships as well as the initial model). By testing alternative a priori models (i.e., the researcher specifies multiple models to be tested before conducting the analyses), even when a target model is clearly of greatest interest, researchers can protect themselves against a confirmation bias that can occur when only testing one model (MacCallum & Austin, 2000). For example, Figure 2 illustrates an alternative, yet theoretically plausible, model to that depicted in Figure 1. Note that two causal paths have been added: one between personality and alcohol problems and one
[Figure 2: path diagram with the same latent variables and indicators as Figure 1, with added direct paths from Personality and Family Risk to Alcohol Problems.]

FIGURE 2. Alternative Model Predicting Alcohol Problems, With Additional Paths Included
between family risk and alcohol problems. Essentially, these paths test whether a direct relationship exists between personality/familial risk factors and alcohol problems in addition to the indirect relationship, which is thought to occur via drinking motives and protective behaviors.

By testing this model along with the model depicted in Figure 1, researchers could draw conclusions about model fit between two theoretically plausible perspectives and thus be less likely to engage in confirmation bias. A second advantage of testing alternative models is that when one model is nested within another, direct comparisons can be conducted to determine if one model provides a significantly better fit than the other model. Models are considered nested when the model with the smaller number of estimated parameters can be obtained by fixing the values of one or more parameters of the larger model (Bollen, 1989b). For example, one could obtain the model depicted in Figure 1 by constraining the values of the direct paths between personality/familial risk factors and alcohol problems in Figure 2 to zero. Because these models are nested, one could, via the χ² difference test, determine if the more complex model (i.e., the model in Figure 2) provides a significantly better fit to the data.4 Additionally, testing multiple a priori models provides the researcher with alternatives should problems be found with the initial target model, without relying on post hoc empirically derived model modifications (MacCallum & Austin, 2000). Issues related to empirically derived model modifications are discussed later.
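As a numerical sketch of this comparison (the fit statistics below are hypothetical, not values from any actual study), the χ² difference test for the nested models in Figures 1 and 2 reduces to a few lines; the survival-function helper used here is exact only for even degrees of freedom:

```python
from math import exp

def chi2_sf_even_df(x, df):
    # Survival function P(X > x) for a chi-square variable with EVEN
    # degrees of freedom, via the closed-form series (exact in that case).
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return exp(-x / 2) * total

# Hypothetical fit statistics for the nested pair in Figures 1 and 2
chi2_fig2, df_fig2 = 210.4, 164    # Figure 2: extra paths, fewer df
chi2_fig1, df_fig1 = 218.6, 166    # Figure 1: those paths fixed to zero

delta_chi2 = chi2_fig1 - chi2_fig2
delta_df = df_fig1 - df_fig2
p = chi2_sf_even_df(delta_chi2, delta_df)
# A significant difference favors the more complex (Figure 2) model
print(f"delta chi2 = {delta_chi2:.1f} on {delta_df} df, p = {p:.3f}")
```

With these invented numbers the difference is significant at the .05 level, so the extra direct paths in Figure 2 would be retained.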
Assessing for Multivariate Normality
The most common estimation method in SEM research, maximum likelihood, requires an assumption of multivariate normality (Bollen, 1989b; McDonald & Ho, 2002; Quintana & Maxwell, 1999). Essentially, maximum likelihood estimation procedures provide parameter estimates that are most likely (hence the name) to represent the population values, assuming that the sample represents the population from which it was drawn. If SEM is used with data that do not satisfy this requirement, then issues such as biased standard errors, inaccurate test statistics, and inflated Type I error rates can emerge (Chou, Bentler, & Satorra, 1991; Powell & Shafer, 2001; West, Finch, & Curran, 1995). Although the maximum likelihood method may be somewhat robust against this violation, especially with smaller deviations from normality (Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Chou et al., 1991; McDonald & Ho, 2002), it seems prudent that SEM researchers at least note potential issues, concerns, or alternative analytic strategies (e.g., alternative estimation procedures, data transformations, and bootstrapping) related to multivariate normality.
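One concrete (and deliberately simple) screening step is to check univariate skewness and kurtosis for each observed variable before fitting; this is a necessary but not sufficient check for multivariate normality (Mardia's multivariate coefficients are the fuller test). The cutoffs below (|skewness| > 2, |excess kurtosis| > 7) are rough guidelines discussed by West, Finch, and Curran (1995); the function names are mine:

```python
def skewness(xs):
    # Sample skewness: third standardized moment (0 for a normal variable)
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / s2 ** 1.5

def excess_kurtosis(xs):
    # Fourth standardized moment minus 3 (0 for a normal variable)
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / n / s2 ** 2 - 3.0

def flag_nonnormal(data, skew_cut=2.0, kurt_cut=7.0):
    # Flag variables exceeding the rough |skewness| > 2 / |kurtosis| > 7
    # cutoffs discussed by West, Finch, and Curran (1995)
    return {name: (skewness(xs), excess_kurtosis(xs))
            for name, xs in data.items()
            if abs(skewness(xs)) > skew_cut
            or abs(excess_kurtosis(xs)) > kurt_cut}
```

Variables that are flagged would then motivate one of the remedies noted above (robust estimators, transformations, or bootstrapping).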
How Well a Model Fits: The Use of Fit Indices
When using SEM, a major component of the analysis involves evaluating how the hypothesized model fits the observed data. To assess this fit, researchers generally use various goodness-of-fit measures. The most common measure is the probability of the χ² statistic, which assesses the magnitude of the discrepancy between the fitted (model) and sample (observed) covariance matrix and represents the most stringent exact fit measure. The null hypothesis for this analysis is that no difference exists between the fitted and sample matrices, so a nonsignificant χ² indicates that the model accurately represents the data (assuming a true model). However, the power of the χ² and the χ² difference test when comparing models, like that of all inferential tests, is influenced by sample size. Therefore, when samples are large, small differences between the fitted and sample covariance matrices (which would indicate a relatively good fit) may yield a statistically significant χ² (see Bentler & Bonett, 1980; Gerbing & Anderson, 1993; Marsh, Balla, & McDonald, 1988). Furthermore, since SEM analyses typically require fairly large sample sizes, many otherwise well-fitting models may nonetheless yield a statistically significant χ².5
To deal with this problem, researchers generally use additional measures of fit, but considerable debate exists regarding which fit indices are appropriate (e.g., Bentler, 1990; Bollen, 1990; Gerbing & Anderson, 1993; McDonald & Marsh, 1990). Several studies have found that some commonly used fit indices, such as the goodness-of-fit index (GFI; Jöreskog & Sörbom, 1981), adjusted goodness-of-fit index (AGFI; Bentler, 1983; Jöreskog & Sörbom, 1981; Tanaka & Huba, 1985), χ²/df ratio, and normed fit index (NFI; Bentler & Bonett, 1980), were substantially affected by factors extrinsic to actual model misspecification (e.g., sample size and number of indicators per factor) and did not generalize well across samples (Anderson & Gerbing, 1984; Hu & Bentler, 1998; Marsh et al., 1988).
In contrast, fit indices such as the Tucker-Lewis index (or non-normed fit index; TLI; Bentler & Bonett, 1980; Tucker & Lewis, 1973), incremental fit index (IFI; Bollen, 1989a), comparative fit index (CFI; Bentler, 1990), root mean square error of approximation (RMSEA; Steiger & Lind, 1980), and standardized root mean square residual (SRMR; Bentler, 1995) were much less affected by factors other than model misspecification and tended to generalize relatively well. Based on these and other findings regarding misspecified models, some SEM experts have recommended against the use of the GFI, AGFI, χ²/df ratio, and NFI, while supporting the use of the TLI, IFI, CFI, RMSEA, and SRMR (e.g., Hu & Bentler, 1998, 1999; Steiger, 2000). Although these recommendations are not the only opinion and should not necessarily be considered the so-called gold standard, the research underlying these recommendations is some of the most comprehensive and compelling on the topic. Thus, these recommendations were followed for the purposes of this article.
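Several of the recommended indices are simple functions of the model χ², its degrees of freedom, the sample size, and the fit of a baseline (independence) model, so they can be recomputed from reported statistics. The formulas below are the standard ones; the numeric inputs are purely hypothetical:

```python
from math import sqrt

def rmsea(chi2, df, n):
    # Root mean square error of approximation (Steiger & Lind, 1980)
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_t, df_t, chi2_b, df_b):
    # Comparative fit index (Bentler, 1990): target (t) vs. baseline (b)
    num = max(chi2_t - df_t, 0.0)
    den = max(chi2_t - df_t, chi2_b - df_b, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

def tli(chi2_t, df_t, chi2_b, df_b):
    # Tucker-Lewis index (non-normed fit index; Tucker & Lewis, 1973)
    return ((chi2_b / df_b) - (chi2_t / df_t)) / ((chi2_b / df_b) - 1.0)

# Hypothetical target and independence-model statistics, N = 400
print(round(rmsea(210.4, 164, 400), 3))
print(round(cfi(210.4, 164, 1890.0, 190), 3))
print(round(tli(210.4, 164, 1890.0, 190), 3))
```

The SRMR, by contrast, requires the full residual correlation matrix and cannot be recovered from the χ² alone.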
Assessing the Fit of the Path Model
When analyzing a structural equation model that posits causal relations among latent variables, the researcher is typically most interested in the path portion of the structural model (i.e., the relationships among the latent variables), as opposed to the measurement portion of the model (i.e., the manifest indicators of each latent variable). In the examples provided in Figures 1 and 2, the path portion of the model would refer to the causal paths among the latent variables of personality, family risk, drinking motives, protective behaviors, and alcohol problems, while the measurement portion of the model would refer to the paths from each latent variable to its observed indicator variables.

When most SEM researchers report the fit of their model, they only report the fit of the full structural model (including both the measurement and path components of the model) or first report the fit of the measurement model and then the fit of the full structural model. In their review, however, McDonald and Ho (2002) identified 14 studies where the fit of the path model itself could be obtained separately from the fit of the measurement model (the discrepancy function and degrees of freedom can be divided into separate additive components for both the measurement and path model; see Steiger, Shapiro, & Browne, 1985). In most of these studies, the fit of the path model itself was poor, even though the fit of the full structural model was generally good. The authors concluded that in many cases the goodness-of-fit of a full structural model conceals the badness-of-fit of the actual path model (which is generally of most interest to the researcher), which generally results from a particularly well-fitting measurement model. In these cases the researchers might conclude that their overall model demonstrates a good fit, when in fact the relationships between the latent variables in their model are weak. Therefore, they recommended that researchers report the fit of the measurement and path portions of the model separately.
Reporting All Parameter Estimates/Effect Sizes
Another concern about SEM research is incomplete reporting of all parameter estimates, in particular the error or disturbance variances associated with endogenous (outcome) variables (Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002). Among other considerations, reporting all parameter estimates (including error variances) allows readers to consider the relationships among the variables in the structural model and the variance in the endogenous variables explained by the exogenous (predictor) variables, rather than simply the fit of the overall model. Alternatively, researchers could simply provide the R² values for the endogenous variables in their model. In Figures 1 and 2, providing the R² values for drinking motives, protective behaviors, and particularly alcohol problems would be useful. Prior reviews have indicated that only about 50% of published SEM studies reported parameter estimates of error and disturbance variances or other measures of effect size (MacCallum & Austin, 2000; McDonald & Ho, 2002).
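When standardized estimates are reported, the translation from disturbance variances to R² values is direct; the disturbance values below are invented purely for illustration:

```python
# If standardized estimates are reported, the R^2 for an endogenous
# variable is 1 minus its standardized disturbance variance.
def r_squared(disturbance_var, total_var=1.0):
    return 1.0 - disturbance_var / total_var

# Invented standardized disturbance variances for Figure 1's
# endogenous constructs (for illustration only)
disturbances = {"drinking_motives": 0.72,
                "protective_behaviors": 0.81,
                "alcohol_problems": 0.55}
r2 = {name: round(r_squared(d), 2) for name, d in disturbances.items()}
print(r2)
```

Either reporting route (full estimates or R² values) lets readers judge the strength of the structural relationships, not just overall fit.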
SEM and Model Modification
The model modification strategy refers to the practice of modifying an initial model, generally by empirical criteria, until it fits the data (MacCallum & Austin, 2000). SEM models that initially display a poor fit can be easily modified to improve fit by adding parameters that will decrease the χ² value (i.e., using modification indices), by simply deleting nonsignificant parameters, or by parceling individual items into groups that are then used as manifest variables. Although the practice of parceling items can sometimes be warranted (see Little, Cunningham, Shahar, & Widaman, 2002), parceling items post hoc primarily to improve fit is best considered a model modification strategy.
An example of post hoc model modification would occur if a researcher tested the model displayed in Figure 2, found that the path between personality and protective behaviors was nonsignificant, and then deleted the path and reran the analysis. Another example would be if the researcher learned that correlating the error terms (which are not shown in the figures) of the observed variables of impulsivity and sensation seeking would improve model fit, added this parameter, and reran the analysis. Most SEM experts warn against the use of model modification (e.g., Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002), which has been described as "potentially misleading and easily abused" (MacCallum & Austin, 2000, p. 216).

These concerns stem from the fact that SEM models that are modified within the same sample to improve fit might be capitalizing on chance or might not cross-validate well, which has been demonstrated in previous research (MacCallum, Roznowski, & Necowitz, 1992). Furthermore, adding paths to an SEM model without removing any paths will generally improve the empirical fit of the model, so researchers might easily obtain a well-fitting
model that is not theoretically meaningful (for a discussion on empirical vs. theoretical fit, see Olsson, Troye, & Howell, 1999).

Although reviews of SEM practices have not completely discouraged modifying SEM models that do not initially fit well, they recommend that the modifications be few, theoretically defensible, and cross-validated on an independent sample, or they recommend that the importance of the modifications at least be discussed (Boomsma, 2000; MacCallum & Austin, 2000; McDonald & Ho, 2002). Therefore, even though SEM can be used for the exploratory purpose of generating the best-fitting model, and most SEM technical manuals describe such procedures, most SEM experts contend that the technique should be used for confirmatory rather than exploratory purposes (e.g., Bollen, 1989a, 1989b; Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002). This is the point of view that I have adopted for this article.
Purpose of the Study
Given (a) the increase in popularity of SEM analysis (Steiger, 2001), (b) the importance of SEM studies within the field, and (c) the various problems and concerns that have been reported in previous SEM reviews (e.g., Breckler, 1990; MacCallum & Austin, 2000; McDonald & Ho, 2002), the main purpose of this study was to review SEM practices within counseling psychology. More specifically, I sought to assess and critique SEM research regarding the following aspects of the analytic technique: (a) identifying alternative models, (b) addressing the assumption of multivariate normality, (c) using fit indices that are less sensitive to extrinsic factors and that generalize better across samples, (d) assessing path model fit separately from measurement model fit, (e) reporting all parameter estimates, and (f) using SEM for model generation/modification. These aspects were chosen because they are among the most salient concerns expressed in the SEM literature and because they are practices that should be fairly easy for SEM researchers to modify, should modification be necessary. Additionally, I sought to assess longitudinal trends regarding these practices to determine if researchers have been more likely over time to adhere to various recommendations regarding SEM use (e.g., Boomsma, 2000; Breckler, 1990; Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002).
METHOD
Selection of Studies
Studies published from 1987 to 2003 in the Journal of Counseling Psychology (JCP) were reviewed to assess practices related to SEM research in counseling psychology. JCP was chosen because of its status as the flagship journal for research in the field. The year 1987 was chosen for several reasons. First, it was in this year that Fassinger (1987) published an article in a special issue of JCP that served as an introduction to SEM. Second, a PsycINFO search using the term "structural equation modeling" revealed only 30 citations before 1987, none of which were published in JCP. Third, 1987 appears to be the year when SEM studies began to be consistently published in JCP. Although a handful of articles published in JCP before 1987 used path analysis (i.e., modeling with measured variables only), most of these articles did not use the statistical procedures of assessing model fit that are now commonly used in SEM research (i.e., using the χ² statistic or other fit indices).

To be included in the analyses, articles were selected that utilized either SEM or path analysis for any portion of their results. Thus, studies that used SEM as the main outcome analysis (e.g., testing several theoretical models) or as a preliminary analysis (e.g., establishing a model and then further testing the model using different analytical techniques) were included. Four articles included multiple and distinctly separate studies that used SEM, so for these articles each study was coded separately. A total of 105 studies from 99 separate articles met these criteria and were included in the analyses.
Coding Procedure
Studies were coded independently by the author and one of four advanced graduate students on several variables, including (a) year of publication, (b) type of study, (c) specification of multiple a priori models, (d) multivariate normality, (e) choice of fit indices, (f) assessment of path fit separately from measurement fit, (g) report of all parameter estimates or effect sizes, and (h) use of post hoc model modification procedures. Interrater agreement was assessed via the kappa statistic. For all variables, the kappa statistic was significant (p < .001) and ranged from .77 to 1.00 (M = .86). Descriptively, agreement percentages ranged from 89% (post hoc model modification procedures) to 100% (specification of multiple a priori models). Any discrepancies were reexamined conjointly by the two coders until proper classification was decided.
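The kappa statistic used here is Cohen's chance-corrected agreement coefficient; a minimal implementation (with invented yes/no codes for ten hypothetical studies) looks like this:

```python
def cohens_kappa(labels_a, labels_b):
    # Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)
    n = len(labels_a)
    cats = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in cats)
    return (observed - expected) / (1.0 - expected)

# Invented yes/no codes from two raters across ten hypothetical studies
rater1 = ["y", "y", "n", "y", "n", "n", "y", "y", "n", "y"]
rater2 = ["y", "y", "n", "y", "n", "y", "y", "y", "n", "y"]
print(round(cohens_kappa(rater1, rater2), 2))
```

Note that kappa can be substantially lower than the raw agreement percentage when one category dominates, which is why both figures are reported above.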
Year of publication. Studies were coded two ways. For descriptive purposes, they were simply coded by the year of publication. However, for longitudinal analyses (described below), a potential problem would emerge if I attempted to make comparisons by including each year as an independent variable, because of the many levels of the independent variable (i.e., year of the study) and the small cell sizes that some of the years would entail. Thus, for the purposes of the longitudinal analyses, year of publication was broken into four relatively equal categories: 1987 to 1995 (28 studies), 1996 to 1998 (23 studies), 1999 to 2001 (29 studies), and 2002 to 2003 (25 studies).6
Type of study. Studies were coded as path analysis (i.e., model testing
with only manifest, or observed, variables),7 confirmatory factor analysis
(CFA; i.e., model testing that involved testing a measurement model without
positing causal relations among the latent variables), or full SEM (i.e., testing
causal relationships among latent variables).
Specifying multiple a priori models. Studies were coded in a yes/no format
in terms of whether more than one a priori theoretical model that might
explain the data was discussed, meaning the multiple models to be tested
were specified before analyses were conducted. Studies that tested multiple
models only in the context of multigroup analysis (which specifies different
constraints that are placed on parameter estimates within a model but does not
generally involve testing different theoretical models; see Byrne, 2001) were
coded as no, as were those studies that included comparisons among models
that were generated post hoc (see below). Additionally, a few studies tested
the same conceptual model (i.e., all hypothesized paths remained the same)
but with slightly different endogenous constructs (e.g., perceived likelihood
that a situation would occur vs. perceived seriousness of a situation should it
occur). In these instances, the studies were coded as testing only a single a
priori model.
Addressing multivariate normality. Studies were coded as yes/no in terms
of whether issues related to multivariate normality were addressed (e.g., indicating
that data were normally distributed, discussing appropriate data transformations,
considering alternative estimation strategies, etc.).
Choice of fit indices. For each study the individual fit indices used to
assess model fit were noted.
Assessing path fit separate from measurement fit. Studies were coded as
yes/no in terms of whether the fit of the path model separate from the
280 THE COUNSELING PSYCHOLOGIST / May 2005
measurement model was indicated. Additionally, I calculated fit of the path
model for those studies that provided the necessary information (i.e., fit of
the measurement model and the full structural model) but did not calculate fit
of the path model itself. Note that this coding did not apply to path analysis
studies (because only observed variables are included in the analysis, no
measurement model exists) or CFA studies (because no causal structural
relations are posited among latent variables).
Reporting all parameter estimates/effect sizes. Studies were coded yes/no
in terms of whether all parameter estimates for the model were reported,
including parameter estimates for error and disturbance terms, or whether
effect sizes for the outcome variables were indicated. For studies that tested
multiple models, this criterion was applied only to the final model (e.g., a study
was coded yes if the authors provided all parameter estimates for the best fitting
of two competing models but did not provide parameter estimates for the
other model).
Post hoc model modification procedures. Studies were coded yes/no to
indicate whether the authors engaged in empirically derived post hoc model
modification or model generation procedures (e.g., analyzing modification
indices or deleting nonsignificant paths). Parceling items post hoc to improve
fit was also coded yes, but parceling items a priori was coded no.
Data Analysis
Descriptive statistics were calculated for all variables to determine the frequency
with which counseling psychology researchers engaged in the various SEM
practices. To assess longitudinal trends on each of these practices, logistic
regression analyses were conducted in which the four groupings of studies by
year were categorically coded as 0 (the oldest set of studies) to 3 (the newest
set of studies). Separate logistic regression analyses were conducted for the
following dependent variables: specifying multiple a priori models, addressing
multivariate normality, choice of fit indices, assessing path fit separate
from measurement fit, reporting all parameter estimates, and using post hoc
model modification procedures. For comparison purposes, the newest set of
studies (2002 to 2003) was used as the reference group.
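A minimal numpy sketch of one such analysis, fit by Newton-Raphson maximum likelihood with the three older periods dummy-coded against the 2002 to 2003 reference group. The outcome vector is reconstructed from Table 2's rounded normality percentages (roughly 1 of 28, 6 of 23, 3 of 29, and 10 of 25 studies addressing normality), so it is illustrative rather than the review's actual coding sheet:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

# Period label for each of the 105 studies, then a 0/1 outcome (addressed
# multivariate normality), reconstructed from Table 2's rounded percentages.
periods = (["1987-1995"] * 28 + ["1996-1998"] * 23 +
           ["1999-2001"] * 29 + ["2002-2003"] * 25)
y = np.array([1] * 1 + [0] * 27 + [1] * 6 + [0] * 17 +
              [1] * 3 + [0] * 26 + [1] * 10 + [0] * 15, dtype=float)
# Dummy-code the three older periods; 2002-2003 is the reference group.
older = ["1987-1995", "1996-1998", "1999-2001"]
X = np.column_stack([np.ones(len(y))] +
                    [[1.0 if p == lvl else 0.0 for p in periods] for lvl in older])
beta = fit_logit(X, y)
print(np.exp(beta).round(3))  # reference-group odds, then ORs for each older period
```

The exponentiated coefficients give the odds of addressing normality in each older period relative to 2002 to 2003; their reciprocals (e.g., 1/.056 ≈ 17.9) line up with the odds ratios reported in Table 4 up to rounding.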
RESULTS
All results are discussed on a broad, general level so that no particular
author or study is indicated. A total of 105 separate studies published in JCP
between 1987 and 2003 used either SEM or path analysis. Results indicated
that SEM seems to be an increasingly popular data analytic technique, as more
than half (51%) of the studies were published between 1999 and 2003. Interest
in SEM began to rise in 1995, given that eight SEM studies were published
in JCP that year, while the most in any single previous year had been four.
The largest percentage of studies used full structural modeling (45%),
followed by CFA (37%) and path analysis (18%). Frequency and type of
study, by year, are presented in Table 1.
Descriptive Statistics
The number of studies, grouped by the four yearly categories, is presented
in Table 2 along with the percentage of studies that engaged in each SEM
practice. For example, the percentage in the normality category represents
the number of studies that addressed this consideration, while the percentage
in the modify category represents the number of studies that modified models
post hoc based on empirical criteria. The percentage of studies that used each
fit index is presented in Table 3. Only indices that were used in at least 10% of
the studies are presented. Results for each of these specific categories are
summarized below.
TABLE 1: Frequency of SEM Studies in JCP by Year
Type of Study
Year Full SEM CFA Path Analysis
1987 0 1 1
1988 1 2 0
1989 1 1 0
1990 2 1 1
1991 0 0 0
1992 1 0 1
1993 2 0 1
1994 3 1 0
1995 5 3 0
1996 1 1 1
1997 4 3 1
1998 9 1 2
1999 3 9 1
2000 3 3 1
2001 2 3 4
2002 6 3 3
2003 4 7 2
NOTE: SEM = structural equation modeling; CFA = confirmatory factor analysis.
Specifying multiple a priori models. Approximately half (47.6%) of the
studies specified more than one theoretical model a priori. Additionally, a
greater percentage of studies in the older periods reported specifying multi-
ple a priori models compared with the newer periods (53.6% and 52.2% vs.
41.4% and 44.0%).
Addressing multivariate normality. Only 19.0% of the studies mentioned
the issue of multivariate normality, although results seem to indicate that
more recent studies were more likely to address the consideration. These
results are similar to those reported in prior SEM reviews (e.g., Breckler,
1990). Researchers used various ways to assess and deal with multivariate
normality, such as deleting outliers, transforming data, and using robust
estimation procedures.
Choice of fit indices. When examining researchers' choice of fit indices,
one should remember that many fit indices were unavailable during all periods
covered by this review. For example, the CFI was not published until
1990, and a common citation for the RMSEA comes from 1993 (Browne &
Cudeck, 1993). As expected, the probability of the χ² statistic was the most
commonly used fit index (90.5% of the studies, although it is somewhat
surprising that it was not reported in all studies), followed by the CFI (63.8%)
and GFI (48.6%). In terms of year of publication, results suggest a decrease
in use over time of some fit indices that have been identified as problematic
(e.g., GFI and AGFI), while the use of other problematic indices seems somewhat
consistent (e.g., χ²/df ratio and NFI). Similarly, results suggest an
increase in use over time of some indices that have been identified as more
TABLE 2: Percentage of Studies Engaging in SEM Practices, Overall and by Year of
Publication

              N    % A Priori Models  % Normality  % Path Fit(a)  % PE/ES  % Modify
Overall       105  47.6               19.0         2.1            46.7     40.0
1987 to 1995  28   53.6               3.6          0.0            50.0     39.3
1996 to 1998  23   52.2               26.1         7.1            56.5     43.5
1999 to 2001  29   41.4               10.3         0.0            34.5     44.8
2002 to 2003  25   44.0               40.0         0.0            48.0     32.0

NOTE: SEM = structural equation modeling; a priori models = specified multiple a priori theoretical
models; normality = assessed for multivariate normality; path fit = measured path fit separate
from overall model fit; PE/ES = reported either all parameter estimates or effect sizes for
outcome variables; modify = engaged in post hoc empirical model modification procedures.
a. Includes only full SEM studies.
TABLE 3: Percentage of Studies Using Selected Fit Indices, Overall and by Year of
Publication

              N    % χ²   % TLI  % CFI  % RMSEA  % SRMR  % χ²/df  % GFI  % AGFI  % NFI
Overall       105  90.5   42.9   63.8   38.1     36.2    37.1     48.6   20.0    25.7
1987 to 1995  28   85.7   21.4   17.9   3.6      46.4    25.0     57.1   28.6    25.0
1996 to 1998  23   100.0  43.5   73.9   17.4     30.4    39.1     56.5   17.4    26.1
1999 to 2001  29   82.8   62.1   86.2   41.4     34.5    44.8     51.7   27.6    31.0
2002 to 2003  25   96.0   44.0   80.0   92.0     32.0    40.0     28.0   4.0     20.0

NOTE: TLI = Tucker-Lewis index (or non-normed fit index); CFI = comparative fit index;
RMSEA = root mean square error of approximation; SRMR = standardized root mean square
residual; χ²/df = χ²/degrees of freedom ratio; GFI = goodness-of-fit index; AGFI = adjusted
goodness-of-fit index; NFI = normed fit index.
accurate at identifying misspecified models (e.g., RMSEA), while the use of
other indices accurate at identifying such models has remained relatively
consistent (e.g., SRMR).
Assessing path versus measurement fit. Only one study that used full SEM
explicitly attempted to assess the fit of the path model separately from the
measurement model. This is not surprising given that this concern is a fairly
recent addition to the SEM literature (e.g., McDonald & Ho, 2002). Several
studies assessed the fit of the measurement model before assessing the fit of
the full structural model but did not assess the fit of the path model itself and
generally drew conclusions in terms of the fit of the full structural model.
However, 14 studies provided the necessary information to calculate the fit of
the path model itself and did not include other features that would make such
calculations impossible (e.g., statistical equivalency between the measurement
and structural models or removing a variable from the measurement
model when testing the structural model). Twenty-two comparisons were
included in these studies because in several studies the fit of the model was
assessed separately on different groups (e.g., men and women). Using the
RMSEA as a measure of fit (which conceptually measures the degree to
which the model would fit the population covariance matrix, if it were
known; see Browne & Cudeck, 1993), with smaller values indicating better
fit, results indicated relatively equal fit between the path and measurement
portions of the models (M RMSEA values = .068 and .065, respectively).
These results differ from prior reviews of SEM research in psychology
(McDonald & Ho, 2002), which reported that the fit of the path model was
generally worse than that of the measurement model. The current review, however,
did reveal several studies where the fit of the path model was considerably
worse than the fit of the measurement model (e.g., RMSEA of .165 vs. .054;
.160 vs. .069), yet, based on the fit of the full structural model, the authors
concluded that the model fit the data well. Although specific guidelines
vary, an RMSEA of .08 is generally considered an upper bound for
indicating adequate model fit (e.g., Hu & Bentler, 1999).
Therefore, in these studies the authors interpreted the relationships among
their latent variables as being meaningful (because the overall model fit fairly
well), when in fact the portion of their model that examined only these latent
variables did not fit well.
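These comparisons rest on the RMSEA point estimate, which can be recovered from a model's χ², its degrees of freedom, and the sample size via the Steiger-Lind formula (some programs use N rather than N - 1 in the denominator). A minimal sketch; the χ² values below are hypothetical, chosen only to illustrate a path model fitting markedly worse than its measurement model:

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate: excess misfit per degree of freedom, per observation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Hypothetical fit results for a sample of N = 250.
print(round(rmsea(chi2=180.0, df=120, n=250), 3))  # measurement model: 0.045
print(round(rmsea(chi2=95.0, df=30, n=250), 3))    # path model: 0.093
```

Note that when χ² falls below its degrees of freedom, the estimate is truncated at zero, which is why a well-fitting model can report an RMSEA of exactly .00.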
Reporting all parameter estimates/other measures of effect size. Approximately
half (46.7%) of all studies either reported all parameter estimates
in the model or provided other indications of effect size (e.g., squared
multiple correlations) for the outcome variables in the model. These results
were somewhat consistent over the years, except for 1999 to 2001, when
only 34.5% of the studies reported either all parameter estimates or other
measures of effect size.
Modifying models post hoc via empirical criteria. A total of 40.0% of the
studies used empirically derived criteria (e.g., modification indices or deletion
of nonsignificant parameters) to either improve the fit of the model or
generate a well-fitting model. These numbers were fairly consistent over the
four periods, although the newest period had the fewest studies that engaged
in this practice (32.0%). Of the studies that used empirical model modification
or generation procedures, approximately half noted considerations such
as (a) modifications that were theoretically plausible, (b) the tentative nature
of such models, or (c) the importance of (and in some instances actual) cross-
validation.
Logistic Regression Analyses
A series of logistic regression analyses was conducted to more precisely
assess changes over time regarding the SEM practices addressed in this
review. For each analysis, the four-category grouping of study year was
entered as a categorical independent variable; the use of the specific practice
or fit index (yes/no) was entered as the dependent variable; and the newest
category of studies (2002 to 2003) was used as the reference group.8 Note that
for some fit indices that have been more recently developed or popularized
(e.g., comparative fit index and RMSEA), the oldest set of studies was not
included in the logistic regression analyses, and note that an analysis was not
conducted for assessing path versus measurement fit (because only one study
assessed path vs. measurement fit).
Results comparing the yearly categories are summarized in Tables 4 and
5. Of the SEM practices outside of fit index usage, a significant omnibus
effect emerged for addressing multivariate normality, χ²(3, N = 105) = 14.28,
p = .003. Comparisons between the yearly categories indicated that studies
published in 2002 to 2003 were more likely to assess for multivariate normality
than those published in 1987 to 1995 (odds ratio = 17.86, p = .008) or 1999
to 2001 (odds ratio = 5.78, p = .017), but the difference between 2002 to
2003 and 1996 to 1998 was not statistically significant.
For the use of the fit indices, significant omnibus effects emerged for the
AGFI, χ²(3, N = 105) = 7.77, p = .05; RMSEA, χ²(2, N = 77) = 32.20, p < .01;
and Tucker-Lewis index, χ²(3, N = 105) = 10.03, p = .02. For the AGFI, comparisons
between the yearly categories indicated that studies published in
2002 to 2003 were less likely than those published in 1987 to 1995 (odds
ratio = .10, p = .04) and 1999 to 2001 (odds ratio = .11, p = .05) to use the
AGFI. For the RMSEA, results indicated that studies published in 2002 to
2003 were more likely than those published in 1996 to 1998 (odds ratio =
55.56, p < .01) or 1999 to 2001 (odds ratio = 16.39, p < .01) to use the
RMSEA. Even though the omnibus test, which examines the overall difference
among the categories, was statistically significant for the
Tucker-Lewis index, no significant differences emerged between studies
published in 2002 to 2003 and any other yearly category. Finally, even
though the omnibus test for use of the GFI was not statistically significant,
χ²(3, N = 105) = 5.92, p = .12, significant differences existed between the
yearly categories. Studies published in 2002 to 2003 were less likely to use
the GFI than those published in 1987 to 1995 (odds ratio = .29, p = .04) or
1996 to 1998 (odds ratio = .30, p = .05).
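Because each of these analyses has a single categorical predictor, the fitted odds ratios reduce to simple cross-product ratios of the coded counts. A quick sketch, with GFI counts reconstructed from Table 3's rounded percentages (an assumption, since the review reports percentages rather than raw counts):

```python
def odds_ratio(yes_a, n_a, yes_b, n_b):
    """Odds of the practice in group A relative to group B, from yes counts."""
    return (yes_a / (n_a - yes_a)) / (yes_b / (n_b - yes_b))

# GFI use: 2002 to 2003 (about 7 of 25 studies) vs. 1987 to 1995 (about 16 of 28).
print(round(odds_ratio(7, 25, 16, 28), 2))  # 0.29, in line with Table 5
```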
DISCUSSION
In analyzing the results of this study, I am reminded of the statement
regarding the water glass that can be seen as either half empty or half full. The
pessimist might look at the results and see significant cause for concern and
TABLE 4: Logistic Regression Analyses Summaries Comparing SEM Practices by
Study Year

SEM Practice                      b       Wald Test  OR      95% CI (OR)
A priori models
 1987 to 1995 vs. 2002 to 2003    -0.38   0.48       0.68    0.23 to 2.00
 1996 to 1998 vs. 2002 to 2003    -0.33   0.32       0.72    0.23 to 2.22
 1999 to 2001 vs. 2002 to 2003    0.11    0.04       1.11    0.38 to 3.28
Normality
 1987 to 1995 vs. 2002 to 2003    2.89    6.94       17.86   2.10 to 66.67
 1996 to 1998 vs. 2002 to 2003    0.64    1.03       1.89    0.55 to 6.45
 1999 to 2001 vs. 2002 to 2003    1.75    5.71       5.78    1.37 to 24.39
PE/ES
 1987 to 1995 vs. 2002 to 2003    -0.08   0.02       0.93    0.31 to 2.72
 1996 to 1998 vs. 2002 to 2003    -0.34   0.35       0.71    0.23 to 2.22
 1999 to 2001 vs. 2002 to 2003    0.57    1.01       1.75    0.58 to 5.26
Modify
 1987 to 1995 vs. 2002 to 2003    -0.32   0.30       0.72    0.23 to 2.26
 1996 to 1998 vs. 2002 to 2003    -0.49   0.67       0.61    0.19 to 1.98
 1999 to 2001 vs. 2002 to 2003    -0.55   0.92       0.58    0.19 to 1.76

NOTE: A priori models = specified multiple a priori theoretical models; normality = assessed for
multivariate normality; PE/ES = reported either all parameter estimates or effect sizes for outcome
variables; modify = engaged in post hoc empirical model modification procedures; OR =
odds ratio; CI = confidence interval. Odds ratios greater than 1 indicate that studies from 2002 to
2003 were more likely to engage in the practice, while odds ratios less than 1 indicate that studies
from 2002 to 2003 were less likely to engage in the practice.
suggest that much counseling psychology research using SEM has been, and
continues to be, in a state of disarray. The optimist, however, might conclude
that SEM practices within counseling psychology research are improving. I
tend to believe that the truth lies in the middle and will address both the
causes for concern and the strengths regarding SEM research in counseling
psychology.
TABLE 5: Logistic Regression Analyses Summaries Comparing Use of Fit Indices by
Study Year

Fit Index                         b       Wald Test  OR      95% CI (OR)
χ²/df
 1987 to 1995 vs. 2002 to 2003    0.69    1.35       2.00    0.62 to 6.45
 1996 to 1998 vs. 2002 to 2003    0.04    0.00       1.04    0.33 to 3.30
 1999 to 2001 vs. 2002 to 2003    -0.20   0.13       0.82    0.28 to 2.43
NFI
 1987 to 1995 vs. 2002 to 2003    -0.29   0.19       0.75    0.20 to 2.75
 1996 to 1998 vs. 2002 to 2003    -0.35   0.25       0.71    0.18 to 2.74
 1999 to 2001 vs. 2002 to 2003    -0.59   0.84       0.56    0.16 to 1.95
GFI
 1987 to 1995 vs. 2002 to 2003    -1.23   4.41       0.29    0.09 to 0.92
 1996 to 1998 vs. 2002 to 2003    -1.21   3.88       0.30    0.09 to 0.99
 1999 to 2001 vs. 2002 to 2003    -1.01   3.05       0.36    0.12 to 1.13
AGFI
 1987 to 1995 vs. 2002 to 2003    -2.26   4.21       0.10    0.01 to 0.90
 1996 to 1998 vs. 2002 to 2003    -1.62   1.95       0.20    0.02 to 1.92
 1999 to 2001 vs. 2002 to 2003    -2.21   4.03       0.11    0.01 to 0.95
SRMR
 1987 to 1995 vs. 2002 to 2003    -0.61   1.14       0.54    0.18 to 1.67
 1996 to 1998 vs. 2002 to 2003    0.08    0.01       1.08    0.32 to 3.65
 1999 to 2001 vs. 2002 to 2003    -0.11   0.04       0.89    0.29 to 2.79
TLI
 1987 to 1995 vs. 2002 to 2003    1.06    2.99       2.88    0.87 to 9.52
 1996 to 1998 vs. 2002 to 2003    0.02    0.00       1.02    0.33 to 3.19
 1999 to 2001 vs. 2002 to 2003    -0.73   1.74       0.48    0.16 to 1.43
RMSEA
 1996 to 1998 vs. 2002 to 2003    4.00    18.92      55.56   9.01 to 333.33
 1999 to 2001 vs. 2002 to 2003    2.79    11.36      16.39   3.22 to 83.33
CFI
 1996 to 1998 vs. 2002 to 2003    0.35    0.25       1.41    0.37 to 5.46
 1999 to 2001 vs. 2002 to 2003    -0.45   0.37       0.64    0.15 to 2.70

NOTE: χ²/df = χ²/degrees of freedom ratio; NFI = normed fit index; GFI = goodness-of-fit index;
AGFI = adjusted goodness-of-fit index; SRMR = standardized root mean square residual; TLI =
Tucker-Lewis index (or non-normed fit index); RMSEA = root mean square error of approximation;
CFI = comparative fit index; OR = odds ratio; CI = confidence interval. Odds ratios greater
than 1 indicate that studies from the years 2002 to 2003 were more likely to use the fit index,
while ratios less than 1 indicate that studies from 2002 to 2003 were less likely to use the fit index.
The Glass Is Half Empty
Results from this review revealed several concerns involving the use of
SEM within counseling psychology research, four of which are discussed.
First, slightly less than half of the studies tested more than one a priori theoretical
model, and the percentage of studies that engaged in this practice actually
decreased over time (53.6% of the studies between 1987 and 1995 compared
with 44.0% of the studies between 2002 and 2003). Testing multiple a
priori models is generally considered the strongest use of SEM (e.g., Hoyle &
Panter, 1995; MacCallum & Austin, 2000), so it is somewhat disheartening
that only about 50% of the studies in JCP (and even fewer in more recent
years) engaged in this practice.
Second, slightly less than 50% of the studies either provided all parameter
estimates or specified effect sizes in their SEM models, with no improvement
in this practice noted over time. This result is somewhat surprising in light of
the increased attention in the psychological literature to reporting effect sizes
(e.g., Cohen, 1994; Kirk, 2001; Wilkinson & APA Task Force on Statistical
Inference, 1999) and the ease with which such effects can be reported via SEM (in
Windows-based programs, reporting generally involves clicking on a box).
Third, despite several articles that provided compelling evidence against
the use of certain fit indices (e.g., Hu & Bentler, 1998; Marsh et al., 1988;
Steiger, 2000), indices such as the χ²/df ratio and the normed fit index continue
to be used in several SEM studies.
Fourth, 40% of the studies used post hoc empirical model modification
procedures, a practice that has been consistently discouraged in the SEM literature
(Hoyle & Panter, 1995; MacCallum et al., 1992; McDonald & Ho, 2002),
although approximately half of these studies acknowledged the limitations of
this approach, and a few studies (n = 7) even conducted cross-validation procedures
with the empirically developed model. Nevertheless, to summarize
the pessimistic point of view, one would conclude that many SEM studies use
weak methodological approaches, provide no information regarding effects
on outcome variables, and continue to use less than desirable measures of fit.
Why, then, do counseling psychology researchers who use SEM often not
engage in the best practices related to the technique? One explanation could
be a disconnect between the journals where articles related to SEM methodology
tend to be published and the scholarly journals typically read by researchers,
reviewers, and editors. Although there are exceptions (e.g., Quintana &
Maxwell, 1999; Tomarken & Waller, 2003), such articles are often published
in journals less often read by most counseling psychologists (e.g.,
Multivariate Behavioral Research, Psychological Methods, Structural
Equation Modeling). Therefore, many counseling psychologists may not
stay current with trends involving SEM practices.
Another explanation may be the relative ease of using statistical programs
to conduct SEM analyses. Most SEM software (e.g., AMOS and EQS) does
not require the individual to have an in-depth knowledge of SEM. Generally,
these programs simply require the user to draw his or her hypothesized
model(s), and, assuming that the model is properly (over)identified, the necessary
calculations are made automatically. Although the ease with which
these programs allow researchers to utilize SEM certainly has benefits, a
potential drawback may be that people without a thorough background in
SEM theory, statistical assumptions, or best practices are utilizing the
technique (see Steiger, 2001).
A final explanation, one especially related to the use of post hoc model
modification or generation procedures, could involve a so-called file drawer
problem (Rosenthal, 1979) that may exist within SEM research. Traditionally,
the file drawer problem refers to the practice of publishing only statistically
significant results and relegating nonsignificant findings to one's file
drawers. In SEM analyses, a file drawer problem would refer to publishing
only findings about well-fitting models. In all but a handful of studies included
in this review, the authors concluded that their model fit the data (e.g., a "well-
fitting model" or an "adequate fit to the data"). Although the JCP rejection rate
for studies whose final models do not fit well is unknown, it is plausible
to believe that researchers perceive they must develop a well-fitting
model to improve their chances of publication. Researchers may therefore be
motivated to engage in whatever statistical and empirical procedures are
available in pursuit of the well-fitting model. If indeed well-designed
SEM studies that demonstrate a less-than-good fit are not being considered
for publication in JCP, then the overall knowledge in our field may be suffering.
One can argue that in science the relationships that do not exist are as
important to know as those that do, yet the only studies using SEM that
seem to appear in JCP are the latter.
The Glass Is Half Full
Now, we turn to some of the more optimistic findings from this study,
most of which relate to improved SEM practices in the most recent set of JCP
studies. First, although the overall percentage of studies that acknowledged
the importance of multivariate normality when conducting SEM was relatively
low even among the most recent studies (32%), results indicated that
more recent studies were more likely to address multivariate normality than
older studies. Second, newer studies were more likely to use the RMSEA and
less likely to use the GFI and AGFI to assess model fit. These results are
encouraging because the RMSEA has been shown to be one of the better
measures for detecting true model misspecification, while the GFI and AGFI
are influenced by factors other than the fit of the model itself (Hu & Bentler,
1998; Marsh et al., 1988; Steiger, 2000). Finally, results from studies that
provided the necessary information indicated less of a discrepancy between
path and measurement fit than has been reported in other reviews of SEM
practices (McDonald & Ho, 2002), suggesting that the phenomenon of a
well-fitting measurement model masking a poor-fitting path model may not
be a general concern within counseling psychology research.
What might be some explanations for these encouraging trends within
SEM research in counseling psychology? First, it seems that more classes in
SEM are being offered as cornerstones, or at least electives, in counseling
psychology graduate training, which should have the effect of improving all
SEM practices.
Second, the improvement in addressing multivariate normality may be a
by-product of enhanced overall awareness regarding the importance of data
screening (e.g., Farrell, 1999; Wilkinson, 1999). Although assumptions such
as normally distributed data for various statistical tests are certainly not a
recent phenomenon (e.g., Guilford, 1956), perhaps more researchers are
actively aware of the importance of such considerations when designing and
reporting their studies, or more editors/reviewers are asking that such information
be included.
Third, even though many counseling psychology researchers may not
read the journals that typically publish SEM methodology articles, it seems that
some articles become relatively well known outside of the methodological
community. For example, 40% of the articles published in 2002 to 2003 cited
Hu and Bentler's work on fit indices (1998, 1999), which might explain why
some of their recommendations are becoming more popular (e.g., using the
RMSEA and not using the GFI). Regarding the RMSEA, one should also
remember that part of its increase in use is probably an artifact of its relatively
recent promotion in the SEM literature (e.g., Browne & Cudeck, 1993), but
this alone does not explain why more than 90% of the JCP studies in 2002 to
2003 used the index. Explaining the relatively equal fit between the path and
measurement portions of the full SEM models in JCP, in contrast to findings
from other reviews (McDonald & Ho, 2002), is more difficult. One possibility
is that the constructs involved in counseling psychology research tend to
display stronger relationships with each other than in other areas of psychology,
but such a conclusion should be considered tentative at best. One must
remember that (a) less than 30% of the full SEM studies in this review provided
the necessary information to calculate both path and measurement fit,
(b) most of these studies (57%) used post hoc model modification procedures,
which could inflate model fit by capitalizing on sample-specific relationships,
and (c) several studies demonstrated a considerably worse path fit
when compared with measurement fit. Therefore, more definitive conclu-
sions on this topic await further study.
Although this review covered a broad representation of procedures related to SEM practice, it was not exhaustive. In the course of analyzing studies for this review, I noticed that many JCP studies included other practices that have been questioned in the SEM literature. One such practice involved parceling items to reduce the number of parameters in the study (see Russell, Kahn, Spoth, & Altmaier, 1998), especially in the context of CFA. Although parceling can sometimes be warranted, especially when one is primarily interested in relationships among latent constructs, it is less appropriate when one is most interested in the relationships among specific items (as in CFA; Little et al., 2002) or when items making up the parcels are not unidimensional. In fact, recent Monte Carlo simulations have found that item parcels often mask misspecified models by yielding acceptable factor loadings and fit indices (Bandalos, 2002; Kim & Hagtvet, 2003). A second practice involved some researchers being overly optimistic in their interpretation of fit indices, a concern that has been addressed in other reviews (e.g., MacCallum & Ho, 2000). For example, a recent JCP study concluded that an RMSEA value of .17 was an indicator of an adequate fit (lower RMSEA values indicate better fitting models), when in fact such a value is well above any recommended criterion (e.g., .08). Third, a few recently published studies engaged in the practice of correlating error terms, often to improve the fit of the model. This practice, except in instances such as longitudinal studies where the same measure is used on separate occasions, is generally frowned upon because it is rarely theoretically defensible (e.g., Boomsma, 2000; Hoyle & Panter, 1995). Finally, only a few studies mentioned the issue of alternative equivalent models, which can be particularly problematic when conceptualizing SEM as causal modeling (see MacCallum et al., 1993). These and other SEM practices (e.g., problems with missing data and assessing model identification) were beyond the scope of the present review but would be worthwhile to address in future studies.
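To make the RMSEA comparison above concrete, the index can be computed directly from a model's χ² statistic, degrees of freedom, and sample size. The sketch below uses the standard point-estimate formula (some programs use N rather than N − 1 in the denominator); the χ², df, and N values are hypothetical illustrations, not figures taken from the study discussed above.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of the root mean square error of approximation:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0) / (df * (n - 1)))

# Hypothetical models fit to N = 200 cases with df = 30:
rmsea(50.0, 30, 200)    # ~.058 -- within the conventional .08 criterion
rmsea(202.5, 30, 200)   # ~.170 -- far above any recommended cutoff
```

A χ² roughly seven times its degrees of freedom is what it takes to produce an RMSEA near .17 at this sample size, which illustrates how badly misspecified such a model is.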
This review was limited. One limitation is that a yes/no coding criterion was used to categorize each study regarding the various SEM practices. This coding procedure was useful for providing an overall summary of SEM practices within counseling psychology but did not provide information in areas such as (a) the relevancy of a SEM practice to the unique context of an individual study (e.g., reporting effect sizes may be more important in studies with clear outcome variables of interest, as opposed to studies that primarily involve CFA) or (b) the severity (e.g., deleting one nonsignificant parameter vs. adding multiple parameters post hoc) or specific mechanisms (e.g., various ways to assess multivariate normality) for some SEM practices. Such information was beyond the scope of the present study.
292 THE COUNSELING PSYCHOLOGIST / May 2005
A second limitation was that most recommended practices examined in
this review, even those based on empirical findings, contained an inherent
degree of subjectivity. Thus, even though most SEM experts might agree
with a particular practice (e.g., engaging in no or limited post hoc model
modification), one could probably locate dissenters.
A third limitation was that the focus on actual SEM practices provided no
information regarding the theoretical foundations of the studies that were
reviewed. Although assessing the degree to which a study tested some theo-
retical foundation would be difficult to quantify, such information would be
useful to obtain in future reviews.
Despite these potential limitations, this review provided an important picture of how counseling psychology researchers have used and continue to use SEM in terms of several best practices related to the analytical technique. To summarize, SEM researchers in counseling psychology have a history of not engaging in the best practices related to the technique and in many areas continue to ignore such practices. In other areas, however, such as recognizing the importance of normally distributed data and using more accurate fit indices, the practices of counseling psychologists seem to be improving. Based on this review, I encourage counseling psychology researchers who utilize SEM to pay closer attention to the practices covered in this review and to follow the recommendations of experts when possible (e.g., Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002; Tomarken & Waller, 2003). Such recommendations include the following:
• Identifying multiple a priori theoretically derived models to test
• Assessing for multivariate normality and using appropriate procedures (e.g., robust estimation procedures) should non-normality be detected
• When conducting full SEM analyses (i.e., causal paths hypothesized between latent variables), providing some indication of the fit of the path model separate from the measurement model
• Reporting all parameter estimates or other means of determining effect size, especially for endogenous variables; this reporting can be easily performed by including the R² values for each outcome variable or including all parameter estimates in a path diagram
• Avoiding empirically derived post hoc model modification procedures or at least engaging in only those modifications that can be theoretically defended and noting the limitations of the procedure
• Using measures of fit that have been shown to be more accurate at rejecting misspecified models (e.g., RMSEA, SRMR, comparative fit index, Tucker-Lewis index, and incremental fit index)
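The recommendation to assess multivariate normality can be made concrete with Mardia's multivariate kurtosis coefficient, whose expected value under multivariate normality is p(p + 2) for p observed variables; markedly larger sample values signal the heavy-tailed non-normality that calls for robust estimation. The implementation below is an illustrative sketch written for this discussion, not code taken from any of the cited sources.

```python
import numpy as np

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis: the mean of the squared (squared)
    Mahalanobis distances of the observations from the centroid. Expected
    value under multivariate normality is p * (p + 2)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)          # ML (biased) covariance
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)  # Mahalanobis^2
    return float(np.mean(d2 ** 2))

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(5000, 4))
mardia_kurtosis(X)   # close to p * (p + 2) = 24 for normal data
```

Heavy-tailed data (e.g., drawn from a t distribution with few degrees of freedom) would yield values well above 24, flagging the need for scaled test statistics of the kind Chou, Bentler, and Satorra (1991) studied.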
Although slight inconsistencies might emerge among these recommendations and recommended practices that have been addressed elsewhere,
researchers should find considerable overlap. Besides the clear research implications of improving SEM practices within counseling psychology, training and practice would benefit as well. Counseling psychology graduate students would at least become more informed consumers of SEM research and could better evaluate the quality of the work to which they are exposed. Students interested in pursuing research careers would be better grounded in the analytical technique, which would hopefully open more doors in terms of analysis and design options.
For psychological practice, the implications of enhancements in any statistical technique are generally indirect, but improving practices related to SEM could improve the science associated with studies that are relevant to the application of psychology. Put another way, practice benefits when the science that is supposed to inform practice is improved. Additionally, I encourage counseling psychology journal reviewers and editors to pay close attention to such recommendations and to require that researchers address important SEM considerations should they fail to do so, regardless of whether the researcher is adhering to the recommendation. Finally, I encourage all counseling psychologists involved with SEM at any level to move away from what I perceive to be a culture that values only well-fitting models. In effect, we must place more value on analyses that have a solid theoretical foundation and follow sound analytic procedures rather than becoming enamored with reporting a finding that demonstrates a good fit and therefore doing whatever possible to achieve such an outcome.
NOTES
1. In discussing these advantages of structural equation modeling (SEM), I am not suggesting that SEM is inherently superior to other analytical techniques. SEM is, however, particularly useful when testing complex models and/or specific underlying theoretical constructs.
2. Note that this model does not include every parameter or variable necessary to identify and test a structural equation model (e.g., error terms are not included, and specific parameters are not identified). Such information is beyond the scope of this article, and interested readers can consult sources that serve as general introductions for novices to SEM (e.g., Byrne, 2001; Raykov & Marcoulides, 2000).
3. Several authors (e.g., Boomsma, 2000; Hoyle & Panter, 1995; MacCallum, Wegener,
Uchino, & Fabrigar, 1993; McDonald & Ho, 2002; Tomarken & Waller, 2003) discuss the issue
of assessing equivalent versus nonequivalent a priori models. This topic is beyond the scope of
this article, and interested readers can consult these sources.
4. The χ² difference test is conducted by calculating the difference in χ² values and degrees of freedom between the two nested models. The resulting values are examined to determine if significant differences exist in fit between the two models. For example, assume that a more restricted model (i.e., the model with fewer paths) had a χ² value and degrees of freedom of 100.00 and 30, while the less restricted model had values of 90.00 and 28. The χ² difference would be 10.00, which is statistically significant (p < .05) with two degrees of freedom (30 − 28 = 2). Therefore, one would conclude that the less restricted model (which has the lower χ² value) demonstrates a significantly better fit than the more restricted model. If the more restricted model had a χ² value of 95.00, however, the difference would not be considered statistically significant. In such cases, researchers generally accept the simpler of the two models.
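The arithmetic in this note is easy to reproduce; the helper below is an illustrative sketch using SciPy's χ² survival function, with model A denoting whichever nested model has the larger degrees of freedom (i.e., the more constrained of the two).

```python
from scipy.stats import chi2 as chi2_dist

def chi2_difference_test(chi2_a, df_a, chi2_b, df_b):
    """Chi-square difference test for two nested models. Model A must be
    the model with more degrees of freedom (the more constrained model).
    Returns the difference statistic, its df, and the p value."""
    d_stat = chi2_a - chi2_b
    d_df = df_a - df_b
    return d_stat, d_df, chi2_dist.sf(d_stat, d_df)

# The note's example: chi2 values of 100.00 (30 df) and 90.00 (28 df)
chi2_difference_test(100.00, 30, 90.00, 28)  # (10.0, 2, p ~ .0067): significant
# With 95.00 in place of 100.00 the difference is no longer significant
chi2_difference_test(95.00, 30, 90.00, 28)   # (5.0, 2, p ~ .082)
```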
5. The issue of sample size in SEM analysis is somewhat controversial, and a detailed discussion is beyond the scope of this article. Some authors recommend addressing sample size based on the ratio of participants to number of parameters (Jackson, 2003). Others discuss sample size in terms of power (e.g., MacCallum, Browne, & Sugawara, 1996), while others provide absolute guidelines (e.g., Hatcher, 1994). Nonetheless, most sources will indicate that, depending on the model's complexity, a researcher should have at least 200 cases.
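As a purely illustrative sketch of how such guidelines translate into numbers, the helper below combines a participants-to-parameters (N:q) ratio with an absolute floor. The 200-case floor follows this note; the 10:1 ratio is one commonly cited convention, not a value taken from the sources cited here.

```python
def minimum_n(n_free_parameters, ratio=10, floor=200):
    """Rule-of-thumb minimum sample size: the larger of a
    participants-to-parameters (N:q) ratio and an absolute floor."""
    return max(ratio * n_free_parameters, floor)

minimum_n(12)   # 200 -- the floor dominates for simple models
minimum_n(45)   # 450 -- the N:q ratio dominates for complex models
```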
6. Studies were not divided into four equal groups chronologically because I did not want to separate studies published in the same year or, in some cases, the same issue of JCP. Therefore, the four groups were created as equally as possible while maintaining this stipulation.
7. When conducting a path analysis, a researcher generally uses the same procedures as in SEM (i.e., causal relationships specified among multiple variables), except that only observed variables are included; therefore, there is no measurement model to be tested. However, the issues described in this article apply equally to both path analytic studies and SEM studies that include latent variables.
8. One reviewer suggested that the logistic regression analyses be conducted with the four yearly categories conceptualized as a continuous independent variable. I chose to retain a categorical approach for the following reasons: (a) the yearly groupings technically do not meet criteria for a continuous variable; (b) changes in SEM practices over time should be reflected in significant differences between the newest set of studies and older studies; and (c) interpretation of odds ratios in logistic regression with continuous independent variables is not as straightforward as interpretation with categorical variables (see Pedhazur, 1997). Therefore, I chose to conceptualize the yearly groupings as categorical independent variables.
REFERENCES

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453-1463.
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.
Bandalos, D. L. (2002). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78-102.
Bentler, P. M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-571.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bollen, K. A. (1989a). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.
Bollen, K. A. (1989b). Structural equations with latent variables. New York: John Wiley.
Bollen, K. A. (1990). Overall fit in covariance structure models: Two types of sample size effects. Psychological Bulletin, 107, 256-259.
Boomsma, A. (2000). Reporting analyses of covariance structures. Structural Equation Modeling, 7, 461-483.
Breckler, S. J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260-273.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193-208.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum.
Chou, C., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347-357.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Farrell, A. D. (1999). Statistical methods in clinical research. In P. C. Kendall, J. N. Butcher, & G. N. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 72-106). New York: John Wiley.
Fassinger, R. (1987). Use of structural equation modeling in counseling psychology research. Journal of Counseling Psychology, 34, 425-436.
Gerbing, D. W., & Anderson, J. C. (1993). Monte Carlo evaluations of goodness-of-fit indices for structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 40-65). Newbury Park, CA: Sage.
Guilford, J. P. (1956). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Hatcher, L. (1994). A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC: SAS Institute.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 158-176). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling, 10, 128-141.
Jöreskog, K. G., & Sörbom, D. (1981). LISREL V. Mooresville, IN: Scientific Software.
Kim, S., & Hagtvet, K. A. (2003). The impact of misspecified item parceling on representing latent variables in covariance structure modeling: A simulation study. Structural Equation Modeling, 10, 101-127.
Kirk, R. E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61, 213-218.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151-173.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201-226.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.