Measuring Mathematics Interest and Affect: An Item Response Theory Evaluation of
the Self Description Questionnaire I (SDQI)
by
Tianlan Wei, M.Ed.
A Dissertation
In
Educational Psychology
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF PHILOSOPHY
Approved
Lucy Barnard-Brak, Ph.D.
Chair of Committee
William Lan, Ph.D.
Tara Stevens, Ed.D.
Mark Sheridan
Dean of the Graduate School
August, 2014
Copyright 2014, Tianlan Wei
Texas Tech University, Tianlan Wei, August 2014
ACKNOWLEDGMENTS
Many people have influenced my work on this dissertation, and I would like to
recognize each of them with my sincere appreciation. First, my deepest gratitude goes
to Lucy Barnard-Brak, my committee chair, and an incredible source of support and
inspiration during my graduate study at Texas Tech University. It was through
working with her that I was exposed to many advanced research methods and
quantitative skills beyond coursework, including the item response theory techniques,
which this dissertation is based on. Lucy means so much to me in my scholarly life: a
dedicated mentor, a most intelligent team leader, a caring elder sister, and a role model
always for me to emulate.
Next, I would like to thank William Lan and Doug Simpson for bringing me
back to academia when no one else saw my potential as a researcher. It is no
exaggeration when I say that their assistance was life-changing to me. As my
committee member, Dr. Lan’s comments on my dissertation work have always been
encouraging and constructive. He impressed upon me how kind and generous
academics can be.
Although my work has benefited from many professors and colleagues, I must
recognize the critical role of Tara Stevens, also a committee member of mine, for
establishing the foundation for my research agenda. Her earlier work on mathematics
education, particularly the gender gap and associated affective-domain factors, has had
a great influence on this dissertation as well as my other studies.
My parents are of the birth cohort that was deprived of much education due to
the political movements back then in China, and I feel grateful that they would never
fail to value my education. I would also like to thank my uncle for encouraging me to
pursue a terminal degree for years when I had so little confidence in myself.
Lastly, I would like to thank Catrina Wang for breaking years of silence to
show up at this critical stage of my dissertation work, though briefly. Fortunately and
unfortunately, some human feelings are beyond expression and any psychometric
calibration—they will forever remain unspeakable and immeasurable.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
I. INTRODUCTION
    Context of the Study
    Statement of the Problem
        Theoretical Issues
        Conceptual Issues
        Measurement Issues
    Purpose of the Study
    Research Questions
    Significance of the Study
    Limitations of the Study
II. REVIEW OF LITERATURE
    The Concept of Interest
        Personal Interest and Situational Interest
        Interest, Affect, and Motivation
    Interest and Affect in Theories of Achievement Motivation
        Expectancy-Value Theory of Motivation
        Attribution Theory of Motivation
        Goal-Orientation Theories
        Intrinsic and Extrinsic Motivation
        The Roles of Interest and Affect in Achievement Motivation
    Gender, Ethnic, and Age Differences in Mathematics Interest and Affect
        Gender Differences
        Ethnic Differences
        Age Differences
    Summary of the Literature
    Research Questions and Hypotheses
III. METHOD
    Description of the Data
        Sampling Frame
        Sample
    Instrumentation
        Face Validity
    Item Response Theory
        Assumptions
        Graded Response Model
    Differential Item Functioning
        Lord's Wald Test
    Analysis of Data
        Phase 1: Factor Structure, Reliability, Validity, and Group Differences
        Phase 2: Item Response Theory (IRT) Analysis
        Phase 3: Differential Item Functioning (DIF) Analysis
IV. RESULTS
    Phase 1: Factor Structure, Reliability, Validity, and Group Differences
        Factor Structure
        Internal Consistency Reliability
        Criterion-Related Validity
        Group Differences
    Phase 2: Item Response Theory (IRT) Analysis
    Phase 3: Differential Item Functioning (DIF) Analysis
        DIF across Gender
        DIF across Ethnicity
        Item Parameter Drift
V. DISCUSSION
    The Psychometric Properties of the SDQI Mathematics Subscale
        The Classical Test Theory Perspective
        The Item Response Theory Perspective
        Summary
    Measurement Bias of the SDQI Mathematics Subscale across Gender
    Measurement Bias of the SDQI Mathematics Subscale across Ethnicity
    Item Parameter Drift of the SDQI Mathematics Subscale
    Conclusion
BIBLIOGRAPHY
ABSTRACT
In response to today’s high-tech global economy, there has been an increasing
need for professionals in the Science, Technology, Engineering, and Mathematics
(STEM) fields. To enhance students’ STEM performance and career aspirations,
researchers have suggested capturing their mathematics interest as early as elementary
school. Research on academic interest, however, has suffered from various conceptual,
theoretical, and measurement issues. In the literature, there are mixed findings
regarding gender, ethnic, and age differences in mathematics interest, which call for
a reexamination of the existing measures of interest. The purpose of this
dissertation study was to evaluate the psychometric properties of the Self Description
Questionnaire I (SDQI) with regard to children’s mathematics interest based on the
item response theory (IRT) framework.
The data for this study were drawn from the Early Childhood Longitudinal
Study, Kindergarten Class of 1998-99 (ECLS-K) with a sample of 14,631 children.
These children were assessed in third and fifth grade using the ECLS-K adapted
SDQI mathematics subscale, which consists of eight items with ordered response
options ranging from 1 to 4. The IRT-based evaluation suggests that the SDQI items
have sufficient discrimination, but insufficient item thresholds in assessing third- and
fifth-graders’ mathematics interest. Specifically, the IRT parameter estimates indicate
that the 4-point scale of the SDQI items does not function much better than a dichotomous
scale, particularly among third-graders. Thus, it is recommended that the current
SDQI be reconsidered for its age appropriateness. In addition, affective items, which
involve enjoyment, liking, and positive emotions, are found to be better indicators of
interest than cognitive items that involve perceived competence, ability beliefs, or self-
efficacy.
In terms of measurement bias across gender, ethnic, and age groups, item-level
bias emerged in most of the items as assessed by the IRT-based Wald Test of
differential item functioning. In general, it was found that boys and White children
tend to endorse higher-order options of cognitive, perceived competence items to
represent their mathematics interest, while girls and ethnic minority children tend to
endorse higher-order options of affective items. This affective-cognitive distinction of
measurement bias indicates that children may interpret and respond to self-report
interest measures differently as a function of their demographic background, resulting
in artificial item-level group differences as detected by the SDQI, although the scale
score can still be trusted in such group comparisons. Finally, results of the item
parameter drift analysis indicate that children become more sensitive to the response
anchors as they grow older, which reaffirms that the current SDQI items with the
4-point scale perform poorly among young children. Taken together, findings of this
dissertation study provide insights into the conceptualization and operationalization of
mathematics interest and affect, as well as practical implications for the future use and
development of self-report measures in this field.
Keywords: mathematics interest, Self Description Questionnaire, item response theory,
measurement bias, gender difference, ethnic difference, item parameter drift
LIST OF TABLES
3.1 The ECLS-K Adapted SDQI Mathematics Subscale
4.1 Descriptive Statistics of the ECLS-K Variables Included in the Study (N = 14,631)
4.2 Bivariate Correlations among the SDQI Items in Third Grade
4.3 Bivariate Correlations among the SDQI Items in Fifth Grade
4.4 Internal Consistency Reliabilities and Factor Loadings by Item
4.5 IRT Parameter Estimates (a, b1, b2, b3) of Third- and Fifth-Grade Full Samples
4.6 Summary of the DIF Results by Gender, Ethnicity, and Age
4.7 Parameter Estimates of Multigroup (Male vs. Female) IRT in Third Grade
4.8 Parameter Estimates of Multigroup (Male vs. Female) IRT in Fifth Grade
4.9 Parameter Estimates of Multigroup (White vs. African American, White vs. Hispanic, and White vs. Asian American) IRT in Third Grade
4.10 Parameter Estimates of Multigroup (White vs. African American, White vs. Hispanic, and White vs. Asian American) IRT in Fifth Grade
4.11 Parameter Estimates of Item Parameter Drift Analysis
LIST OF FIGURES
2.1 Three approaches to interest research
2.2 Model of triadic reciprocal causations
2.3 A social cognitive expectancy-value model of achievement motivation
2.4 Overview of the general attributional model
3.1 Option characteristic curves (OCCs) for a hypothetical item with four response options (a = 1.7, b1 = -1.5, b2 = 0, b3 = 1.5)
3.2 Item information function (IIF) curve for a hypothetical item with four response options (a = 1.7, b1 = -1.5, b2 = 0, b3 = 1.5)
4.1 Group differences as assessed by the SDQI mathematics scale score
4.2 Option characteristic curves of the items in third grade
4.3 Option characteristic curves of the items in fifth grade
4.4 Item information function curves by item across grade levels
4.5 Test information function curves by scale across grade levels
4.6 Option characteristic curves of multigroup (male vs. female) IRT in third grade
4.7 Option characteristic curves of multigroup (male vs. female) IRT in fifth grade
4.8 Test characteristic function curves of multigroup (male vs. female) IRT
4.9 Option characteristic curves of multigroup (White vs. African American) IRT in third grade
4.10 Option characteristic curves of multigroup (White vs. Hispanic) IRT in third grade
4.11 Option characteristic curves of multigroup (White vs. Asian American) IRT in third grade
4.12 Option characteristic curves of multigroup (White vs. African American) IRT in fifth grade
4.13 Option characteristic curves of multigroup (White vs. Hispanic) IRT in fifth grade
4.14 Option characteristic curves of multigroup (White vs. Asian American) IRT in fifth grade
4.15 Test characteristic function curves of multigroup (White vs. African American, White vs. Hispanic, and White vs. Asian American) IRT
4.16 Option characteristic curves of item parameter drift (third- vs. fifth-grade)
4.17 Test characteristic function curves of item parameter drift
CHAPTER I
INTRODUCTION
Context of the Study
Science, Technology, Engineering, and Mathematics (STEM) represent
intellectual and cultural achievements that constitute fundamental aspects of
contemporary society. In response to today’s high-tech global economy, there has
been an increasing need for professionals in the STEM fields. The 2008-2018
occupational employment projections (Lacey & Wright, 2009) suggested that, along
with changes in the U.S. demographics and the increasingly competitive business
environment, computer and mathematical science occupations would see employment
growth of over 20% by 2018. Moreover, STEM has also been widely acknowledged
as a powerful engine of continued scientific leadership and prosperity of the United
States (National Research Council [NRC], 2011). Therefore, it concerns both policy
makers and employers that K-12 education in the U.S. does not seem to prepare
students adequately for the pursuit of STEM degree programs and professions. In fact,
it was reported that international students accounted for over one-third of the students
in U.S. science and engineering schools (National Academy of Sciences, National
Academy of Engineering, & Institute of Medicine, 2007), and the STEM workforce
had largely depended on these foreign-born mathematicians, scientists, and engineers
(Sanders, 2004).
Two major issues emerged as pertinent to K-12 STEM education in the U.S.
First, it appears that U.S. students’ STEM performance, particularly in
mathematics, is less than satisfactory considering current and future economic
demands. For example, the data collected by the National Assessment of Educational
Progress (NAEP) over the 1996-2009 period indicated that about 75% of U.S. students
were not proficient in mathematics when they completed eighth grade (Schmidt, 2011).
Furthermore, whereas the NAEP data showed that the percentage of students who
were proficient in mathematics increased from 19% to 33% for fourth graders, and
from 20% to 26% for eighth graders, the increase was merely 2 percentage points for 12th graders (21%
to 23%), indicating a decline in students’ mathematics performance as they advanced
to high school. Recent reports regarding international assessments also indicated that
U.S. students had routinely fallen behind higher performing educational systems such
as Singapore and Hong Kong in science and mathematics performance. For example,
only 13% of U.S. fourth graders met the Trends in International Mathematics and
Science Study (TIMSS) advanced international benchmark in mathematics, as
compared with 43% for Singapore and 39% for South Korea (Provasnik et al., 2012).
The statistics were even lower for U.S. eighth graders; only 7% of students met the
advanced benchmark as compared with 49% for Chinese Taipei and 48% for
Singapore. Other education systems such as Russia (14%), Australia (9%), and
England (8%) also outperformed the U.S. The 2009 Program for International Student
Assessment (PISA) data were no more impressive in terms of U.S. students’
mathematics literacy: the average score of U.S. 15-year-old students (487) was lower
than the Organization for Economic Cooperation and Development’s (OECD) average
score of 496 and was surpassed by those of 17 other OECD countries, including 541 in Finland, 527
in Canada, and 514 in Australia (Fleischman, Hopstock, Pelczar, & Shelley, 2010).
Together, these data indicate that U.S. students’ mathematics and science performance
declines as they advance to higher grades, either as measured by NAEP standards, or
within the context of international comparisons.
The second issue of STEM education in the U.S. relates to students’ career
aspirations. The diminished performance may be a function of how students’ career
aspirations in STEM fields are shaped. The decrease in the supply of college-educated
STEM graduates had caused concerns in most western countries (van Langen &
Dekkers, 2005). The proportion of postsecondary graduates in STEM fields was only
12% in the U.S. (OECD, 2003), and as documented by the National Science Board
(NSB; 2010), the overall percentage of STEM degrees awarded has further decreased
since. The NSB (2010) statistics showed that, although nearly one-fourth of all U.S.
undergraduates began their studies in STEM majors, only one-half of them would
graduate with a STEM degree. Researchers (e.g., Hossain & Robinson, 2012;
Korpershoek, Kuyper, Bosker, & van der Werf, 2013) argued that many students
might fail to realize their potential, and leave the STEM track before or in college due
to a lack of proper motivation.
With regard to motivation, women’s academic performance and career
aspirations in STEM fields warrant particular attention. Although the gender gap in
mathematics achievement seemed to close slowly over the past two decades (Else-Quest,
Hyde, & Linn, 2010), women have continued to be underrepresented in the
STEM fields and leadership positions in the U.S. For example, recent data revealed
that women accounted for only 26% of the college-educated workforce in science and
engineering (NSB, 2010), and only 30% of STEM deans and department heads
(National Science Foundation [NSF], National Center for Science and Engineering
Statistics [NCSES], 2010). The NSF (2013) report further indicated that women’s
shares of postsecondary degrees in physical sciences and mathematics remained well
below those of men and that, despite the rise of full-time, full professorships held by
women over the 1993-2010 period, women represented less than one-fourth of all full-
time, full professors in the STEM fields.
Based on the above evidence, the NRC (2011) set the primary goal for
effective K-12 STEM education as expanding the number of students who pursue
advanced STEM degrees and careers and broadening the participation of women and
minorities in these fields. To achieve this goal, educational research has sought to
identify factors that serve to maintain high levels of motivation in STEM subjects
among students. One of the major individual-level motivational factors is academic
interest (Schunk, Pintrich, & Meece, 2008). To enhance students’ performance and
intention to pursue careers in the STEM fields, researchers have suggested capturing a
student’s interest as early as elementary school (DeJarnette, 2012; Russell, Hancock,
& McCullough, 2007). This suggestion was based on the well-documented positive
associations between interest and subsequent learning (e.g., Hidi, 2000; Hidi &
Harackiewicz, 2000; Schiefele, 1991; Schiefele, Krapp, & Winteler, 1992). This
dissertation was, therefore, intended to address issues regarding the conceptualization
and measurement of interest as related to affective experiences within the domain of
mathematics education.
Statement of the Problem
Although often conducted within the framework of achievement motivation,
research on academic interest and affect has yet to provide a widely acknowledged
theory to account for how interest influences motivation and achievement. Interest,
along with associated affective experiences, is evident in the theoretical models of
influential motivational theories such as the expectancy-value theory (Eccles et al.,
1983) and the attribution theory (Weiner, 1986, 1992), but is not given as much importance
as constructs such as self-efficacy (Bandura, 1997), expectancy and task values
(Eccles et al., 1983), or locus of control (Weiner, 1986). For example, as compared
with self-efficacy, which had received much attention since the 1980s for its predictive
power over other motivational factors (Klassen & Usher, 2010), research on academic
interest appeared to lack a guiding theory, and had not advanced much over the past
two decades (Schunk et al., 2008). The relative paucity of research in this field may
be examined from several perspectives involving conceptual, theoretical, and specific
measurement issues. These perspectives, however, are not mutually exclusive.
Theoretical Issues
Interest and affect were barely studied in behavioral theories of motivation,
which had dominated educational research until the 1960s. In the view of behavioral
theorists (e.g., Skinner, 1953), motivation is evidenced by a change in the rate,
frequency of occurrence, or form of behaviors in response to environmental stimuli.
In other words, students are mostly motivated by environmental events, and the focus
of behavioral research is to examine the connection between hypothesized
environmental stimuli and observable changes in behaviors. As such, behavioral
theorists contended that research on motivation does not need to include thoughts and
feelings (Schunk et al., 2008). In behaviorism, stimulus and response are directly
observable, but interest and affect are usually not. This lack of direct observation
points to the dilemma of measuring unobservable constructs regarding people’s
thoughts and feelings as a part of scientific research, where there are clearly more
sources of error in such measures (Weisberg, 2005).
Along with the emergence of social cognitive theory (Bandura, 1977, 1986),
cognitive theories of motivation largely replaced behavioral theories in guiding
research. Essentially, the social cognitive perspective of motivation added a cognitive
component to learning and motivation, and is thus more concerned about the learner’s
cognitive process. Bandura (1977) analyzed human learning and self-regulation in
terms of triadic reciprocal causations, which involve three types of determinants of
learning: personal determinants, behavioral determinants, and environmental
determinants. In other words, the cognitive perspective postulates that, in addition to
merely responding to environmental stimuli, individuals also actively process
information acquired from the environment, resulting in varying outcomes. The
inclusion of a cognitive component of motivation helps explain a large proportion of
individual variance unaccounted for by behaviorist studies.
Nonetheless, the role of interest and affect remains unclear in such cognitive
processes. The cognitive theories of motivation generally assume that people are
naïve scientists who try to understand their environments and causal determinants of
their past behaviors (Schunk et al., 2008), but this assumption may sometimes be
violated because humans do not always rely on their rationality in making judgments
and decisions. In fact, the commonly held view that humans base their decisional
processes entirely on rationality has been increasingly challenged in social sciences
since the late 1970s (Denes-Raj & Epstein, 1994). Some theorists argued that people
process information by two parallel processing systems: a rational system which
functions based on established rules of logic and evidence and an experiential system
which functions based on the intuitive experience of affect from more concrete
exemplars and schemas (Epstein, 1990). Adopting this perspective, most cognitive
theories of motivation appear to be unbalanced in that they place greater emphasis on
the rational system, and may underestimate and underplay the roles of learners’
affective experiences. Hence, neither behavioral nor cognitive theories of motivation
have effectively addressed the role of interest and affective experiences in motivation,
and the lack of guiding theories may partly explain the relatively small amount of
research in this field.
Conceptual Issues
Research on interest and affect in motivation has suffered from unclear,
conflicting conceptualizations. In terms of interest, motivational psychologists,
developmental psychologists, and educational psychologists have provided at least
three perspectives on interest: personal interest as an individual disposition,
interestingness as an aspect of the context, and interest as a psychological state (Krapp,
Hidi, & Renninger, 1992; Schunk et al., 2008). Among them, personal interest
reflects a relatively stable disposition; interestingness primarily concerns the context
(e.g., interestingness of an activity or task); while situational interest is viewed as a
psychological state. The three perspectives are not mutually exclusive, given that
situational interest may be triggered and maintained by interestingness as a contextual
factor (Hidi, 2000; Schunk et al., 2008).
Both personal interest and situational interest refer to a psychological state of
being interested, but they vary in levels of relation to stored knowledge and value
(Renninger & Hidi, 2002). In particular, situational interest may require little prior
knowledge and is not necessarily associated with positive value, whereas personal
interest indicates that the individual has both stored knowledge and positive value for
the object of interest. Renninger and Hidi (2002) further contend that it is optimal for
situational interest to evolve into personal interest through the student’s interaction
with the environment because personal interest is the more fully developed form of interest.
This personal interest then helps maintain and deepen the student’s motivation in spite
of adverse situations. To summarize the three perspectives on interest in this view,
situational interest is triggered by interestingness of the academic task, and may
further evolve into personal interest depending on opportunities and support available
to the student (Hidi, 1990; Krapp, 1999).
Such evolution of situational interest into personal interest is certainly of great
importance to educational researchers. Research on this process may provide valuable
implications for effective learning and instruction. However, challenges also emerge
with respect to the conceptualization and measurement of these closely related
constructs. First and foremost, it seems difficult to distinguish interest from affect
when interest is viewed as being “aroused or activated as a function of interestingness
of the context” (Schunk et al., 2008, p. 213), especially in view of the broad
definition of affect, which encompasses nearly all subjectively experienced feelings
including arousal, emotions, and moods (Blechman, 1990). Renninger’s model (e.g.,
Renninger, 1990, 1992), which attempted to account for interest by learners’ prior
knowledge and subjective value of the task, has received criticism that, as opposed to
her hypothesis, it seems intuitively possible for individuals with low prior knowledge
to be highly interested in particular activities. Thus, this model may capture personal
interest fairly well, but seems incapable of accounting for the formation of situational
interest. Moreover, very few quantitative measures are available for distinguishing the
two types of interest, let alone examining how situational interest may develop into
personal interest.
Although emotions are viewed as affect without much skepticism, there are
also a few conceptual issues with emotions. Emotions are often confused with other
affective phenomena such as moods and sentiments. Frijda (1994) suggested two
major distinctions between emotion and other affective phenomena: whether this
affective state is object-directed, and whether it can be viewed as an enduring
disposition. Emotions are object-directed and fairly short-lived as compared with
moods or sentiments. More importantly, emotions are normally unrelated to enduring
dispositions. A student feeling anxious about an upcoming test is not necessarily
prone to anxiety by nature. Sentiments, however, are used to describe dispositions
that people possess “to respond affectively to particular objects or kinds of event”
(Frijda, 1994, p. 64). When a student says he dislikes mathematics, such “dislike”
does not represent a short-lived, momentary response to mathematics; rather, it reflects
an affective disposition that is relatively stable. Most sentiments are acquired based
on previous experience or social learning (Frijda, 1994). For instance, a student may
have failed mathematics tests so many times that he has developed a negative
sentiment about mathematics out of negative feelings accompanying each failure.
The conceptual issues may complicate the investigations of emotions in
academic settings when researchers have different working definitions for emotion.
Some (e.g., Forgas, 2000) may define emotions (e.g., test anxiety) as short-lived,
object-directed phenomena, and others’ conceptualizations may be closer to moods or
sentiments as presented by Frijda (1994). The bulk of educational research on
emotions derives from the work of Pekrun and colleagues and is based on Pekrun’s
(1992) taxonomy of academic emotions. Some emotions in this taxonomy (e.g.,
retrospective emotions) seem to be more general and enduring than others (e.g.,
process-related emotions), which underscores the need for accurate measures. Hence, both the
theoretical and conceptual issues discussed above have led to specific measurement
issues in this field of research.
Measurement Issues
Accurate measurement is the cornerstone of empirical research. The
overarching validity of a study is built on the reliability and validity of its
measurement(s). All types of measurements have their strengths and weaknesses, but
measurement issues particularly persist in the use of self-report instruments to elicit
responses of one’s affective state. Self-report measurements can be defined as
instruments used for collecting numerical scores from which inferences can be made
(Gall, Gall, & Borg, 2006). Self-report methods, although the most efficient and
easiest way for latent constructs to be inferred, largely rely on the assumption that
respondents are able to observe and report on their own affective states. In other
words, self-report measures require certain levels of metacognition and self-awareness
(Schunk et al., 2008), which may not be sufficiently high in children attending
elementary schools (Flavell, 1987).
The test validity and reliability of measurements of interest have been plagued
with the theoretical and conceptual issues discussed earlier. Test validity is defined as
the “degree to which evidence and theory support the interpretation of test scores
entailed by proposed uses of tests” (American Educational Research Association,
American Psychological Association, & National Council on Measurement in
Education, 1999, p. 9). In other words, an instrument is considered to be valid when it
measures the construct(s) it is intended to measure. Because researchers have yet to reach an
agreement regarding the psychological nature of interest (i.e., cognitive vs. affective),
it is difficult to ascertain the general validity of an instrument of interest.
Test reliability is sometimes conceptualized as the degree to which an
instrument measures whatever it is measuring (Gay, Mills, & Airasian, 2008), but it
essentially refers to the degree to which observed scores reflect the true score of the
construct being measured (Gall et al., 2006; Larsen & Fredrickson, 1999). Although
educational researchers emphasize the role of test reliability as the foundation for all
following inferential work in empirical studies, it is not uncommon for researchers of
emotions to rely on single-item tests for measuring emotions. These researchers
contended that emotion “is more typically construed as within-subject construct (a
state), and we assume that it may change quickly and frequently within any single
individual” (Larsen & Fredrickson, 1999, p. 43). This may lead to uncertainty in
using classical test theory (CTT) methods (e.g., the reliability coefficient) to assess
the quality of a measurement when it is possible to bypass reliability concerns and
focus on validity instead.
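To illustrate why a CTT reliability coefficient is problematic for single-item
measures, the following Python sketch computes Cronbach's alpha, one common
internal-consistency coefficient. The function name and the response data are
hypothetical and are not drawn from the SDQI or the ECLS-K; the sketch only shows that
the coefficient requires at least two items to be defined.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        # With one item there is no between-item covariance to work with,
        # and the k / (k - 1) term would divide by zero.
        raise ValueError("alpha is undefined for a single-item measure")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed (total) scores across respondents
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five children to three 4-point items
responses = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 2, 1],
    [4, 3, 4],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Passing a single-item data set (e.g., `[[3], [4]]`) raises an error, which mirrors the
point that internal-consistency estimates cannot be obtained for single-item tests of
emotion.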
These measurement issues also complicate the goal of including more women and ethnic
minorities in advanced STEM courses because some measures may not be fair. A test is
not fair when two groups with equal levels of a latent trait (e.g., interest) earn
different scores on the same item of the test (Gall et al., 2006). This measurement
bias is also termed differential item functioning (DIF; Teresi, 2006). With regard to
measures of interest, Schunk et al. (2008) suggested that it might be particularly
difficult for some subgroups to accurately report their affect. Before drawing any conclusions
regarding gender or ethnic differences in attitude, interest, or affect, researchers need
to be aware of these possible measurement biases. Although research has indicated
gender differences in mathematics interest (e.g., Evans, Schweingruber, & Stevenson,
2002; Köller, Baumert, & Schnabel, 2001; Marsh, Trautwein, Lüdtke, Köller, &
Baumert, 2005; Renninger, 1992), it is possible that such differences primarily came
from item- and scale-level measurement biases rather than real differences in latent
traits. Moreover, cultural differences have been a frequently studied source of
response bias, as the language of the instrument may be a potential threat to
measurement invariance, particularly for multilingual respondents (Church, 2001;
Ramirez, Teresi, Holmes, Gurland, & Lantigua, 2006).
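The definition of DIF given above can be made concrete with a small Python sketch.
It assumes a two-parameter logistic (2PL) item response function for a dichotomous
item, which is a simplification chosen for illustration and not necessarily the model
applied to the SDQI items in this study. All parameter values and names are
hypothetical.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta.

    a is the item discrimination and b is the item threshold (difficulty).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with the same discrimination in both groups
# but a higher threshold in the focal group (uniform DIF).
a = 1.5
b_reference = 0.0
b_focal = 0.5

# Two respondents with the SAME latent level of interest
theta = 0.0
p_ref = p_endorse(theta, a, b_reference)
p_foc = p_endorse(theta, a, b_focal)

# A nonzero gap at equal theta is what flags the item as functioning differently.
dif_gap = p_ref - p_foc
print(p_ref, p_foc, dif_gap)
```

Because both respondents share the same trait level, any difference in the expected
item response is attributable to the item rather than to a real group difference in
interest.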
All measurement issues discussed above call for the use of item response theory (IRT)
in research on mathematics interest and affect. IRT bases its analysis on some unique
assumptions, which allow it to overcome many shortcomings of the CTT methods (Gall et
al., 2006). It is particularly helpful in examining instruments for which reliability
concerns have to be bypassed. Further, unlike the CTT perspective, which assumes the
same amount of error for each respondent, IRT analyses can examine the unique
item-respondent interaction, such that an item may be identified as being too easy
(very low thresholds) or too difficult (very high thresholds) for a particular
respondent. Using IRT, researchers can evaluate not only the psychometric quality of
an instrument at the scale and subscale levels, but also its item-level
characteristics.
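The contrast between a single error term in CTT and respondent-specific precision in
IRT can be sketched in Python. The example below again assumes a 2PL model for
simplicity, with hypothetical item parameters; it uses the standard 2PL item
information function, a^2 * P * (1 - P), and takes the standard error of the trait
estimate as the inverse square root of the summed information.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information contributed by one 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of the trait estimate at theta for a set of items."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)


# Hypothetical (discrimination, threshold) pairs; every threshold is low,
# so the items are "easy" to endorse.
items = [(1.2, -2.0), (1.0, -1.5), (1.5, -1.0)]

se_low = standard_error(-1.5, items)   # respondent near the item thresholds
se_high = standard_error(2.0, items)   # respondent far above the thresholds
print(se_low, se_high)
```

In this sketch the respondent with a high trait level is measured much less precisely
than the respondent whose trait level is close to the item thresholds, which is the
kind of item-respondent interaction that a single CTT reliability coefficient cannot
reveal.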
Purpose of the Study
The purpose of this dissertation was to evaluate the psychometric properties of
the measure of students’ mathematics interest. The study comprises three phases in
which Self Description Questionnaire I (SDQI; Marsh, 1992a) data from the Early
Childhood Longitudinal Study-Kindergarten (ECLS-K) were utilized. In the first phase,
the measure was evaluated from the CTT and factor analytic perspectives for
properties to demonstrate factor structure and internal consistency reliabilities. The
association between mathematics interest and mathematics performance was examined
to provide evidence of the external validity. In the second phase, the item-level
characteristics were examined within the IRT framework (discussed in Chapter 3). In
the third phase, measurement biases (DIF) across gender, ethnicity, and age were
examined within the IRT framework.
Research Questions
To achieve the purpose of this dissertation, the following research questions
were outlined:
1. From both the CTT and IRT frameworks, does the SDQI accurately measure
children’s interest and affect in mathematics?
2. Do items of mathematics interest and affect demonstrate measurement bias
across gender?
3. Do items of mathematics interest and affect demonstrate measurement bias
across ethnic groups?
4. Do items of mathematics interest and affect demonstrate measurement bias
across age groups?
5. How do children’s responses to the mathematics interest and affect items
change over time?
Significance of the Study
In the literature, two explanations are given for unsatisfactory academic
performance: lack of ability and lack of effort or motivation. Because there is little
that researchers and educators can do about students’ abilities, much effort has been
invested in boosting their academic motivation (Hidi & Harackiewicz, 2000).
However, the unique features of the STEM subjects appear to daunt students from an
early age. As reported by the President’s Council of Advisors on Science and
Technology (2010), many U.S. students and their parents believe that STEM subjects
are “inherently boring, cryptic, or beyond their grasp” (p. 57). As such, it is even
more critical to identify and maintain students’ interest in STEM subjects. Along with
the fact that U.S. students’ STEM achievement declines with age (Fleischman et al.,
2010; Provasnik et al., 2012; Schmidt, 2011), research has shown that children’s
interests and attitudes toward specific subject areas such as mathematics tend to
deteriorate as they get older (e.g., Eccles, Wigfield, & Schiefele, 1998). Moreover, the
wide disparities of STEM performance across gender and ethnic groups (NRC, 2011)
also entail in-depth examinations of the measures that researchers use for gauging
interest, attitude, and affect in STEM education.
This dissertation provides a comprehensive examination of a commonly used
measure of mathematics interest and affect—the Self Description Questionnaire I
(SDQI). The SDQI was designed to measure multiple dimensions of self-concept for
pre-adolescents in primarily four non-academic areas (i.e., physical ability, physical
appearance, peer relations, parent relations), and three academic areas (i.e., reading,
mathematics, and school in general; Marsh, 1992a). From a review of extant literature,
the SDQ has been cited in over 450 studies from 2003 to 2013. Thus, a psychometric
examination of this measure with respect to mathematics interest and affect was
particularly warranted. As discussed earlier, the conceptual issues particularly relate
to the face validity of individual items designed to tap interest and affect in academic
settings. For example, with respect to the distinction between personal interest and
situational interest (Schunk et al., 2008), the SDQI item “I enjoy doing work in math”
seems to be more concerned about situational interest, given that it specifies a context
for this interest to be triggered and observed, while a less straightforward item “I
cannot wait to do math each day” seems to be more of a measure of personal interest
as a dispositional trait. We may therefore expect to see varying levels of item
discrimination and threshold. Thus, the IRT analyses of these item-level
characteristics are promising in revealing such operationalization issues that may
interfere with the measurement. Moreover, findings of this study would also
contribute to the knowledge base for future construction and validation of instruments
of academic interest.
In addition to examining the psychometric properties of the instrument, the
current study examines relevant differences in mathematics interest and affect
according to group membership (e.g., gender and ethnic group), and across time.
Although gender, ethnic, and age differences are well-documented in the literature
(e.g., Else-Quest et al., 2010), few studies have been conducted for examining the
measurement invariance of these measures. Such examinations of pertinent
instruments are considered critical because they would likely confirm or disconfirm
previous findings regarding gender, ethnic, and age differences as assessed by the
SDQI and similar measures. More importantly, findings with respect to measurement
bias in SDQI were expected to provide practical implications as to how strategies (e.g.,
replacement of items) might be implemented for fairer measures across demographic
groups (Teresi, 2006).
Limitations of the Study
The current study is limited in that it focuses on just one measure of
mathematics interest and affect, the SDQI, and there may be other measures of
mathematics interest that would warrant investigation as well. However, IRT analyses
require large sample sizes (e.g., a sample size of 500 or more is preferable; see Reise
& Yu, 1990; Herrera & Gomez, 2008). Thus, the analysis of other measures was
precluded given the limited availability of appropriate data to conduct these analyses. To
offset this limitation, the external validity of the measure was examined with respect
to actual achievement to ensure that the measure might be considered authentically
related to achievement.
There are also concerns regarding the face validity and factor structure of
relevant SDQI items within the context of the current study. The SDQI was
hypothesized to measure multiple dimensions of preadolescent children’s
self-concept (Marsh, 1992a), in which student ratings of their skills, ability, enjoyment, and
interest in mathematics altogether represent one out of the seven unique dimensions
(i.e., physical abilities, physical appearance, reading, mathematics, peer relations,
parent relations, general-self, and general-school). Although items measuring
mathematics interest are distinctive in terms of how they are phrased, they were not
distinguishable from items of perceived competence of mathematics as a result of
factor analysis (Marsh & O’Neill, 1984). A possible explanation may be the positive
association between perceived competence and intrinsic pleasure in Harter’s (1981,
1996) model of mastery motivation in children, but it also suggests that interest is
somewhat elusive to capture in the presence of related constructs.
Additionally, in examining differences in measurement across time,
mathematics interest and affect were measured at only two time points (i.e., third
and fifth grade). Even though the two time points represent a critical stage in
children’s development of interest, attitude, and affect (Eccles et al., 1998; Wigfield,
Eccles, Schiefele, Roeser, & Davis-Kean, 2006), the data only allowed for the
assessment of the amount of change, but not the pattern of this change. Though
adopting a longitudinal perspective, this study is not strictly longitudinal in nature,
given that any two-wave observation is incapable of providing estimates of the
parameters of the growth curve (Rogosa, 1995).
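The limitation of two-wave data can be illustrated with a brief Python sketch. The two
trajectories below are hypothetical mean interest scores, not values from the ECLS-K;
they are constructed so that a straight-line decline and a curved decline agree exactly
at the two observed waves (third and fifth grade) while differing in between.

```python
def linear_path(grade):
    """Hypothetical steady decline in mean interest across grades."""
    return 3.0 - 0.25 * (grade - 3)


def curved_path(grade, c=0.2):
    """Hypothetical curved decline.

    The term c * (grade - 3) * (grade - 5) is zero at grades 3 and 5,
    so it bends the trajectory without moving the two observed points.
    """
    return linear_path(grade) + c * (grade - 3) * (grade - 5)


observed_grades = (3, 5)

# Both trajectories produce identical scores at the two observed waves...
same_at_waves = all(
    abs(linear_path(g) - curved_path(g)) < 1e-12 for g in observed_grades
)

# ...and therefore the identical amount of change between waves,
change_linear = linear_path(5) - linear_path(3)
change_curved = curved_path(5) - curved_path(3)

# even though they disagree at the unobserved fourth grade.
gap_at_grade_4 = linear_path(4) - curved_path(4)
print(same_at_waves, change_linear, change_curved, gap_at_grade_4)
```

Because any value of the hypothetical curvature constant `c` reproduces the same two
observations, two waves can identify the amount of change but not the shape of the
growth curve.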
CHAPTER II
REVIEW OF LITERATURE
The review of literature is focused on the theoretical perspectives and research
findings on interest with respect to theories of achievement motivation because the
significance of this dissertation lies in the consistent findings that interest is positively
correlated with motivation and performance in academic settings (Schunk et al., 2008).
It begins with a general review of the concept of interest, discusses the role of interest
and related affective experiences in major motivational theories, and introduces the
theoretical basis for investigating gender, ethnic, and developmental differences in
mathematics interest.
The Concept of Interest
The topic of interest has a long history that can be traced back to the 1800s,
when Herbart (1806) contended that interest could promote motivation in learning.
The beginning of the 20th century witnessed a surge of scholarly discussions of
interest among theorists. Dewey (1913) believed that an individual interacts with the
environment to raise interest; Thorndike (1935) considered learning to be affected by
the learner’s interest; and Bartlett (1932) added that interest might specifically
facilitate the learner’s memory. Following this surge, however, research on interest
declined when behaviorism became dominant in psychology (Schunk et al., 2008).
Krapp et al. (1992) provided two reasons for this decline: the many and varied
conceptualizations of interest and the development of more discrete research
approaches, which had rendered the concept of interest superfluous. They further
argued that each of these approaches typically focused on a single aspect of interest
such as attention, curiosity, emotion, attitude, and motivation. The situation of
research on interest did not change much when cognitive psychology emerged to
dominate psychological research because the early cognitive theories largely neglected
the motivational processes. It was not until the late 1980s and early 1990s that we saw
a renaissance of the concept of interest as researchers began to integrate motivational
and cognitive variables to better explain learning and performance (Krapp et al., 1992;
Schunk et al., 2008). That being said, these research endeavors (e.g., Hidi, Baird, &
Hildyard, 1982; Renninger, 1984, 1989, 1990; Schiefele, Winteler, & Krapp, 1988)
were still based on varied conceptualizations of interest, thus representing different
perspectives and questions about learning. This may explain why research on interest
again has waned in the past two decades.
It appears that the difficulty in such conceptualization lies in the fact that
interest is itself a concept invented and used in everyday life to refer to a certain
psychological phenomenon. When a psychologist wants to investigate a person’s
interest, he will have to assume that the person uses the term “interest” in its more
general shared meaning. It is through this assumption that we created the
psychological construct “interest,” but Valsiner (1992) cautioned that “such
operationalization remains a methodological construction that leads to data derivation
from the phenomena in ways that by their nature eliminate a relevant aspect of the
phenomena from the data” (p. 29). Such confusion is evident in the use of any
common-language meaning as a psychological construct, resulting in a few issues
pertinent to both the conceptualization and the operationalization of interest. First,
interest is so commonly used in everyday language that researchers may find it
unnecessary or even impossible to assign a specific definition to it. As it turned out,
however, their operationalization exhibited very different patterns and perspectives of
what interest constitutes. Second, disparities exist between interest in its common-
language meaning and interest as a psychological construct. Thus, any self-report
measures of interest may demonstrate a unique validity issue because the assumption
that a respondent interprets the term “interest” in its more general meaning appears
fairly weak.
Personal Interest and Situational Interest
Despite the varied conceptualizations, most theorists agree that interest is a
phenomenon that emerges from an individual’s interaction with the environment
(Krapp et al., 1992). With respect to the person-environment interaction, there have
been two related bodies of research on interest. The first body of research
concentrates on personal interest, also referred to as individual interest or trait interest,
by analyzing its origins and effects, particularly the effect on learners’ cognitive
performance. The second body of research concentrates on the specific characteristics
of the environment that capture the interest of many learners regardless of their
personal interests. Such situational interest is termed state interest elsewhere in the
literature. In summarizing the two bodies of research, Krapp et al. (1992) stated that
“interest as a psychological state, and situation-specific factors that bring about
interest, then, reflect two distinct research approaches for investigating the role of
interest in learning and development” (p. 6). However, this statement appears to be
inaccurate in light of the development of interest research in the past two decades.
First of all, some theorists contended that there are three, rather than two,
perspectives of interest: personal interest, situational interest, and interestingness of
the context (Schunk et al., 2008). Krapp et al. (1992) did not distinguish
interestingness of the task from situational interest, which made the investigations of
situational interest less person-oriented. More recent literature (e.g., Renninger &
Hidi, 2002) suggested that both personal interest and situational interest refer to a
psychological state of being interested, and situational interest may translate into
personal interest given certain conditions. Hence, we may draw a distinction between
investigations of person-oriented interest (i.e., personal interest and situational interest)
and investigations of interestingness as a contextual factor, but personal interest and
situational interest actually represent two research areas that are closely related.
Personal interest is considered to be specific to individuals, relatively stable,
and usually associated with positive emotions and values (Krapp et al., 1992). In
modern psychology, various theoretical perspectives are available for investigating
personal interest. In social psychology, for example, interest has been viewed as a
vocationally relevant disposition closely related to the concept of attitude, and
sometimes even defined as attitude (Evans, 1971). On the other hand, process-
oriented theories are more concerned about students’ interest as demonstrated by the
levels of psychological arousal such as effortless attention accompanied by positive
affect (Krapp et al., 1992).
In contrast to personal interest, situational interest refers to interest that is
generated primarily by certain conditions or environmental stimuli. From this
perspective, situational interest is not specific to the individual, but tends to be
common across individuals (Krapp et al., 1992). Because there is large variability in
the person-environment interactions in raising and maintaining interest, however, it is
not likely that individuals would experience the same level of arousal given the same
external stimulus. Because of this variability, situational interest is unstable and fairly
short-lived as compared with personal interest. To further distinguish situational
interest from personal interest, theorists also provided a perspective regarding the
affective component of interest. While personal interest is usually accompanied by
positive feelings, situational interest may not be as consistently associated with such
positive affect (Iran-Nejad, 1987).
Interest, Affect, and Motivation
Figure 2.1 illustrates the relations among the three approaches to interest
research. This structure coincides with social cognitive theory (discussed
in detail in “Interest and Affect in Motivation”) in that it views interest as a personal
factor (i.e., personal interest as a disposition), an environmental factor (i.e.,
interestingness), and a behavioral factor (i.e., psychological arousal through person-
environment interaction). As such, it is appropriate to adopt the social cognitive
perspective in relevant investigations.
Previous studies in different psychological domains (e.g., social psychology,
applied psychology) have examined interest as, or in relation to, constructs including
attention, curiosity, emotion, attitude, and motivation. Given its common meaning in
everyday language, the term “interest” may inevitably give rise to many and varied
conceptualizations. It is therefore critical for a particular interest study to be specific
about its guiding theoretical perspective. Given the importance of motivation in
academic settings, particularly with regard to the underrepresented populations in
STEM fields, this dissertation undertakes the motivational perspective to view interest
and relevant affective experiences as contributing factors of achievement motivation.
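[Figure 2.1 diagram. Column headings: Characteristics of the person; Psychological
state within the person; Characteristics of the learning environment (material/text).
Box labels: Personal interest as a disposition; Actualized individual personal
interest; Situational interest; Interestingness.]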
Figure 2.1. Three approaches to interest research. Adapted from The Role of Interest
in Learning and Development (p. 10), by K. A. Renninger, S. Hidi, and A. Krapp
(Eds.), 1992, Hillsdale, NJ: Erlbaum. Copyright 1992 by Lawrence Erlbaum, Inc.
To further complicate the situation of interest research, it is worth noting that
the literature did not provide a clear distinction between interest and affect. According
to the broad definition of affect which involves all subjectively experienced feelings
including arousal, emotions, and moods (Blechman, 1990), it is reasonable to argue
that interest, particularly situational interest, is a certain type of affective experience.
Although a single aspect of interest may resemble other constructs such as attitude
(e.g., Evans, 1971), attention (e.g., Eysenck, 1982), or emotion (e.g., Izard, 1977), the
status of interest as any of these affective constructs is debated. What we are more
certain about is that interest, as a psychological state, tends to be accompanied by a
variety of taxonomies of affect such as physiological changes (Silvia, 2008), cognitive
appraisal (Silvia, 2005) and motivational value (Izard & Ackerman, 2000). The
affective component, however, appears to be more intense and consistent in personal
interest than in situational interest (Krapp et al., 1992). Given the complicated
relations between affective experiences and the concept of interest, affect was
included as one of the peripheral factors of interest under the investigation of this
dissertation.
Interest and Affect in Theories of Achievement Motivation
Motivation is defined as “the process whereby goal-directed activity is
instigated and sustained” (Schunk et al., 2008, p. 4). In academic settings, motivation
can influence what, when, and how we learn (Schunk, 1995). Motivation is a
multifaceted construct, and researchers usually gauge it through its behavioral
indicators. Researchers in general agree on the following four behavioral indicators of
motivation: choice of task, effort, persistence, and performance (Schunk et al., 2008).
In fact, the first three may be viewed as the mechanism through which motivation
affects a learner’s learning outcomes or performance. For example, a motivated
learner is more likely to choose challenging tasks (e.g., taking advanced math courses
that are optional), invest more time and effort in the tasks (e.g., studying advanced
mathematics for extensive time, actively seeking help), and persist in working on the
tasks when encountering difficulties (e.g., not quitting the course when failing an
exam) than an unmotivated peer. These behaviors will likely lead to better outcomes
or performance. Because of this, motivation plays a vital role in learning and has
received considerable attention in educational research.
The linkage between achievement motivation and positive learning outcomes
may be explained from either a behavioral or cognitive perspective of learning.
Behaviorism (e.g., Skinner, 1953) views learning as a change of frequency in
behaviors through the A-B-C (Antecedent-Behavior-Consequence) operant
conditioning process, and the aforementioned behaviors (e.g., choosing challenging tasks,
investing effort) would help a learner obtain desirable consequences, which further
reinforce learning. Motivated individuals thereby enhance their learning
experiences by strengthening the stimulus-response (S-R) connection. On the other
hand, cognitivists, particularly constructivists (e.g., Piaget, 1985) contend that learning
occurs when a learner actively “constructs” knowledge, and such construction is
activated through an adaptation process—assimilation or accommodation (Piaget,
1985). As such, a challenging task is more likely to trigger the adaptation process and
facilitate the construction of knowledge. Finally, social cognitive theorists (e.g.,
Bandura, 1977; Vygotsky, 1978) stress the importance of the interaction between
individuals and their social context. The choice of challenging tasks and help-seeking
behaviors of a motivated learner indicate the active role of learners in organizing
personal and environmental resources in order to produce positive outcomes.
In mentioning the active role of learners, the contribution of social cognitive
theory must be acknowledged. Social cognitive theory (Bandura, 1977, 1986, 1997)
has had an enormous influence on psychology and education over the past three
decades (Zimmerman & Schunk, 2003). In general, Bandura (1977) analyzed human
learning and self-regulation in terms of triadic reciprocal causations, which involve
three types of determinants (i.e., personal, behavioral, and environmental) of learning
(see Figure 2.2). Social cognitive theory “depicted people as self-organizing,
proactive, self-reflective, and self-regulative rather than merely reactive to social
environmental or inner forces” (Zimmerman & Schunk, 2003, p. 439). In comparison,
while behavioral theories of motivation focus on the lower half of the triadic
reciprocality model, that is, the connection between environmental (i.e., stimuli) and
behavioral (i.e., response) determinants, the social cognitive perspective of motivation
directs researchers’ attention to the within-person determinants, including thoughts
and feelings. As Bandura (1986) stated, “what people think, believe, and feel affects
how they behave. The natural and extrinsic effects of their actions, in turn, partly
determine their thought patterns and affective reactions” (p. 25).
The social cognitive perspective of motivation allows researchers to uncover
the cognitive and affective factors that may explain the individual variance
unaccounted for by merely looking at environmental factors such as classroom setting
and instructional methods. As an example, two classmates may be receiving exactly
the same instruction in the same classroom environment, yet demonstrate varying
levels of motivation, because they may differ greatly in how they perceive the
environment, make attributions of their successes and failures, and affectively respond
to certain events. Bandura’s (1986) statement that, “what people think, believe, and
feel affects how they behave” (p. 25) concerns at least two domains of personal
determinants in the triadic reciprocality model: the cognitive domain and the affective
domain. A review of the major theories of motivation to date, however, reveals a
heavier emphasis on the cognitive domain than on the affective domain.
Figure 2.2. Model of triadic reciprocal causations. From Social Foundations of
Thought and Action: A Social Cognitive Theory (p. 24), by A. Bandura, 1986,
Englewood Cliffs, NJ: Prentice Hall. Copyright 1986 by Prentice-Hall Inc.
Expectancy-Value Theory of Motivation
The expectancy-value theory of motivation has been one of the most prominent
views on the nature of achievement motivation. Its application in educational research
mainly draws from the work of Wigfield, Eccles, and their colleagues (e.g., Eccles et
al., 1983; Wigfield, 1994; Wigfield & Eccles, 2000, 2002; Wigfield et al., 2006).
From a social cognitive perspective, motivation is affected by a variety of factors such
as reinforcement of behavior, learners’ goals, interests, and sense of self-efficacy and
self-determination. In the social cognitive expectancy-value model (see Figure 2.3),
these factors are organized to create two general sources of motivation: learners’
expectations of how well they will do on upcoming tasks (expectancy), and the values
(subjective value) they place on these tasks. The relationship between expectancy and
task value is multiplicative, often written in formula as expectancy × value =
motivation (Wigfield & Eccles, 2002). In other words, if a learner scores 0 on either
of the two factors, he will not be motivated at all on the task, no matter how high he
scores on the other factor. According to Wigfield and Eccles (2000), expectancies and
values are assumed to be directly related to achievement motivation on learning tasks,
and a variety of variables are assumed to influence expectancies and values.
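The multiplicative relationship described above can be written out as a brief worked illustration. The symbols M, E, and V and the numerical values are assumptions introduced here for illustration only; they are not taken from Wigfield and Eccles (2002) or from any measure used in this dissertation.

```latex
% Illustrative notation (assumed): M = motivation, E = expectancy, V = subjective task value
\[
  M = E \times V
\]
% A zero on either factor yields zero motivation, regardless of the other factor:
\[
  E = 0,\; V = 10 \;\Rightarrow\; M = 0 \times 10 = 0,
  \qquad
  E = 10,\; V = 0 \;\Rightarrow\; M = 10 \times 0 = 0 .
\]
```

Under an additive rule, by contrast, a high score on one factor could compensate for a zero on the other (0 + 10 = 10), which is the property the multiplicative form rules out.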
Figure 2.3. A social cognitive expectancy-value model of achievement motivation.
From Motivation in Education (p. 51), by D. H. Schunk, P. R. Pintrich, and J. L.
Meece, 2008, Columbus, OH: Merrill. Copyright 2008 by Merrill.
As shown in Figure 2.3, affective memories as a function of social world
factors (e.g., cultural milieu) and cognitive processes (e.g., perceptions of social
environment) are postulated to have immediate influence on task value. Although it
seems intuitive that an individual’s affective experiences with respect to a particular
activity or task may impact the values he assigns to the task, affective memories is a
less empirically explored component within the expectancy-value framework (Schunk
et al., 2008). The linkage between affective memories and task value, therefore,
remains largely unknown in the literature.
Interest is among the four components of subjective value in the expectancy-value
model: attainment value, intrinsic value (interest), extrinsic/utility value, and cost.
Intrinsic interest is defined as the enjoyment that people experience when performing
a task or their subjective interest in the content of a task (Wigfield & Eccles, 1992).
The investigations of intrinsic interest for predicting motivation and performance have
generally yielded positive results, but interest is mostly analyzed as one subscale along
with attainment and extrinsic values for determining the task value of participants
(Schunk et al., 2008). Locke and Latham (1990) suggested that this conceptualization
of task values represented a rational decision-making process of individuals, which
suggests that investigations of interest remain largely cognitive.
Attribution Theory of Motivation
Weiner’s (1986, 1992) general attributional model of motivation includes
psychological consequences as a critical component of the attributional process. As
shown in Figure 2.4, the perceived antecedent conditions include both environmental
factors (e.g., social norms, situational features) and personal factors (e.g., causal
schemas, attributional bias), based upon which learners generate their attributions to
factors such as ability, effort, and luck. This process is termed the attribution process
(Kelley & Michela, 1980). For example, the gender stereotype that girls lack
mathematical ability (social norms) may lead a girl to attribute her poor performance
in mathematics to her perceived lack of ability, rather than effort. Then, in the
attributional process (Kelley & Michela, 1980), learners’ causal attributions are
expected to produce a series of psychological consequences, including expectancy for
success, self-efficacy, and affect.
Antecedent Conditions -> Perceived Causes -> Causal Dimensions -> Psychological Consequences -> Behavioral Consequences
Antecedent Conditions: Environmental factors (specific information, social norms, situational features); Personal factors (causal schemas, attributional bias, prior knowledge, individual differences)
Perceived Causes: Attributions for ability, effort, luck, task difficulty, teacher, mood, health, fatigue, etc.
Causal Dimensions: Stability, locus, control
Psychological Consequences: Expectancy for success, self-efficacy, affect
Behavioral Consequences: Choice, persistence, level of effort, achievement
Attribution Process (antecedent conditions to perceived causes); Attributional Process (perceived causes to consequences)
Figure 2.4. Overview of the general attributional model. From Motivation in
Education (p. 82), by D. H. Schunk, P. R. Pintrich, and J. L. Meece, 2008, Columbus,
OH: Merrill. Copyright 2008 by Merrill.
Like the expectancy-value model, the attributional model focuses on
expectancy and ability belief factors for predicting motivation and achievement,
although the former distinguishes ability belief (self-efficacy) from expectancy. In
regard to affect, the attributional model indicates an immediate linkage between
affective outcomes and behavioral indicators of motivation. Typical affective
outcomes include emotions such as pride, shame, guilt, anger, and sympathy, and
Weiner (1994) postulated that these emotions would likely contribute to behavioral
consequences of motivation such as choice of behavior and level of effort. However,
a key criticism of Weiner’s (1994) linkage of emotions to motivation is that attribution
theory is cognitive in nature. Rather than dealing with actual emotions, attribution
theory focuses on the memory of emotions and their accompanying cognitions (Schunk
et al., 2008). As such, emotions within this framework are conceptually close to
affective memories in the expectancy-value model. In addition, the attributional
model does not seem to encompass a dimension of interest or intrinsic enjoyment of
doing a task.
Goal-Orientation Theories
Goal-orientation theories of motivation focus on why individuals set
specific goals for themselves and how they approach the tasks (Schunk et al., 2008).
While theorists have proposed different classifications of goal-orientations, the most
commonly cited and researched one has been the classification of mastery goal-
orientation versus performance goal-orientation (Dweck & Leggett, 1988). Whereas
mastery/learning goal orientations are concerned about the mastery of knowledge and
skills according to self-set standards, performance goal orientations are more
concerned about how one’s performance is judged in relation to others. Theorists
further added an approach versus avoidance classification. Learners are thereby
classified into four types in the taxonomy: performance approach, performance
avoidance, mastery approach, and mastery avoidance. Dweck and her colleagues (e.g.,
Dweck, 1999; Dweck & Elliott, 1983; Dweck & Leggett, 1988) also suggested that
goal-orientations may be a function of implicit theories of intelligence/ability: people
with entity beliefs tend to think their intelligence or ability in a specific domain is
“fixed” and cannot be improved, and those with incremental beliefs tend to think it
is “malleable” and may be enhanced. An attributional process (Weiner, 1986) serves
as the underlying mechanism in this context. An individual may attribute his past
successes or failures according to dimensions of controllability (controllable vs.
uncontrollable), locus (internal vs. external), and stability (stable vs. unstable). When
a learner attributes his past experiences to uncontrollable and stable reasons (e.g., “I’m
not born as a math person, and it won’t change no matter what”), he clearly holds
entity beliefs and will likely adopt a performance goal because he believes that deep, meaningful
learning will not improve his intelligence or ability. In contrast, a learner holding
incremental beliefs tends to see deep learning as a way to improve himself and will
more likely adopt a mastery approach goal because he believes that his ability is under
his own control.
Given the emphasis on cognitive processes in Weiner’s (1986) attribution
theory, it seems difficult to identify a role of interest in goal-orientation theories.
However, there has been a series of studies (e.g., DeCuir-Gunby, Aultman, & Schutz,
2009; Kaplan & Midgley, 1999; Pekrun, Elliot, & Maier, 2006, 2009; Pekrun, Goetz,
Titz, & Perry, 2002; Seifert, 1995) on the affective antecedents as well as outcomes of
different goal-orientations. In general, these studies concluded that positive emotions
are directly associated with approach goal-orientations, whether mastery-approach or
performance-approach, whereas negative emotions are directly associated with avoidance
goals.
Intrinsic and Extrinsic Motivation
The theoretical perspective of intrinsic motivation versus extrinsic motivation
also has a relation to the construct of interest. Intrinsic motivation “refers to
motivation to engage in an activity for its own sake,” and extrinsic motivation refers to
“motivation to engage in an activity as a means to an end” (Schunk et al., 2008, p.
236). Although intrinsic motivation and interest appear to be similar because they
both involve the enjoyment of engaging in an activity or task, theorists cautioned that
“interest is not a type of motivation but rather an influence on motivation” (Schunk et
al., 2008, p. 237). In fact, intrinsic motivation refers to a fairly complex process that
involves many cognitive and affective factors. Nevertheless, the notion that intrinsic
motivation changes over time (Schunk et al., 2008) sheds light on research on interest
in school-age children. In particular, the overjustification hypothesis (Lepper, Greene,
& Nisbett, 1973), which attempts to account for diminishing intrinsic motivation in
children, treats interest as critical: offering learners rewards for academic tasks
conveys the message that their engagement does not reflect their own interests, and
such overjustification of their participation will undermine their intrinsic motivation as
a result. This phenomenon may also be explained by Skinner’s operant conditioning,
in which one’s intrinsic pleasure in doing specific tasks (i.e., antecedent stimuli) is
replaced by consequence stimuli: external rewards. Thus, the problem becomes that
rewards are not always stable and controllable, and a learner would no longer be
motivated if the reward is removed.
The Roles of Interest and Affect in Achievement Motivation
In reviewing the roles of interest and affect in major theories of achievement
motivation, several issues emerged to guide this dissertation. Although interest and
affective factors are evident in some of the theories, they are conceptualized
differently in research models. In the expectancy-value theory, interest is among the
four components of task value, which operates as the immediate predictor of
motivation as well as a function of affective memories (see Figure 2.3). The
attribution theory, on the other hand, seems to suggest a more direct connection
between affective outcomes (primarily emotions) and motivation (see Figure
2.4), but there is little mention of the role of interest. Likewise, research guided by
goal-orientation theories has shown that approach goals and avoidance goals may
produce different affective outcomes, but there have been few investigations of how
interest may influence the formation of goal-orientations. Lastly, the classification of
intrinsic and extrinsic motivations emphasizes the role of interest in activating and
sustaining intrinsic motivation. Despite the fact that some theories are missing interest
or affect in the hypothesized models, the positive influence of interest on motivation
and achievement is generally recognized. The role of interest, however, seems of less
importance than the cognitive factors in most of these models. Since Eccles and
Wigfield (1995) validated the factor structure of the 19-item measure of adolescents’
task values and expectancy-related beliefs, their subsequent studies of expectancy-value
theory have mostly relied on this measure, of which only two items are intended to
measure intrinsic interest.
With respect to the investigations of mathematics motivation and achievement
in particular, research has identified motivational factors including academic self-
concept, task value, and outcome expectations as predictors of achievement (e.g.,
Else-Quest, Mineo, & Higgins, 2013). Researchers, however, also proposed that this
line of research should do more to incorporate affective variables, given the findings
of their importance in academic motivation and achievement (Else-Quest et al., 2010;
Else-Quest, Hyde, & Hejmadi, 2008; Pekrun et al., 2006, 2009).
Gender, Ethnic, and Age Differences in Mathematics Interest and Affect
Gender Differences
The saying “boys will be boys,” whether representing a belief or reality,
appears to be evident in many spheres in our society. In mathematics education,
stereotypes that girls lack mathematical ability persist (Bhana, 2005). Such
stereotypes reflect reality in the sense that some studies on mathematics
achievement did demonstrate the advantage of being male. For example, Tsui (2007)
found that the mean SAT-Math score among male high school seniors had been
consistently higher than that of their female counterparts, while Xie and Shauman
(2003) indicated that the ratio of males to females who scored in the top 5% in high
school mathematics had remained constant at 2 to 1 over the past 20 years.
Among early developmental theories, Freud’s (1961) psychoanalytic theory
addresses gender role development starting at the phallic stage (3-5 years).
Although this may have limited implications for today’s research and has received
much criticism from feminist theorists, it suggested an early recognition or awareness
of gender roles among children. In light of the social cognitive theory, Bussey and
Bandura (2004) suggested that parents, peers, teachers, the mass media, and various
social institutions together operate as the societal system that contributes to
gender role development of children. The empirical evidence indicated that infants
and toddlers learn to differentiate gender roles according to their associated
appearance and activities. For example, Bussey and Bandura (1992) found that
children 3 or 4 years old would already get upset when given gender-typed
toys of the other gender. Boys of this age would either refuse to play with, for
example, dolls or other toys requiring feminine, nurturing features, or try to convert
them into masculine toys (e.g., guns). Given that the STEM professions have
been traditionally dominated by males in both reality and the mass media (NSF &
NCSES, 2010), it is not surprising that even a female preschooler would shy away
from STEM-related activities in an effort to confirm her feminine identity.
The discussion above is in line with a motivational perspective in explaining
gender differences in the STEM fields. If a gender gap in motivation accounts for a
large proportion of the variance in gender differences in achievement or enrollment
efforts, then the notion of a biological difference/intrinsic aptitude between gender
groups may be invalidated. In fact, there is little evidence endorsing gender
differences in intrinsic aptitude in the educational research (Rivers & Barnett, 2011),
and the notion of a biological difference has been criticized for not having a solid basis
(Spelke, 2005). As Rivers and Barnett (2011) stated, “what we do and how we do it
affect how our children’s brains begin to organize themselves and to process
information” (p. 6). This obviously aligns well with the social cognitive perspective
in motivation: the gender gap in mathematics may be explained by children’s
interactions with their environmental or societal influences. Some researchers (e.g.,
Guiso, Monte, Sapienza, & Zingales, 2008; Riegle-Crumb, 2005), for example, have
attempted to account for such a gender gap by exploring the gender inequities and
stereotypes available in a given culture.
Although the bulk of research has indicated the advantage of being male in
STEM learning and professions, some recent trends are also worth noting. While a
gender gap in achievement still exists as reported by most studies in this field, this gap
has also been found to be slowly closing (Hyde, Lindberg, Linn, Ellis, & Williams,
2010). To further complicate this trend, some cross-cultural or international
comparison studies also indicate a non-uniform pattern of gender differences in the
STEM fields (Else-Quest et al., 2010), with girls demonstrating higher achievement or
motivation (e.g., attitude, self-concept) than boys in some countries or sub-cultures.
These trends may be explained by the constant changes in societal influences such as
family patterns, parental expectations, and level of gender equity.
In the past two decades, researchers have become more interested in
motivational factors in mathematics. In a Norwegian sample of middle to
high school students, Skaalvik and Skaalvik (2004) found that male students showed
higher self-concept, performance expectations, intrinsic motivation, and self-
enhancing ego orientations in mathematics than did female students. In terms of the
affective domain, the finding that females have higher anxiety about mathematics has been
empirically supported by many studies (e.g., Casey, Nuttall, & Pezaris, 1997; Hyde,
Fennema, Ryan, Frost, & Hopp, 1990; McGraw, Lubienski, & Strutchens, 2006).
Furthermore, Simpkins and Davis-Kean (2005) also suggested that girls are less likely
to aspire to careers that are related to mathematics, thus being less likely to take high
school mathematics courses. Given that motivational beliefs and achievement
behaviors are likely to be shaped through gender norms and roles (Jacobs & Simpkins,
2005), it is worth reviewing the literature with respect to the developmental stage at which
gender norms and stereotypes start to influence children’s attitudes toward
mathematics.
For this issue, we may need to consult longitudinal studies in this area. Thus
far, much of the research on the gender gap has been devoted to identifying the attitudinal
and motivational factors at specific time points of schooling, but only a few studies
have employed longitudinal models to investigate gender differences in the growth
over time. Among these studies, Leahey and Guo (2001) estimated curvilinear growth
models to examine gender differences in mathematics trajectories from elementary
through high school, and concluded that boys had a faster rate of acceleration despite
the relatively equal starting points and slopes. The findings seem to suggest that boys
and girls may demonstrate a gender gap in motivation at a very young age. In fact,
Eccles and her colleagues (Eccles et al., 1983, 1989) have conducted a number of
studies of individuals’ ability beliefs, expectancies for success, and subjective values,
and it was found that, as early as first grade, individuals begin to form clearly distinct
ability-expectancy beliefs and subjective values within the domains of mathematics,
reading, music, and sports.
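The curvilinear growth result described above can be read against a generic quadratic growth model. The notation below is an illustrative assumption and is not the specification reported by Leahey and Guo (2001): y_{ti} stands for the mathematics score of student i at time t, and Male_i is a hypothetical 0/1 indicator.

```latex
% Level 1: within-student quadratic growth (illustrative, assumed notation)
\[
  y_{ti} = \pi_{0i} + \pi_{1i}\, t + \pi_{2i}\, t^{2} + e_{ti}
\]
% Level 2: a gender difference that appears only in the acceleration term
\[
  \pi_{0i} = \beta_{00} + r_{0i}, \qquad
  \pi_{1i} = \beta_{10} + r_{1i}, \qquad
  \pi_{2i} = \beta_{20} + \beta_{21}\,\mathrm{Male}_i + r_{2i}
\]
```

In this sketch, roughly equal starting points and slopes correspond to intercept and linear terms that do not depend on gender, while a faster rate of acceleration for boys corresponds to a positive \beta_{21}.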
Findings of group comparisons in this area may further complicate the
investigations. In Else-Quest et al.’s (2010) meta-analysis identifying cross-
national patterns of gender differences in mathematics, while boys on average reported
more positive math attitudes and affect, this statistic also showed large
variability among nations, indicating that girls in some nations (e.g., Thailand)
demonstrated higher motivation for mathematics than boys. Similarly, a comparison
of samples from Australia, Canada, and the United States by Watt et al. (2012)
showed that, while male adolescents held higher intrinsic value for mathematics than
females in the Australian sample, their male counterparts in both Canada and the
United States held higher ability/success expectancy, but not higher intrinsic value
than females. These findings indicate that gender differences in mathematics
motivation may also be a function of culture. Furthermore, gender differences may
also vary according to individual characteristics. For example, Ai (2002) used a large
national data set focusing on students’ development from Grade 7 through Grade 10,
and found that gender differences in mathematics trajectories vary by individuals’
initial status: a gender gap was found only in those who started low in mathematics,
with girls showing a higher initial status but a lower average growth rate than boys. In
conclusion, the literature indicated an early formation of children’s gender stereotypic
beliefs and interest in mathematics, as well as large individual variability due to
demographic characteristics (e.g., culture, starting point).
Ethnic Differences
The underachievement of ethnic minority students has been a critical issue in
education. Many studies have reported that Hispanic and African American
students scored significantly lower than their White and Asian counterparts on
standardized tests (e.g., Bainbridge & Lasley, 2002; Kao & Thompson, 2003; Riegle-
Crumb & Grodsky, 2010), and that such an achievement gap emerges in their early
school years (Lee & Burkam, 2003). Similar to how gender stereotypes affect female
students’ interest and affect for mathematics, theorists suggested that minority
students may also be subject to such threat, whereby negative academic stereotypes
regarding certain ethnic groups influence their motivation and performance (Ogbu,
2003; Steele & Aronson, 1995). In line with this stereotype threat hypothesis,
research has shown that such ethnic differences in mathematics achievement may be
largely attributed to motivational factors. For example, in examining the predictors of
mathematics achievement among White, African American, and Hispanic students,
Byrnes (2003) found that ethnicity accounted for less than 5% of the variance in
achievement when socioeconomic status, exposure to learning resources, and
motivation were controlled.
The motivational perspective is not new in the literature. Many early studies
took this approach to investigating ethnic differences in mathematics-related self-
concept, ability beliefs, attitude, interest, and affect along with gender differences in
these variables. Nonetheless, these early studies were considered to be limited by data
available at the time, as well as the confounding effect of gender (Oakes, 1990). The
expectancy-value model (Eccles et al., 1983), which many of these investigations were
based on, is also viewed as inadequate for tackling issues of ethnic inequalities
(Riegle-Crumb, Moore, & Ramos-Wada, 2011). As such, researchers concerned about
ethnic inequalities have advocated research that investigates “the path to math” for
different subgroups of men and women, namely the intersection of ethnicity and
gender. For instance, using data from a large, national survey to examine ethnic-
specific gender differences in mathematics motivation, Catsambis (1994) found that,
while all female students tended to demonstrate less interest in mathematics, gender
differences were the largest among Hispanics and smallest among African Americans.
In the Byrnes (2003) study, which indicated that ethnic differences in mathematics
achievement were largely accounted for by socioeconomic status, learning resources,
and motivation, however, no ethnic difference was revealed in ability-liking (e.g., “I
like math.”), the interest-related motivational factor in this study.
Further studies continued to yield mixed findings regarding the gender-
ethnicity interaction for explaining the variance in students’ mathematics interest and
affect. Riegle-Crumb et al. (2011), for example, indicated that African American and
Hispanic females were doubly disadvantaged because their attitudes were below those
of White males and males of their own ethnic groups. In terms of enjoyment, females
from all ethnicities showed less mathematics enjoyment than White males, with the
exception that only White females reported significantly less enjoyment than their
White male counterparts. Indexing cognitive and emotional engagement in
mathematics with students’ perceived levels of challenge, Martinez and Guzman
(2013) found that while boys across ethnicities reported similar levels of engagement
in mathematics classes, no singular story applied to girls of different backgrounds.
Specifically, Black girls demonstrated lower levels of engagement in mathematics as
compared to girls of other ethnicities, and Asian girls reported the highest, though not
significantly higher, levels of engagement in this domain. Like Byrnes (2003), however, the
Else-Quest et al. (2013) study on mathematics attitudes again disaffirmed the gender-
ethnicity interaction. Their findings indicated a pattern of ethnic similarities in
mathematics attitudes, with the sole exception of greater mathematics values among
African American students.
The mixed findings of ethnic differences in mathematics clearly relate to the
varied concepts and operationalizations of affective variables in this domain of
research. Given the limited research explicitly dealing with the concept of interest and
the broadness of the concept of affect, this part of the literature review involved a
variety of affective variables such as liking (Byrnes, 2003), attitude (Catsambis, 1994;
Else-Quest et al., 2013; Riegle-Crumb et al., 2011), and emotional engagement
(Martinez & Guzman, 2013), and these variables were all operationalized in different
ways. For legitimate investigations of ethnic differences along with their intersection
with gender, more measures with adequate validity need to be made available to
researchers. As a prerequisite, the psychometric properties of these measures should
be evaluated carefully with advanced techniques.
Age Differences
Although some influential researchers stated that interest has a critical role in
the learning and development of students (Krapp et al., 1992; Renninger, 1992),
research is fairly limited on developmental differences in interest (Schunk et al., 2008).
In general, interest is believed to play a more important role in guiding the behaviors
of younger than older children and adults because older children and adults often have
to engage in mandatory tasks that are not of much interest to them (Hidi & Anderson,
1992; Krapp et al., 1992). Unfortunately, research has shown that students’ interest in
academic tasks declines with age, and that interest in mathematics and science tends to
drop the most (Eccles et al., 1998; Harter, 1981; Kahle, Parker, Rennie, & Riley, 1993;
Riegle-Crumb et al., 2011; Tracey, 2002; Tracey & Ward, 1998; Wigfield, 1994;
Wigfield & Eccles, 1992). Overall, these studies indicated that such declines occur as
early as third grade and extend all the way through 12th grade.
Several perspectives were provided for explaining the decline of interest
during school years. Many believe that these changes may be partly due to changes in
the students’ ability perceptions (Barak, 1981; Dweck & Elliott, 1983; Super, 1990;
Wigfield & Eccles, 1992). As individuals grow older, many of them develop entity
beliefs and tend to think that their intelligence or ability in a specific domain is fixed
and cannot be improved anymore. As a result, students who perform relatively poorly
in mathematics tend to devalue this subject area as a defense mechanism to protect
their overall self-esteem. Another factor concerns the school environment, which
tends to become more bureaucratic and controlling as students advance from
elementary to junior high (Eccles & Midgley, 1989). Related to the school
environment factor, the constraints of curricula that do not include choices may also
be responsible for the decline in students’ interest over time (Hoffmann, 2002).
Among the limited research investigating the developmental differences in
interest, Tracey and Ward’s (1998) study on the structure of children’s interest and
competence perceptions is of particular relevance to this dissertation. In evaluating
the Inventory of Children’s Activities (ICA) that was developed to assess interest
according to Holland’s (1985) RIASEC model—Realistic, Investigative, Artistic,
Social, Enterprising, and Conventional—the researchers identified the lack of
similarity in interest and competence structure across the age groups. In particular,
the RIASEC circular structure was evident in the college sample, but not in the middle
school and elementary school samples of younger students. As the authors
argued, the failure of the circular model to fit the younger group indicates that the
structure of interest varies with age. Children appeared to respond to the ICA items
using different dimensions than did college students. Whereas college students used
the People/Things and Data/Ideas dimensions in responding, elementary students used
sex typing and locus of activity instead. Middle school students, on the other hand,
seemed to use the dimensions that were related to both college and elementary school
students. Clearly, this measurement study adds to the perspectives discussed earlier
that, in addition to changes in ability beliefs, school environment, and autonomy over
curricula, developmental differences in interest may also be a function of how
individuals’ cognitive interpretation of interest develops over time.
Summary of the Literature
Despite the critical role of interest in enhancing learning and performance
(Schiefele et al., 1992), educational research on the topic of interest has not made
significant progress since its renaissance in the early 1990s. The varied
conceptualizations and discrete research approaches (Krapp et al., 1992) may still
prevent interest research from flourishing. That being said, the current trend in
educational research, as guided by the social cognitive perspective, represents a
promising direction for ongoing and future endeavors in this field. Particularly, there
is a need for more research to address the interplay among the three approaches (i.e.,
personal interest, situational interest, and interestingness) to interest research.
As discussed earlier, there is some conceptual fuzziness surrounding the
constructs of interest and affect, which makes it impossible to draw a distinction
between interest and affect. From a review of extant literature, interest may be
equated with an affective variable (e.g., attitude, emotion), indicated by certain
affective outcomes (e.g., liking, positive emotions), or viewed as a function of certain
affect (e.g., affective memories). Some cognitive variables also appear to relate to
interest in motivational theories, among which self-perceived competence or self-
efficacy are closely tied to the development of interest (Bandura, 1977; Super, 1990).
There is little doubt that future research shall benefit from a more unified and singular
conceptualization of interest, but considerable work is still required to address the
conceptual fuzziness plaguing interest research. At present, it is simply unwise for
researchers to exclude these closely related variables in their investigations.
Gender, ethnic, and developmental differences in interest are well-documented
in the literature, and these differences are worth careful investigation, given the
underachievement and under-representation of some subpopulations in certain
academic domains. Research on mathematics interest is particularly warranted to
provide implications for enhancing female and minority students’ achievement
motivation and career aspirations in the STEM fields. This line of research, however,
has merely shown us the tip of the iceberg. With respect to gender differences in
mathematics interest, findings appear to be more consistent in that males, particularly
White males, tend to show higher levels of interest or positive affect (Casey et al.,
1997; Catsambis, 1994; Hyde et al., 1990; McGraw et al., 2006; Riegle-Crumb et al.,
2011; Simpkins & Davis-Kean, 2005; Skaalvik & Skaalvik, 2004). When ethnicity is
added to such investigations, findings are mixed: some studies report a gender-ethnicity
interaction in predicting mathematics interest (Catsambis, 1994; Martinez & Guzman,
2013; Riegle-Crumb et al., 2011), whereas others report no interaction at all (Byrnes,
2003; Else-Quest et al., 2013; Tracey & Ward, 1998). Even in the studies where this
interaction is present, no clear pattern has emerged as to, for example, which
subpopulation (e.g., Black females) is the most advantaged or disadvantaged. Certainly,
the developmental differences in interest further complicate this topic, directing our
attention to the
measurement issues of existing instruments of mathematics interest (e.g., Tracey &
Ward, 1998).
The measure of interest suffers from the nature of interest being both a
psychological and a common language term. This metatheoretical situation of interest
(Valsiner, 1992) calls for the application of advanced techniques in examining the
item-person interaction because, in comparison to the measure of some cognitive
variables (e.g., self-efficacy), we are even less certain that an interest item is
interpreted and responded to in the way it is intended. Adding to the significance of
this psychometric evaluation are the gender, ethnic, and developmental differences in
mathematics interest. A key task of this evaluation was to detect whether the
differences among various subpopulations are a function of measurement bias rather
than the true differences in the latent trait.
Research Questions and Hypotheses
The following research questions and corresponding hypotheses were derived from the
review of extant literature:
1. From both CTT and IRT frameworks, does the SDQI accurately measure
interest and affect in mathematics?
Hypothesis: The SDQI will have sufficient psychometric properties for the
sample as a whole according to the CTT and IRT frameworks in measuring mathematics
interest and affect.
2. Do items of mathematics interest and affect demonstrate measurement bias
across gender?
Hypothesis: Gender differences will emerge at the item-level as revealed by
differential item functioning analyses in an IRT framework.
3. Do items of mathematics interest and affect demonstrate measurement bias
across ethnic groups?
Hypothesis: Ethnic group differences will emerge at the item-level as revealed
by differential item functioning analyses in an IRT framework.
4. Do items of mathematics interest and affect demonstrate measurement bias
across age groups?
Hypothesis: Item differences by age group will emerge at the item-level as
revealed by differential item functioning analyses in an IRT framework.
5. How do children’s responses to the mathematics interest and affect items
change over time?
Hypothesis: Item-level responses will change over time as a whole and as a
function of group membership according to gender and ethnic group.
CHAPTER III
METHOD
Description of the Data
The data for this dissertation were drawn from the Early Childhood
Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K; Tourangeau, Nord, Lê,
Sorongon, & Najarian, 2009), a national longitudinal data set. The ECLS-K focuses
on children’s early school experiences beginning in kindergarten as a multisource,
multimethod study including interviews with parents, data collected from school
principals and teachers, student record abstracts, and direct child assessment.
Beginning in the fall of 1998, the ECLS-K has been collecting data on a nationally
representative cohort of children from kindergarten into middle school. A total of
21,260 children throughout the nation participated. By the time of the present study,
the ECLS-K had completed seven waves of collections: kindergarten-fall (Wave 1),
kindergarten-spring (Wave 2), first grade-fall (Wave 3), first grade-spring (Wave 4),
third grade-spring (Wave 5), fifth grade-spring (Wave 6), and eighth grade-spring
(Wave 7).
The design of the ECLS-K was guided by a framework of children’s
development and schooling that emphasizes the interrelationships among the child, the
family, the school, and the community. Along with data collection from
parents/guardians, teachers, and school administrators, direct child assessment is a
critical study component of the ECLS-K, in which children were asked to participate
in activities designed to measure important cognitive (i.e., literacy, quantitative, and
science) and non-cognitive (i.e., fine motor and gross motor coordination and
socioemotional) skills and knowledge. Beginning with the third-grade collection
(Wave 5), children were also asked to report on their self-perceptions of abilities and
achievements, peer relationships, and problem behaviors. Specifically, children
attending third grade in the spring of 2001-02 were asked to complete a short self-
description questionnaire, part of which was the adapted Self Description
Questionnaire I (SDQI), on how they thought and felt about themselves both
academically and socially. The same questionnaire was administered in Wave 6, when
these children were attending fifth grade in the spring of 2003-04 (Tourangeau et al.,
2009).
Sampling Frame
The ECLS-K employed a multistage probability sample design to form a
nationally representative sample of children attending kindergarten in 1998-99, with
the primary sampling units (PSUs) being geographic areas, the second-stage units
being schools within PSUs, and the third- and final-stage units being
children within schools. In the base year (Wave 1), the basic PSU measure of size was
the number of 5-year-olds with modification to facilitate the oversampling of Asian
Pacific Islanders. One hundred PSUs were selected for the ECLS-K, among which 24
PSUs with the largest measure of size were designated self-representing (SR) and
included in the sample with certainty while the remaining PSUs were partitioned into
38 strata of roughly equal size. Next, private and public schools offering kindergarten
programs were selected as the second-stage units, and the third-stage sampling units
were children of kindergarten age, selected within each sampled school (Tourangeau
et al., 2009).
The sample for this dissertation is limited to children assessed in third- and
fifth-grades (Waves 5 and 6) by the SDQI, drawn from the “Early Childhood
Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) kindergarten through
fifth grade Approaches to Learning and Self Description Questionnaire (SDQ) items
and public-use data files,” particularly the SDQ item level data (U.S. Department of
Education, National Center for Education Statistics, 2010). Hence, it is necessary to
discuss in detail the sampling procedures of these two waves. The sample of children
for spring-third grade (Wave 5) consists of all children who were base-year
respondents, and children who were brought into the sample in spring-first grade
(Wave 4) through the sample freshening procedure. While an effort was made to
contact all children enrolled in the base-year schools, slightly over 50% of the base-
year sampled children who had transferred from their kindergarten school were
followed for data collection. The subsample of children was the same 50% subsample
of base-year movers flagged for following in spring-first grade, with the addition of
movers whose home language was not English. Following Wave 5, the fifth grade
data collection (Wave 6) excluded 5,214 of the 21,357 children eligible for the study
after the base year for reasons such as mortality or parental refusal to cooperate. The
remaining children were subsampled at different rates depending on the longitudinal
data available (Tourangeau et al., 2009).
Sample
While a nationally representative sample of 21,409 children is available for
analysis from the ECLS-K “Kindergarten through fifth grade Approaches to Learning
and Self Description Questionnaire (SDQ) item wording and item-level data,” 6,678
of them were removed from this study because they were missing data throughout all
the SDQI measures in Waves 5 and 6. Among the remaining 14,631 participants, 50.9%
(n = 7,441) reported being male, and 49.1% (n = 7,190) reported being female. In
terms of race, approximately 56.2% (n = 8,226) reported being “White (non-
Hispanic),” followed by 13.1% (n = 1,918) being “Black or African American,” 9.4%
(n = 1,371) being “Hispanic (race not specified),” 8.9% (n = 1,298) being “Hispanic
(race specified),” 6.7% (n = 978) being “Asian,” 2.6% (n = 387) endorsing more than
one race (non-Hispanic), 1.8% (n = 260) being “American Indian or Alaska Native,”
1.2% (n = 174) being “Native Hawaiian or Other Pacific Islander,” and 0.1% (n = 19)
with race data missing. For all ethnicity-related analyses in the current study,
“Hispanic (race not specified)” and “Hispanic (race specified)” were combined as
“Hispanic,” and more than one race, “American Indian or Alaska Native,” and “Native
Hawaiian or Other Pacific Islander” were combined as “Other Ethnicities.”
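The recoding described above can be checked with simple arithmetic. The following is a minimal sketch rather than the procedure actually used in the study: the counts are those reported in the text, while the dictionary keys, the function name `combine_groups`, and the decision to keep the missing cases as a separate category are assumptions made for illustration.

```python
# Counts as reported in the text; the labels are shorthand chosen for this sketch.
reported_counts = {
    "White (non-Hispanic)": 8226,
    "Black or African American": 1918,
    "Hispanic (race not specified)": 1371,
    "Hispanic (race specified)": 1298,
    "Asian": 978,
    "More than one race (non-Hispanic)": 387,
    "American Indian or Alaska Native": 260,
    "Native Hawaiian or Other Pacific Islander": 174,
    "Missing": 19,
}

# Mapping from original categories to the combined analysis groups.
recode_map = {
    "Hispanic (race not specified)": "Hispanic",
    "Hispanic (race specified)": "Hispanic",
    "More than one race (non-Hispanic)": "Other Ethnicities",
    "American Indian or Alaska Native": "Other Ethnicities",
    "Native Hawaiian or Other Pacific Islander": "Other Ethnicities",
}


def combine_groups(counts, mapping):
    """Sum counts over the combined groups; unmapped labels are kept as-is."""
    combined = {}
    for label, n in counts.items():
        group = mapping.get(label, label)
        combined[group] = combined.get(group, 0) + n
    return combined


combined_counts = combine_groups(reported_counts, recode_map)
total = sum(reported_counts.values())
for group, n in combined_counts.items():
    print(f"{group}: n = {n} ({100 * n / total:.1f}%)")
```

Under these assumptions, the category counts sum to the 14,631 participants retained for analysis.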
Instrumentation
All measures were obtained from the “ECLS-K kindergarten through fifth
grade Approaches to Learning and Self Description Questionnaire (SDQ) items and
public-use data files” (U.S. Department of Education, National Center for Education
Statistics, 2010). In both third- and fifth-grades, the ECLS-K children were asked to
complete the adapted Self Description Questionnaire I (SDQI), which was designed to
measure children’s self-perceptions of their school abilities and interests, peer
relationships, and problem behaviors. The adapted SDQI is a 42-item measure, in
which children self-report their perceived competence and interest in four non-
academic domains (i.e., physical abilities, physical appearance, relations with peers,
and relations with parents) and three academic domains (i.e., reading, mathematics,
and all school subjects). Table 3.1 presents the items of the mathematics subscale. As
shown in the table, there are a total of eight items with ordered response options
ranging from 1 to 4, including not at all true, a little bit true, mostly true, and very true.
Table 3.1
The ECLS-K Adapted SDQI Mathematics Subscale

Item     Statement
SDQ6     Work in math is easy for me.
SDQ12    I cannot wait to do math each day.
SDQ16    I get good grades in math.
SDQ22    I am interested in math.
SDQ26    I can do very difficult problems in math.
SDQ30    I like math.
SDQ36    I enjoy doing work in math.
SDQ41    I am good at math.

Face Validity
The Self Description Questionnaire I (SDQI) was initially designed to measure
seven components of self-concept (i.e., physical abilities, physical appearance,
relations with peers, relations with parents, reading, mathematics, and all school
subjects) based on Shavelson’s hierarchical model (Shavelson & Bolus, 1981;
Shavelson, Hubner, & Stanton, 1976). The three academic subscales (i.e., reading,
mathematics, and all school subjects) are each measured by 10 parallel items including
five cognitive statements (e.g., “I get good marks in mathematics”) and five affective
statements (e.g., “I am interested in mathematics”). Children are asked to respond to
each item on a 5-point scale including false, mostly false, sometimes false sometimes
true, mostly true, and true. The factor structure of the original SDQI was validated by
its test developers (Marsh, Relich, & Smith, 1983; Marsh, Smith, & Barnes, 1983).
Using two independent samples of fifth and sixth graders, Marsh, Relich, et al. (1983)
conducted factor analysis and identified the seven dimensions that the SDQI was
intended to measure, and an additional factor that was defined by affective items from
all three academic scales. In other words, the five affective items of each academic
domain tended to load together on an eighth factor in addition to loading on their own
domains respectively.
Comparatively, limited psychometric examination has been conducted on the
ECLS-K adapted version of the SDQI. The mathematics subscale (see Table 3.1), in
particular, was adapted from the original version to include only positively worded
statements with slight adaptations in wording (e.g., “I get good grades in math”
adapted from “I get good marks in mathematics”). The adapted subscale still consists
of an equal number of cognitive statements (i.e., SDQ6, 16, 26, and 41) and affective
statements (SDQ12, 22, 30, and 36). In addition, the response scale was changed from
5-point to 4-point with anchors adapted as well. A reasonable a priori consideration of
this study was, therefore, whether the eight items underlie a single dimension of
measurement, leading to the examination of the face validity of items, as well as the
use of factor analysis to explore and confirm the factor structure of the SDQI
mathematics subscale (discussed in detail in Chapter 4).
Face validity, sometimes termed content validity, refers to the degree to
which a test appears to measure what it claims to measure (Gay et al., 2008). Because
the SDQI was founded on the model of academic self-concept, it is worth discussing
how self-concept relates to other cognitive and affective constructs, particularly
academic interest. Shavelson et al.’s (1976) definition which formed the theoretical
foundation of contemporary self-concept research suggested that self-concept, in very
broad terms, is a person’s perception of himself which may be described as “organized,
multifaceted, hierarchical, stable, developmental, evaluative, and differentiable” (p.
411). Different views exist as to whether self-concept includes emotional reactions
such as interest, enjoyment, and satisfaction. While the development of the SDQI
clearly regarded these as a part of self-concept, other researchers (e.g., Eccles &
Wigfield, 1995) have made clear distinctions between ability- or expectancy-related
perceptions and task-value components (Bong & Skaalvik, 2003). Although perceived
competence plays a central role in academic self-concept, some researchers also
argued that self-concept consists of several distinguishable aspects, one of which is
affective in nature (e.g., Bong & Clark, 1999; Scheirer & Kraut, 1979; Skaalvik,
1997). In fact, some studies on the SDQ did indicate that the cognitive dimension and
the affective dimension tend to form separate factors (Skaalvik & Rankin, 1996;
Tanzer, 1996).
Nonetheless, the SDQI mathematics items, particularly the affective statements,
all appear to either assess interest directly (SDQ22—“I am interested in math”) or
reflect certain aspects of interest that are well-documented in the literature
(discussed in detail in Chapter 2) such as motivation (SDQ12—“I cannot wait to do
math each day”), liking (SDQ30—“I like math”), and positive emotion (SDQ36—“I
enjoy doing work in math”). Furthermore, even the cognitive aspect as perceived
competence was shown to have a reciprocal relationship with academic interest
(Marsh et al., 2005). Wigfield and Eccles (2002) particularly emphasized the role of
perceived competence in developing situational interest into personal interest in school
settings. In summary, the SDQI mathematics items appear to have sufficient face
validity for examining mathematics interest and affect, though this argument was to be
strengthened by results from the factor analysis.
Item Response Theory
In the creation and evaluation of psychological assessments, researchers have
traditionally turned to psychometric techniques that provide guidelines to determine
the value of an instrument (Kline, 2000). The traditional models and procedures such
as the classical test theory (CTT) model are based on weak assumptions that can be
met easily by most data sets, but there are also some well-documented shortcomings
associated with these techniques. First, CTT is sample-dependent, meaning that the
reliability estimates depend on the sample from which they are derived. Researchers
would, therefore, expect different item characteristics when the measure is used in
different samples. Second, CTT does not take into account the difficulty level or
threshold of responses of a measure for some individuals. If an item appears too
difficult for specific individuals to answer correctly or too high a threshold for specific
individuals to endorse a higher-order response, then the measure will likely be unable
to yield reliable estimates of their true scores on the construct being measured. Third,
CTT assumes that the amount of measurement error will be the same across all items
in a scale, but this assumption is often violated, given the varying item-level
characteristics. Finally, the reliability estimates often require correlating individuals’
responses on an alternative, strictly parallel test, which is often a challenge to test
developers. In addition to these shortcomings, CTT has also failed to provide
satisfactory solutions to many testing problems such as the design of tests and the
identification of biased items (Hambleton & Swaminathan, 1985). Because of this,
considerable attention has been directed at item response theory (IRT) over the past
three decades.
An IRT model specifies a relationship between the observable response and the
unobservable traits assumed to underlie the respondent’s performance on the test, and
this relationship is defined by a mathematical function. Therefore, IRT is in fact a
system of mathematical models that defines the relationship between the latent traits
and their manifestations (de Ayala, 2009). There are three primary advantages of IRT
models over CTT models. First, assuming a large pool of items all measuring the
same trait, the estimate of a respondent’s latent trait is independent of the particular
sample of test items that are administered to the respondent. Second, assuming a large
population of respondents, the estimates of item characteristics are independent of the
particular sample of respondents. Third, a statistic indicating the precision with which
each respondent’s latent trait is estimated is provided, and this statistic is free to vary
from one respondent to another. An additional desirable feature is that the concept of
parallel forms reliability commonly seen in CTT is replaced by the concept of
statistical estimation and associated standard errors. The feature of item parameter
invariance, though not unique to IRT, can also be well achieved when the chosen
model fits the data (Hambleton & Swaminathan, 1985).
Assumptions
Being mathematical in nature, IRT models rest on a set of assumptions,
including unidimensionality, local independence, and functional form (de Ayala, 2009;
Hambleton & Swaminathan, 1985). The unidimensionality assumption requires that
the observations on the test items are solely a function of a single continuous latent
trait. In the current study, it means that the SDQI items need to underlie the latent trait
of mathematics interest as a unidimensional rather than multidimensional construct.
This assumption, however, may be viewed as analogous to the homogeneity of
variance assumption in that some degree of violation may or may not be problematic.
That is, a unidimensional IRT model may provide a sufficiently accurate estimate to
be useful even though the data may in fact underlie two latent traits (de Ayala, 2009).
It is of particular relevance to the current study that the IRT model may still be robust
for estimation because the unidimensionality of the SDQI mathematics subscale,
though well supported by previous validation studies, had yet to be confirmed by
factor analysis in the current study.
A second assumption, local independence, means that the responses to an item
are independent of the responses to any other item conditional on the person’s location.
In other words, how a person responds to an item needs to be solely a function of his
relative standing on the latent continuum, and not of how he responds to
any other items of the measure (de Ayala, 2009). Violations of this assumption occur
more commonly in cognitive tests where, for example, there are a series of questions
that all relate to the same passage. Considering the affective nature of the SDQI items,
violations of local independence are unlikely to occur. A third assumption, the functional
form assumption, requires the data to follow the function specified by the model. This
assumption is rarely exactly met in practice, and the degree of violation can be
assessed by IRT model fit indices (de Ayala, 2009).
Graded Response Model
IRT is a system of mathematical models that defines the relationship between
the latent traits and observed indicators. Commonly used IRT models include the
Rasch model, the one-, two-, and three-parameter logistic models for dichotomous
items, and the graded response model and partial credit model for multi-category
scoring (Hambleton & Swaminathan, 1985). Given the ordered response format of the
SDQI items, Samejima’s (1969) two-parameter (2PL) graded response model (GRM)
was employed for the IRT estimates. In the GRM, option characteristic curves
(OCCs) are estimated for each response option in an item. The mathematical formula
for an item with K = 4 ordered response options (k = 0, 1, 2, 3) may be written as:
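$$
P_{k,i}(\theta) =
\begin{cases}
1 - \dfrac{e^{a_i(\theta - b_{1,i})}}{1 + e^{a_i(\theta - b_{1,i})}}, & k = 0 \\[2ex]
\dfrac{e^{a_i(\theta - b_{1,i})}}{1 + e^{a_i(\theta - b_{1,i})}} - \dfrac{e^{a_i(\theta - b_{2,i})}}{1 + e^{a_i(\theta - b_{2,i})}}, & k = 1 \\[2ex]
\dfrac{e^{a_i(\theta - b_{2,i})}}{1 + e^{a_i(\theta - b_{2,i})}} - \dfrac{e^{a_i(\theta - b_{3,i})}}{1 + e^{a_i(\theta - b_{3,i})}}, & k = 2 \\[2ex]
\dfrac{e^{a_i(\theta - b_{3,i})}}{1 + e^{a_i(\theta - b_{3,i})}}, & k = 3
\end{cases}
\tag{1}
$$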
where Pk,i is the probability of a person’s endorsement of response option k for item i,
θ (theta) is the latent trait, ai is the discrimination parameter for item i, the
mathematical constant e is the base of the natural logarithm, and bk,i is the threshold
parameter (de Ayala, 2009). Figure 3.1 presents a graphical illustration of the OCCs
for a hypothetical item with four ordered options with a = 1.7, b1 = -1.5, b2 = 0, and b3
= 1.5.
Figure 3.1. Option characteristic curves (OCCs) for a hypothetical item with four
response options (a = 1.7, b1 = -1.5, b2 = 0, b3 = 1.5).
In the 2-PL GRM, the discrimination parameter (a) indicates the degree of
slope at each point of inflection. According to Baker (1985, 2001), item
discrimination is considered to be very low for a ≤ .34, low for .35 ≤ a ≤ .64, moderate
for .65 ≤ a ≤ 1.34, high for 1.35 ≤ a ≤ 1.69, and very high for a ≥ 1.70. The threshold
parameters b1,i and b3,i represent the values of θ where the probability is 0.5 for
endorsing the lowest (k = 0) and highest options (k = 3), respectively, and the modes
of the other OCCs (k = 1, 2) are specified as (b1,i + b2,i)/2 and (b2,i + b3,i)/2,
respectively. The intersection of any two OCCs otherwise indicates an equal
probability of endorsing option k or higher versus endorsing lower order options (de
Ayala, 2009). As shown in Figure 3.1, the probability is 0.5 for an individual to
endorse Option 0 (not at all true) when his interest in mathematics is 1.5 standard
deviations below the mean, and for Option 3 (very true) when it is 1.5 standard
deviations above the mean. The OCC for Option 1 (a little bit true) peaks at
θ = (-1.5 + 0)/2 = -0.75, and the OCC for Option 2 (mostly true) peaks at
θ = (0 + 1.5)/2 = 0.75.
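The option probabilities in Equation 1 can be reproduced numerically for the hypothetical item. The sketch below is illustrative only and is not the estimation software used in this study. The function names are assumptions, and the term e^x / (1 + e^x) is written in its equivalent logistic form 1 / (1 + e^(-x)).

```python
import math


def logistic(z):
    """Logistic function, equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_option_probs(theta, a, thresholds):
    """Option probabilities P_k(theta) for one graded response item.

    thresholds is the ordered list [b_1, ..., b_(K-1)].
    """
    # Probability of responding in option k or higher, padded with
    # 1 (for k = 0) and 0 (beyond the highest option).
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each option probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item from Figure 3.1.
a_item, b_item = 1.7, [-1.5, 0.0, 1.5]
for theta in (-1.5, -0.75, 0.0, 0.75, 1.5):
    probs = grm_option_probs(theta, a_item, b_item)
    print(theta, [round(p, 3) for p in probs])
```

Evaluating the function at θ = -1.5 and θ = 1.5 gives a probability of 0.5 for Options 0 and 3, respectively, and the probabilities of Options 1 and 2 are largest near θ = -0.75 and θ = 0.75.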
In addition to OCC, item information function (IIF), a function of θ, also
provides valuable insight about the precision of measurement provided by each
specific item in a test. Samejima (1974) defined information for polytomous IRT
models as:
$$
I_i(\theta) = \sum_{k=0}^{m_i} I_{k,i}(\theta) = \sum_{k=0}^{m_i} \frac{\left[P_{k,i}'(\theta)\right]^2}{P_{k,i}(\theta)}
\tag{2}
$$
where Ii is the item information for item i, mi is equal to the number of score points
minus one, Ik,i is item option information of response option k which potentially
contributes to the item information, Pk,i is the same as in Equation 1, and Pk,i’, given K
= 4, is defined as:
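$$
P_{k,i}'(\theta) =
\begin{cases}
-a_i \left[ \dfrac{e^{a_i(\theta - b_{1,i})}}{\left(1 + e^{a_i(\theta - b_{1,i})}\right)^2} \right], & k = 0 \\[3ex]
a_i \left[ \dfrac{e^{a_i(\theta - b_{1,i})}}{\left(1 + e^{a_i(\theta - b_{1,i})}\right)^2} - \dfrac{e^{a_i(\theta - b_{2,i})}}{\left(1 + e^{a_i(\theta - b_{2,i})}\right)^2} \right], & k = 1 \\[3ex]
a_i \left[ \dfrac{e^{a_i(\theta - b_{2,i})}}{\left(1 + e^{a_i(\theta - b_{2,i})}\right)^2} - \dfrac{e^{a_i(\theta - b_{3,i})}}{\left(1 + e^{a_i(\theta - b_{3,i})}\right)^2} \right], & k = 2 \\[3ex]
a_i \left[ \dfrac{e^{a_i(\theta - b_{3,i})}}{\left(1 + e^{a_i(\theta - b_{3,i})}\right)^2} \right], & k = 3
\end{cases}
\tag{3}
$$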
where ai and bi represent the same parameters as in Equation 1. As such, IIF curves
may also be graphically illustrated as shown in Figure 3.2. Furthermore, the IIFs can
be summed to create the test information function, and the amount of information is
influenced by the quality and number of test items. The contribution of each item to
the total information is additive and depends on how highly each item correlates with
other items in the set (Hambleton & Swaminathan, 1985).
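Equations 2 and 3 can likewise be evaluated directly. The following is a minimal sketch under the same hypothetical item parameters. The function names are assumptions, and the derivative term e^x / (1 + e^x)^2 is computed in its equivalent form P*(1 - P*), where P* is the cumulative logistic curve.

```python
import math


def logistic(z):
    """Logistic function, equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_item_information(theta, a, thresholds):
    """Item information: the sum over options of (P'_k)^2 / P_k (Equation 2)."""
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Slope of each cumulative curve, a * P*(1 - P*); it is zero for the
    # padded constants 1 and 0.
    slopes = [a * c * (1.0 - c) for c in cumulative]
    information = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cumulative[k] - cumulative[k + 1]   # Equation 1
        dp_k = slopes[k] - slopes[k + 1]          # Equation 3
        information += dp_k * dp_k / p_k
    return information


def scale_information(theta, items):
    """Test information as the sum of item information over (a, thresholds) pairs."""
    return sum(grm_item_information(theta, a, bs) for a, bs in items)


# Hypothetical item from Figure 3.2.
item = (1.7, [-1.5, 0.0, 1.5])
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(grm_item_information(theta, *item), 3))
```

Because the thresholds of the hypothetical item are symmetric around zero, the resulting information curve is symmetric around θ = 0 as well.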
Figure 3.2. Item information function (IIF) curve for a hypothetical item with four
response options (a = 1.7, b1 = -1.5, b2 = 0, b3 = 1.5).
Differential Item Functioning
Invariance is a key component of accurate measurement because the examination
of any group difference is meaningless given poor quality of the measurement tool.
Such measurement bias may emerge out of many situations in research. For instance,
individuals from different cultures may interpret and respond to an item differently,
and individual differences (e.g., gender, age, personality) may trigger the use of
different frames of reference in the responding process as well (Vandenberg, 2002).
Differential item functioning (DIF), also referred to as measurement bias, occurs when
respondents of subgroups with the same latent trait have different probabilities of
endorsing a test item (Holland & Wainer, 1993). DIF may exist in a variety of tests
including attitude and personality tests, as well as cognitive tests when they contain
items that inadvertently assume that the respondents have certain knowledge or a
particular background in order to understand the items as intended. Taken into the
context of the current study, this could mean that, for instance, a boy and a girl with
the same mathematics interest may respond to the corresponding item differently as a
function of gender. Likewise, age neutrality may also be examined for measures of
interest because research suggests that, as children age and their sense of
metacognition develops, different interpretations may trigger DIF concerning how
they respond to the measures of interest.
From the IRT perspective, the existence of DIF means that the item’s
parameter estimates (e.g., item discrimination, item threshold) are not invariant across
the manifest groups, which amounts to item-data misfit. As such, there are two forms of DIF in
the IRT framework: uniform DIF which indicates one group performs better than the
other group throughout the continuum, and nonuniform DIF which indicates that one
group performs better than the other group only for a particular portion of the
continuum. Graphically, uniform DIF is represented by parallel OCCs between two
groups, while the OCCs tend to intersect for nonuniform DIF (de Ayala, 2009).
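The distinction between uniform and nonuniform DIF can be illustrated with a single cumulative boundary curve per group. The sketch below is a simplified illustration, not the DIF procedure used in this study: the parameter values for the reference and focal groups are hypothetical, and one boundary curve stands in for the full set of OCCs.

```python
import math


def boundary_curve(theta, a, b):
    """Probability of responding at or above one threshold."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def difference_signs(reference, focal, grid):
    """Signs taken by (reference - focal) probability differences over a theta grid."""
    signs = set()
    for theta in grid:
        diff = boundary_curve(theta, *reference) - boundary_curve(theta, *focal)
        if abs(diff) > 1e-12:
            signs.add(1 if diff > 0 else -1)
    return signs


theta_grid = [x / 10.0 for x in range(-30, 31)]
reference = (1.7, 0.0)          # hypothetical (a, b) for the reference group
uniform_focal = (1.7, 0.5)      # same slope, shifted threshold
nonuniform_focal = (1.0, 0.0)   # different slope, same threshold

# One sign only: the curves never cross (uniform DIF).
print(difference_signs(reference, uniform_focal, theta_grid))
# Both signs: the curves cross (nonuniform DIF).
print(difference_signs(reference, nonuniform_focal, theta_grid))
```

When only the threshold differs, one group is favored across the whole continuum. When the discrimination differs, the favored group changes at the point where the curves intersect.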
General DIF methods based on IRT include the likelihood ratio test (TSW-ΔG²;
Thissen, Steinberg, & Wainer, 1988, 1993), Lord’s Wald test (Lord, 1980), and the
Exact Signed Area and H Statistic (Raju, 1988, 1990), while non-IRT-based
approaches to DIF detection include, but are not limited to, the nonparametric Mantel-
Haenszel Chi-Square (MH) statistic (Holland & Thayer, 1988), log linear modeling
(Mellenbergh, 1982), and the use of logistic regression (Swaminathan & Rogers,
1990). Although these methods may be categorized into IRT-based versus non-IRT-
based or parametric versus nonparametric, it should be noted that the parametric and
nonparametric models are somewhat interrelated given that several parametric models
are related to nonparametric methods. For example, both the Rasch IRT model and
the MH methods examine the odds of item response conditional on a latent trait
estimate or a (weighted) total score (Teresi, 2006).
There are advantages and disadvantages in the use of either parametric or
nonparametric approaches to DIF detection. Parametric, IRT-based methods, though
statistically more powerful in detecting DIF, may identify DIF that is an artifact of
model misspecification, and very large sampling covariance among parameter
estimates can also cause some concern in such methods (Potenza & Dorans, 1995). On
the other hand, nonparametric methods are relatively free of model misspecification
and collinearity problems (Bolt, 2002), but they require sufficient data to directly
estimate the regressions of item score on test score. Moreover, IRT-based, parametric
methods can be particularly helpful in detecting nonuniform DIF. Built on the IRT
framework, this dissertation employed Lord’s Wald test for the examination of
potential DIF in the SDQI mathematics items.
Lord’s Wald Test
Lord’s (1977, 1980) Wald test is asymptotically equivalent to the TSW-ΔG²
and compares vectors of IRT item parameters between groups. For a specific item, if
the vectors of its parameters differ significantly between groups, then the OCCs differ
across groups, indicating significant DIF. For a two-group comparison, Lord first
proposed a test to evaluate the significance of DIF for the item threshold (bi) only:
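\[
d_i = \frac{\hat{b}_{Fi} - \hat{b}_{Ri}}{\sqrt{\operatorname{Var}(\hat{b}_{Fi}) + \operatorname{Var}(\hat{b}_{Ri})}} \tag{4}
\]
where $\hat{b}_{Ri}$ and $\hat{b}_{Fi}$ are the maximum likelihood estimates of the parameter $b_i$ in the
reference group and the focal group, respectively, and $\operatorname{Var}(\hat{b}_{Ri})$ and $\operatorname{Var}(\hat{b}_{Fi})$ are the
corresponding estimates of the sampling variances of $\hat{b}_{Ri}$ and $\hat{b}_{Fi}$. This test was then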
extended for differences between the discrimination parameters (ai), becoming a more
general test of the joint difference between [ai, bi] for the two groups:
\[
\chi_i^2 = \nu_i' \, \Sigma_i^{-1} \, \nu_i \tag{5}
\]
where $\nu_i'$ is $[\hat{a}_{Fi} - \hat{a}_{Ri},\ \hat{b}_{Fi} - \hat{b}_{Ri}]$, $\Sigma_i$ is the estimate of the sampling variance-
covariance matrix of the differences between the item parameters, and $\chi_i^2$ is referred to
the χ2 distribution with two degrees of freedom (Langer, 2008).
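The computation in Equation 5 can be sketched in a few lines of Python. This is a minimal illustration only, not the flexMIRT implementation: the function name and the numeric inputs are hypothetical, and the covariance matrix is assumed to be already available (e.g., from the estimation software).

```python
import math

def wald_dif_statistic(diff, cov):
    """Equation 5 for one item with two parameters.

    diff: [a_F - a_R, b_F - b_R], the parameter differences (nu_i)
    cov:  2x2 sampling variance-covariance matrix of those differences (Sigma_i)
    Returns the Wald chi-square and its p-value on 2 degrees of freedom.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d1, d2 = diff
    chi2 = (d1 * (inv[0][0] * d1 + inv[0][1] * d2)
            + d2 * (inv[1][0] * d1 + inv[1][1] * d2))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical differences and covariance matrix, for illustration only
chi2, p_value = wald_dif_statistic([0.30, 0.20], [[0.010, 0.002], [0.002, 0.005]])
print(round(chi2, 2), p_value)
```

When the off-diagonal covariance is zero, the statistic reduces to the sum of two squared standardized differences, which connects Equation 5 back to the threshold-only test in Equation 4.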
The original implementation of the Wald test was not particularly pertinent to
IRT models and tended to show severely inflated Type I error rates. The test was recently improved
(Cai, 2012; Cai, Thissen, & du Toit, 2011; Langer, 2008) with modern, accurate IRT-
based error estimation. The improved Wald test estimates the covariance matrix using
the supplemented expectation maximization (SEM) algorithm (Cai, 2008), a strategy
for calculating the information matrix when an EM algorithm is used for parameter
estimation.
Langer (2008) introduced a two-stage Wald procedure for detecting DIF. In
the first stage, the mean and standard deviation of the reference group are fixed while
those of the focal group are freely estimated, and all of the item parameters are
constrained equal between groups. In the second stage, the focal mean and standard
deviation are fixed to the values obtained in the first stage. The simulation study
conducted by Woods, Cai, and Wang (2012), however, indicated that the DIF
contamination in the two-stage procedure is likely to produce Type I error inflation
and other inaccuracies, because the focal mean and standard deviation are estimated
from a misspecified model if there is DIF that does not cancel out in the first stage.
Woods et al. (2012) thus recommended using the one-stage procedure (Cai et al.,
2011), in which the mean and standard deviation of the focal group are estimated
simultaneously with estimation of the item parameters, and item parameters are either
constrained equal between groups (anchor items) or free to vary between groups
(studied items).
Regarding anchor and studied items, most DIF procedures require user-specified
anchor items. Anchor items can be specified based on
prior research or prior testing. DIF analysis without much prior research may benefit
from purification or anchor selection methods. In fact, Woods et al. (2012)
recommended applying Langer’s (2008) two-stage procedure as the prior testing for
anchor selection before the one-stage procedure.
Analysis of Data
Prior to analyses, the data were first screened for missingness, multicollinearity,
and univariate and multivariate outliers in SPSS v20. Bivariate correlation
coefficients were computed among the eight SDQI items of each wave for determining
multicollinearity (r > .80); univariate outliers were examined using the criterion of |z| >
3; and Cook’s distances were computed for identifying multivariate outliers—Cook’s
distance > 1 (Cook & Weisberg, 1982).
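Two of the screening rules above (|z| > 3 and r > .80) can be sketched as follows. The screening in this study was run in SPSS v20; the Python functions below are a hypothetical illustration of the same criteria, all names are assumptions, and Cook's distance is omitted because it requires a fitted regression model.

```python
import math

def z_scores(values):
    """Standardize scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def univariate_outliers(values, cutoff=3.0):
    """Indices of cases whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, cutoff=0.80):
    """Pairs of variables whose correlation exceeds the cutoff.

    columns: dict mapping a variable name to its list of scores.
    """
    names = list(columns)
    return [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
            if pearson_r(columns[p], columns[q]) > cutoff]
```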
Analyses were conducted in three phases. In Phase 1, analyses were conducted
to address the first research question of this study regarding the psychometric
properties of the SDQI mathematics subscale from both the CTT and factor analytic
perspectives. Factor analysis was first performed to examine the dimensionality of the
scale. Based on the established factor structure of the scale, internal consistency
reliabilities were then assessed to add the CTT-based psychometric information of the
scale. Next, the scale’s predictive validity was assessed through regressing
participants’ mathematics achievement scores on their SDQI scores. Finally, group
differences (i.e., gender, ethnicity, and age), as assessed by the current ECLS-K
adapted SDQI summed scores, were examined using factorial analysis of variance
(ANOVA). In Phase 2, IRT analyses were performed separately on third- and fifth-
grade full samples to address the first research question regarding the IRT-based
psychometric properties of the scale. In Phase 3, DIF analyses were performed across
gender (to address the second research question) and ethnic groups (to address the
third research question) for each wave, and then finally across third- and fifth-grade
data for detecting IRT parameter drift (to address the fourth and fifth research
questions).
Phase 1: Factor Structure, Reliability, Validity, and Group Differences
To determine whether the aforementioned separability (discussed in Chapter 3)
between the affective and cognitive dimensions exists in the SDQI mathematics
subscale, factor analysis was conducted separately on the third- (Wave 5) and fifth-
grade (Wave 6) data. Exploratory factor analysis (EFA) is typically used for
determining the appropriate number of common factors (Brown, 2006). Principal axis
factoring analysis (EFA) with promax rotation was conducted using SPSS v20 on the
eight items separately for Wave 5 and Wave 6 data. Promax rotation allowed for
factors to be correlated (Field, 2009), and the assumption was made that, if
multiple factors were extracted (e.g., perceived competence, interest), these factors
would be correlated with each other. To determine the number of factors, both the
eigenvalues (Kaiser, 1960) and the scree plot’s point of inflexion (Field, 2009) were
consulted. A confirmatory factor analysis (CFA) was then conducted in Mplus v6
(Muthén & Muthén, 2010) using weighted least square mean-and-variance adjusted
(WLSMV) χ2 test statistic estimation. CFA is used in later phases of instrument
development after the underlying structure has been established on prior empirical
grounds. The use of the WLSMV estimator was based on Schmitt’s (2011) suggestion
that data from ordered response scales are often not continuous and not normally
distributed. The acceptable model fit for CFA was defined by Hu and Bentler’s (1999)
combinational rules: (a) Comparative Fit Index (CFI) or Tucker-Lewis Index (TLI) >
0.95 and Standardized Root Mean Square Residual (SRMR) < .09, or (b) Root Mean
Square Error of Approximation (RMSEA) < .05 and SRMR < .06. Because Mplus
provides Weighted Root Mean Square Residual (WRMR) instead of SRMR when the
WLSMV estimator is activated, the cutoff value of WRMR < 1.0 (Yu, 2002) was also
consulted.
Reliability of a test refers to the degree to which a test consistently measures
whatever it is measuring (Gay et al., 2008). There are several types of reliability such
as test-retest reliability, equivalent forms reliability, inter-rater reliability, and internal
consistency reliability. The design of the current study only allowed for the
assessment of the internal consistency of the scale, which was assessed using
Cronbach’s alpha (α) in SPSS v20. Criterion-related validity is determined by
relating performance on a test to performance on a second test or other measure (Gay
et al., 2008). A measure of academic achievement in mathematics, already IRT-scaled
in the ECLS-K data set, was utilized to determine the criterion-related
validity of the SDQI as a measure of mathematics interest. The variable names of the
mathematics IRT scale scores were C5R4MSCL for third grade and C6R4MSCL for
fifth grade, and a bivariate regression analysis was conducted separately for third- and
fifth-grade data to predict mathematics achievement from children’s SDQI summed
scores on the mathematics subscale. Finally, to detect the group differences across
gender, ethnicity, and age groups as reflected by the SDQI summed scores, a 2 × 4
(Gender × Ethnicity) factorial ANOVA was conducted separately on the third- and
fifth-grade data, and a paired-samples t test was conducted to examine how children’s
SDQI scores changed from third grade to fifth grade.
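For readers who want the internal consistency coefficient spelled out, Cronbach's alpha can be sketched as below. The reliabilities reported in this study were obtained in SPSS v20; this function is a hypothetical stand-in that applies the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of summed scores), to complete cases.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding the scores of the
           same n respondents (complete cases only).
    """
    k = len(items)
    n = len(items[0])

    def sample_variance(scores):
        mean = sum(scores) / len(scores)
        return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)

    # Summed score for each respondent across the k items
    totals = [sum(item[j] for item in items) for j in range(n)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))
```

Two perfectly parallel items yield α = 1, while two uncorrelated items with equal variances yield α = 0, which gives a quick sanity check on the implementation.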
Phase 2: Item Response Theory (IRT) Analysis
The IRT analyses were conducted separately using the third- and the fifth-
grade data in flexMIRT v2. flexMIRT v2 bases its “graded model calibration”
function on Samejima’s (1969) 2-PL GRM and provides, for each estimation,
parameter estimates (i.e., item discrimination as ai, item threshold as bi or item
intercept as ci), item and test-information function values, and fit indices including
Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), the
Pearson χ2 statistic, the likelihood ratio statistic (labeled G2), the estimated population
discrepancy function value (labeled F0hat), and the RMSEA (Houts & Cai, 2013).
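To make the graded response model concrete, the sketch below computes the probability of each response option for a four-category item from a discrimination parameter and three ordered thresholds. It is an illustration of the logistic GRM in threshold form, not flexMIRT code; the function names and the example parameter values are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, thresholds):
    """Option probabilities under the graded response model.

    theta:      latent trait level
    a:          item discrimination
    thresholds: ordered item thresholds [b1, b2, b3]
    Returns [P(k = 0), P(k = 1), P(k = 2), P(k = 3)].
    """
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category) and 0 (beyond the highest category)
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item: a = 2.0, thresholds at -1.5, -0.5, and 0.5
print(grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5]))
```

Plotting these four probabilities against theta gives the option characteristic curves (OCCs) referred to throughout this chapter.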
Phase 3: Differential Item Functioning (DIF) Analysis
A series of DIF analyses were performed using the IRT-based Lord’s Wald test
in flexMIRT v2. DIF analysis is normally conducted by comparing two groups of
respondents: the reference group (usually the majority) and the focal group (usually
the minority). First, DIF across gender was examined separately in third- and fifth-
grade data with boys as the reference group and girls as the focal group. Next, to
detect DIF across ethnicity, the full samples of third- and fifth-grade data were each
divided into four groups with White being the reference group, and African American,
Hispanic, and Asian being the focal group for each of the reference-focal comparisons.
Finally, the fifth-grade full sample was specified as the focal group to be contrasted
with the third-grade full sample as the reference group for DIF. This phenomenon in
which item parameters may change over repeated administrations is also referred to as
item parameter drift (IPD; Goldstein, 1983). For the IPD detection where the data
were longitudinal (i.e., repeated measures) in nature, a special consideration was to
account for the fact that the observations were not independent. Thus far, limited
literature is available for addressing this issue, particularly in the use of the Wald χ2
test for detecting IPD. In the current study, the decision was made to use the sandwich
estimator that corrects the under-estimation of variance (Huber, 1967). A remarkable
advantage of the sandwich estimator is that it provides valid standard errors when the
assumed covariance model of repeated measures is incorrect. Further, the use of the
sandwich estimator is considered to be best suited to balanced longitudinal data with
relatively large sample size and relatively small number of repeated measures (Lipsitz
& Fitzmaurice, 2009), which are exactly the properties of the ECLS-K data utilized in
this study.
The Wald test was employed for determining significant DIF in polytomous
items with the alpha (α) value of .05. For the Wald test, flexMIRT prints each of the
Wald χ2 values with the associated degree of freedom and p-value. Because each DIF
analysis involved multiple items, such multiple testing would normally lead to
concerns of inflated Type I error. Interestingly, in examining the effects of multiple
testing adjustments in various DIF detection methods, Kim and Oshima (2013)
reported that the Type I error rate of the Wald test is well controlled even before
adjustment. Therefore, multiple testing adjustment was not applied in the current
study. Following Woods’s (2009) recommendations, the DIF analysis of the current
study combined Langer’s (2008) two-stage procedure and Cai et al.’s (2011) one-stage
procedure to optimize the control of Type I error as well as the accuracy of multi-
group estimation. In particular, each DIF analysis comprised three stages. In the
first stage, a model was fitted wherein the reference group mean and standard
deviation were fixed to 0 and 1, respectively, and the focal mean and standard
deviation were allowed to be freely estimated, with all item parameters constrained
equal between groups. In the second stage, a model was fitted with the focal mean
and standard deviation fixed to the values obtained in the first stage, and items with
non-significant Wald χ2 values were identified as anchors. In the third stage, the focal
mean and standard deviation were again freely estimated, with item parameters of the
anchor items being constrained equal between groups, while those of the studied items
were allowed to vary between groups.
To determine the impact of the DIF at the scale level, the test characteristic
curve (TCC) of each group was visually examined. The test characteristic curve is the
functional relation between the true score and the latent trait scale. Given any latent
trait level, the corresponding true score can be found via the test characteristic curve
(Baker, 1985, 2001). To define the test characteristic curve, an integer scoring
function (ISF) is first defined for each item; with K = 4 response categories (k = 0, 1, 2, and 3):
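\[
E_i(\theta) = \sum_{k=1}^{3} k P_{k,i}
= \frac{e^{a_i(\theta - b_{1,i})}}{1 + e^{a_i(\theta - b_{1,i})}}
+ \frac{e^{a_i(\theta - b_{2,i})}}{1 + e^{a_i(\theta - b_{2,i})}}
+ \frac{e^{a_i(\theta - b_{3,i})}}{1 + e^{a_i(\theta - b_{3,i})}} \tag{6}
\]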
where Ei(θ) is the expected score for a given polytomous item i with latent trait level θ.
Then, the test characteristic function is simply computed by summing up the ISF of
each item:
\[
E(\theta) = \sum_{i=1}^{N} E_i(\theta) \tag{7}
\]
where E(θ) is the scale-level expected score with latent trait level θ, and N is the total
number of items in a scale. In using the Wald test for DIF detection, it is particularly
necessary to examine the test characteristic curves because DIF of opposite directions
may cancel out, in which case the scale score may still be unbiased (Edelen, Thissen,
Teresi, Kleinman, & Ocepek-Welikson, 2006).
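Equations 6 and 7 translate directly into code. The following sketch uses hypothetical function names and a hypothetical two-item scale; it is meant only to show how the expected item scores are summed into a scale-level curve, not to reproduce the plots generated for this study.

```python
import math

def expected_item_score(theta, a, thresholds):
    """Equation 6: expected score on a 0-3 item as a sum of cumulative logistic curves."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)

def scale_expected_score(theta, items):
    """Equation 7: sum of expected item scores.

    items: list of (a, [b1, b2, b3]) tuples, one per item.
    """
    return sum(expected_item_score(theta, a, bs) for a, bs in items)

# Hypothetical two-item scale, evaluated on a coarse grid of theta values
items = [(2.0, [-1.0, 0.0, 1.0]), (1.2, [-2.0, -0.5, 0.8])]
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(theta, round(scale_expected_score(theta, items), 3))
```

Computing the curve once with the reference-group parameters and once with the focal-group parameters, then comparing the two at the same theta values, is the numerical counterpart of the visual comparison described above.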
CHAPTER IV
RESULTS
Prior to analyses, the data were first screened for missingness, multicollinearity,
and univariate and multivariate outliers in SPSS v20. Table 4.1 summarizes the
ECLS-K variables utilized in the current study along with associated percentages of
missing data and descriptive statistics. In the full sample of 14,631 for the current
study, total missingness was limited to 11.03% of all cells.
Table 4.1
Descriptive Statistics of the ECLS-K Variables Included in the Study (N = 14,631)
Variable Name   Description                   Missing%      Min      Max        M       SD
RACE            Child Composite Race             0.13%
GENDER          Child Composite Gender           0.00%
MCC5SDQ6        Wave 5 SDQ6                      1.57%     1.00     4.00     3.06     0.99
MC5SDQ12        Wave 5 SDQ12                     1.57%     1.00     4.00     2.81     1.09
MC5SDQ16        Wave 5 SDQ16                     1.61%     1.00     4.00     3.22     0.89
MC5SDQ22        Wave 5 SDQ22                     1.58%     1.00     4.00     3.19     0.98
MC5SDQ26        Wave 5 SDQ26                     1.58%     1.00     4.00     2.91     1.04
MC5SDQ30        Wave 5 SDQ30                     1.57%     1.00     4.00     3.32     0.99
MC5SDQ36        Wave 5 SDQ36                     1.59%     1.00     4.00     3.25     0.97
MC5SDQ41        Wave 5 SDQ41                     1.59%     1.00     4.00     3.27     0.89
MC6SDQ6         Wave 6 SDQ6                     22.88%     1.00     4.00     2.90     0.96
MC6SDQ12        Wave 6 SDQ12                    22.88%     1.00     4.00     2.47     1.02
MC6SDQ16        Wave 6 SDQ16                    22.88%     1.00     4.00     3.07     0.88
MC6SDQ22        Wave 6 SDQ22                    22.89%     1.00     4.00     3.00     0.97
MC6SDQ26        Wave 6 SDQ26                    22.89%     1.00     4.00     2.76     0.97
MC6SDQ30        Wave 6 SDQ30                    22.88%     1.00     4.00     3.07     1.01
MC6SDQ36        Wave 6 SDQ36                    22.88%     1.00     4.00     2.99     0.98
MC6SDQ41        Wave 6 SDQ41                    22.88%     1.00     4.00     3.08     0.89
C5R4MSCL        Wave 5 Math IRT Scale Score      1.78%    34.56   166.25    98.73    24.71
C6R4MSCL        Wave 6 Math IRT Scale Score     22.94%    50.86   170.66   123.69    24.79
Next, bivariate correlations were computed for each wave, and the results are
summarized in Tables 4.2 and 4.3. As shown in the tables, multicollinearity is
minimal among the eight items: only one correlation, between SDQ30 and SDQ36 in the
fifth-grade data, goes slightly beyond the .80 cut-off. Finally, using the |z| > 3 criterion
for univariate outliers and Cook’s distance > 1 criterion for multivariate outliers, no
cases were identified as influential cases.
Table 4.2
Bivariate Correlations among the SDQI Items in Third Grade
Item         1      2      3      4      5      6      7      8
1. SDQ6      --
2. SDQ12     .50*   --
3. SDQ16     .54*   .40*   --
4. SDQ22     .52*   .63*   .49*   --
5. SDQ26     .37*   .29*   .35*   .32*   --
6. SDQ30     .54*   .62*   .47*   .73*   .32*   --
7. SDQ36     .54*   .63*   .50*   .72*   .33*   .77*   --
8. SDQ41     .62*   .50*   .62*   .58*   .40*   .62*   .61*   --
*p < .01.
Table 4.3
Bivariate Correlations among the SDQI Items in Fifth Grade
Item         1      2      3      4      5      6      7      8
1. SDQ6      --
2. SDQ12     .52*   --
3. SDQ16     .63*   .44*   --
4. SDQ22     .60*   .68*   .54*   --
5. SDQ26     .53*   .41*   .50*   .46*   --
6. SDQ30     .60*   .66*   .53*   .79*   .46*   --
7. SDQ36     .59*   .69*   .52*   .79*   .47*   .81*   --
8. SDQ41     .70*   .54*   .69*   .65*   .58*   .67*   .67*   --
*p < .01.
Phase 1: Factor Structure, Reliability, Validity, and Group Differences
Factor Structure
To determine whether the SDQI mathematics subscale satisfied the
unidimensionality assumption of IRT, factor analysis was performed separately on the
third-grade (Wave 5) and fifth-grade (Wave 6) data. EFA on third- and fifth-grade
data yielded similar results: consistent with what was indicated by the point of
inflexion in the scree plot, the analysis extracted one single factor with the eigenvalue
greater than 1. All eight items appeared to have salient factor loadings (λ ≥.40;
Gorsuch, 1997). The Kaiser-Meyer-Olkin (KMO) measure verified the sampling
adequacy for the analysis, KMO = .92 for third-grade and .93 for fifth-grade, which
are “great” according to Hutcheson and Sofroniou’s (1999) criteria. The one-factor
solutions explained 53.49% (Wave 5) and 60.46% (Wave 6) of the variance.
Turning to CFA, however, the one-factor CFA model did not fit the third-grade
data sufficiently well, χ2(20) = 5958.74, p < .0001, CFI = 0.97, TLI = 0.96, RMSEA
= .14, WRMR = 7.17. As such, modification indices (MIs; Sörbom, 1989) were
consulted, and the source of the poor fit was identified as large correlated errors
(Brown, 2006). For example, the correlated error between SDQ16 and SDQ41
produced the largest Δχ2 = 1738.27. A similar situation occurred when the one-factor
model was fit to the Wave 6 data, χ2(20) = 7826.73, p < .0001, CFI = 0.97, TLI = 0.96,
RMSEA = .19, WRMR = 9.01, also with considerable correlated errors among items.
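The comparison of the reported fit indices with the cut-offs named in the method section can be made explicit with a small check. This sketch is an assumption-laden simplification: it evaluates each index against its own cut-off and does not implement Hu and Bentler's (1999) combinational rules, because SRMR is not available under the WLSMV estimator. The function name is hypothetical.

```python
def check_fit_cutoffs(cfi, tli, rmsea, wrmr):
    """Compare each fit index with the single cut-off cited in the text."""
    return {
        "CFI > .95": cfi > 0.95,
        "TLI > .95": tli > 0.95,
        "RMSEA < .05": rmsea < 0.05,
        "WRMR < 1.0": wrmr < 1.0,
    }

# Fit indices reported above for the one-factor CFA models
third_grade = check_fit_cutoffs(cfi=0.97, tli=0.96, rmsea=0.14, wrmr=7.17)
fifth_grade = check_fit_cutoffs(cfi=0.97, tli=0.96, rmsea=0.19, wrmr=9.01)
print(third_grade)
print(fifth_grade)
```

In both waves the incremental indices (CFI, TLI) pass while the RMSEA and WRMR do not, which is the mixed pattern the following discussion addresses.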
Although the factor structure of the scale was not firmly supported by the CFA,
a decision was made to retain the single-factor structure of the scale based on several
considerations. First, the seven-component structure of the original SDQI has been
cross-validated on different samples, and the mathematics items consistently loaded
on one factor (Marsh, Relich, et al., 1983; Marsh, Smith, et al., 1983). It is highly
unlikely that the ECLS-K adaptations would invalidate the established factor structure
of the SDQI. Second, both the eigenvalue > 1 criterion and the scree plot in the EFA
framework were clearly in favor of the one-factor solution over any multiple-factor
solutions. Finally, though the RMSEA and WRMR went beyond the cut-off values,
the CFI and TLI were well above the cut-off of 0.95 in both CFA models. Given that
Hu and Bentler’s (1999) combinational rules are considered too stringent and more
appropriate when evaluating statistical significance rather than goodness of model fit
(Marsh, Hau, & Wen, 2004), it is reasonable to argue that no explicit separability is
evident in the scale. Table 4.4 presents the factor loadings of items as assessed by the
one-factor CFA.
Table 4.4
Internal Consistency Reliabilities and Factor Loadings by Item
                      Third Grade                              Fifth Grade
Item #    Item-Total    α if Item    Factor      Item-Total    α if Item    Factor
          Correlation   Deleted      Loading     Correlation   Deleted      Loading
SDQ6      .67           .88          .77         .73           .91          .81
SDQ12     .67           .88          .79         .69           .92          .81
SDQ16     .62           .89          .74         .67           .92          .78
SDQ22     .75           .87          .88         .81           .91          .91
SDQ26     .42           .91          .51         .59           .92          .66
SDQ30     .77           .87          .93         .81           .91          .93
SDQ36     .77           .87          .91         .81           .91          .92
SDQ41     .74           .87          .84         .80           .91          .88
α         .89                                    .92
Internal Consistency Reliability
Based on the one-factor structure of the eight items, the internal consistency
reliabilities were assessed using Cronbach’s alpha (α) method, and the scale overall
showed good to excellent internal consistencies (see Table 4.4). Among all items,
SDQ22 (“I am interested in math”), SDQ30 (“I like math”), SDQ36 (“I enjoy doing
work in math”), and SDQ41 (“I am good at math”) showed higher item-total
correlations across grade levels. SDQ26 (“I can do very difficult problems in math”)
stood out because it had markedly lower item-total correlations in both waves.
Criterion-Related Validity
To determine the criterion-related validity of the scale, children’s mathematics
achievement scores (i.e., Variables C5R4MSCL and C6R4MSCL as shown in Table
4.1) were used as the outcome variables in the analyses. A linear regression analysis
was conducted separately on the third- and fifth-grade data to evaluate the prediction of
children’s mathematics performance from the summed scores of the SDQI items. The
relationships were significant for both waves: R2 = .02, adjusted R2 = .02, p < .001 for
third grade and R2 = .06, adjusted R2 = .06, p < .001 for fifth grade. In either wave, the
SDQI summed score appeared to be a significant, positive predictor of children’s
performance, β = .14, p < .001 for third grade and β = .25, p < .001 for fifth grade.
These findings are consistent with the literature that academic interest has a positive
relationship with performance (e.g., Hidi, 2000; Hidi & Harackiewicz, 2000), so it
may be concluded that the SDQI exhibited sufficient criterion-related validity in
predicting children’s mathematics performance.
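As a consistency check on the values reported above: in a regression with a single predictor, the standardized slope equals the Pearson correlation between predictor and outcome, so the coefficient of determination is its square. The reported figures agree with this identity up to rounding.

```latex
\[
R^2 = \beta^2:\qquad
(.14)^2 = .0196 \approx .02 \ \text{(third grade)},\qquad
(.25)^2 = .0625 \approx .06 \ \text{(fifth grade)}.
\]
```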
Group Differences
To detect the group differences across gender and ethnicity groups as reflected
by the SDQI summed scores, a 2 × 4 (Gender × Ethnicity) factorial ANOVA was
conducted separately on the third- and fifth-grade data. The ANOVA on the third-grade data
indicated a significant main effect for gender, F(1, 13573) = 128.12, p < .001, partial
η2 = .01, a significant main effect for ethnicity, F(3, 13573) = 26.27, p < .001, partial
η2 = .01, and a significant interaction between ethnicity and gender, F(3, 13573) =
3.51, p = .02, partial η2 = .001. As assessed by the SDQI scale score, boys
demonstrated significantly higher mathematics interest (M = 25.75, SD = 5.77) than
girls (M = 24.27, SD = 6.09) across ethnicity in the third grade. In terms of ethnic
differences as assessed using Hochberg’s GT2 post hoc comparison, both African
American (M = 25.77, SD = 5.99) and Hispanic children (M = 25.51, SD = 5.65)
demonstrated significantly higher levels of interest as compared with White children
(M = 24.67, SD = 6.08), both at p < .001, but no significant difference was found
between White and Asian children (M = 25.16, SD = 5.97; p = .09).
The ANOVA on the fifth-grade data yielded similar results regarding gender
differences but very different results regarding ethnic differences. The ANOVA
indicated a significant main effect for gender, F(1, 10637) = 83.89, p < .001, partial
η2 = .01 and a significant main effect for ethnicity, F(3, 10637) = 7.30, p < .001,
partial η2 = .002, but a non-significant interaction between ethnicity and gender, F(3,
10637) = 2.53, p = .06, partial η2 = .001. As in third grade, fifth-grade boys
demonstrated significantly higher mathematics interest (M = 23.94, SD = 6.08) than
girls (M = 22.75, SD = 6.23) across ethnicity. In terms of ethnic differences as
assessed using Hochberg’s GT2 post hoc comparison, Asian American children (M
= 24.10, SD = 5.67) scored significantly higher than White children (M = 23.21, SD =
6.12; p = .001), as did African American children (M = 23.73, SD = 6.65; p = .03).
However, no significant difference was found between White and Hispanic children
(M = 23.29, SD = 6.23; p = .997). Figure 4.1 illustrates the group differences across
the two waves.
Finally, results of the paired-samples t test indicate that children scored
significantly lower on the SDQI mathematics scale in fifth grade (M = 23.35, SD = 6.18)
than in third grade (M = 24.98, SD = 5.94), t(11055) = 26.62, p < .001.
Figure 4.1. Group differences as assessed by the SDQI mathematics scale score.
Phase 2: Item Response Theory (IRT) Analysis
The IRT analyses were conducted separately on the third- and fifth-grade full
samples (N = 14,631) in flexMIRT v2. Table 4.5 presents the IRT parameter
estimates of the items by grade level, and Figures 4.2 and 4.3 illustrate the OCCs of
each item for third- and fifth-graders, respectively.
Table 4.5
IRT Parameter Estimates (a, b1, b2, b3) of Third- and Fifth-Grade Full Samples
Item       ai     s.e.     b1,i    s.e.     b2,i    s.e.     b3,i    s.e.
Third Grade (a)
SDQ6      2.07    0.03    -1.80    0.03    -0.69    0.02     0.16    0.01
SDQ12     2.33    0.03    -1.27    0.02    -0.32    0.01     0.41    0.01
SDQ16     1.77    0.03    -2.36    0.04    -1.08    0.02     0.04    0.02
SDQ22     3.51    0.05    -1.59    0.02    -0.76    0.01    -0.07    0.01
SDQ26     1.04    0.02    -2.25    0.05    -0.75    0.02     0.57    0.02
SDQ30     4.81    0.08    -1.44    0.02    -0.86    0.01    -0.32    0.01
SDQ36     4.23    0.06    -1.55    0.02    -0.81    0.01    -0.16    0.01
SDQ41     2.71    0.04    -1.96    0.03    -0.99    0.02    -0.08    0.01
Fifth Grade (b)
SDQ6      2.28    0.04    -1.69    0.03    -0.53    0.02     0.53    0.02
SDQ12     2.48    0.04    -1.08    0.02     0.11    0.01     0.98    0.02
SDQ16     1.89    0.03    -2.25    0.04    -0.87    0.02     0.41    0.02
SDQ22     4.32    0.07    -1.50    0.02    -0.53    0.01     0.28    0.01
SDQ26     1.51    0.03    -1.87    0.03    -0.39    0.02     0.93    0.02
SDQ30     5.02    0.08    -1.35    0.02    -0.61    0.01     0.09    0.01
SDQ36     4.68    0.07    -1.43    0.02    -0.53    0.01     0.26    0.01
SDQ41     3.04    0.05    -1.87    0.03    -0.74    0.02     0.32    0.01
a -2loglikelihood = 219576.39, AIC = 219640.39, BIC = 219883.30, G2(4346) =
27264.39, p < .0001, F0hat = 1.86, RMSEA = .02.
b -2loglikelihood = 172007.52, AIC = 172071.52, BIC = 172314.43, G2(3143) =
26971.03, p < .0001, F0hat = 1.84, RMSEA = .02.
Figure 4.2. Option characteristic curves of the items in third grade.
Figure 4.3. Option characteristic curves of the items in fifth grade.
For third-grade data, all items demonstrated “very high” discrimination
according to Baker’s criteria (1985, 2001; a > 1.7) except SDQ26 (“I can do very
difficult problems in math”), which had “moderate” discrimination. The average item
thresholds were -1.78 for b1 (SD = 0.36), -0.78 for b2 (SD = 0.21), and 0.07 for b3 (SD
= 0.28). For fifth-grade data, all items again demonstrated “very high” discrimination
except SDQ26 which showed “good” discrimination. The average item thresholds
were -1.63 for b1 (SD = 0.34), -0.51 for b2 (SD = 0.27), and 0.48 for b3 (SD = 0.30).
Thus, it appears that the scale had fairly low item thresholds in assessing third graders’
mathematics interest. In particular, these items seemed incapable of obtaining much
information about children whose interest levels were above the mean (i.e., θ > 0). From
the IRT perspective, these items have better psychometric qualities in assessing fifth-
rather than third-graders’ interest levels. Such comparisons are illustrated in Figure
4.4 where the item information function curves for each item across grade levels are
combined in one graph. In general, an item covers more area under the curve (AUC)
when administered to fifth- than third-graders, particularly to the right side of the
latent trait (θ) continuum. As a result, the scale in total also provides more
information for those who go beyond the mean in their interest levels, as illustrated by
the test information function curves in Figure 4.5. Nonetheless, the scale appears
unbalanced in terms of thresholds even for fifth graders, because it provides sufficient
information from θ = -2 through 0, but little information for θ beyond 1.
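The item information comparison described above can be reproduced numerically. The sketch below uses the standard graded response model information formula (the sum over categories of the squared derivative of each category probability divided by that probability) and the SDQ30 and SDQ26 estimates from Table 4.5. The function name is hypothetical, and the output is not meant to replicate the curves in Figure 4.4 exactly.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Item information at theta for a graded response model item."""
    # Cumulative probabilities, padded with 1 and 0 at the two ends
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta
        slope = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += slope ** 2 / p_k
    return info

# Parameter estimates taken from Table 4.5: (a, [b1, b2, b3])
SDQ30_THIRD = (4.81, [-1.44, -0.86, -0.32])
SDQ30_FIFTH = (5.02, [-1.35, -0.61, 0.09])
SDQ26_THIRD = (1.04, [-2.25, -0.75, 0.57])

for theta in (-2.0, -1.0, 0.0, 1.0):
    print(theta,
          round(grm_item_information(theta, *SDQ30_THIRD), 3),
          round(grm_item_information(theta, *SDQ30_FIFTH), 3),
          round(grm_item_information(theta, *SDQ26_THIRD), 3))
```

Summing this function over the eight items at each theta gives the test information function plotted in Figure 4.5.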
In terms of item-level characteristics, three out of the eight items appear to
provide particularly more information across waves: SDQ30 (“I like math”), SDQ36
(“I enjoy doing work in math”), and SDQ22 (“I am interested in math”), while three
items of the cognitive domain (i.e., SDQ26 “I can do very difficult problems in math,”
SDQ16 “I get good grades in math,” and SDQ6 “Work in math is easy for me”)
provide much less information when examined using the same metrics (see Figure 4.4).
Comparatively, SDQ12 (“I cannot wait to do math each day”) of the affective domain
and SDQ41 (“I am good at math”) of the cognitive domain contain moderate levels of
information.
Figure 4.4. Item information function (IIF) curves by item across grade levels.
Figure 4.5. Test information function curves by scale across grade levels.
Phase 3: Differential Item Functioning (DIF) Analysis
A series of DIF analyses were performed using the IRT-based Lord’s Wald test
in flexMIRT v2, and the results of these analyses are summarized in Table 4.6.
Because multiple testing adjustment was considered unnecessary for the Wald test,
the significance of DIF was determined by the Wald χ2 with 4 degrees of freedom. For
DIF across gender, all eight items demonstrated significant DIF in third grade while
five of them had DIF in fifth grade. For DIF across ethnicity with White as the
reference group, all items demonstrated significant DIF where the focal group was
African American or Hispanic in both grade levels, but the DIF was relatively
moderate when Asian was the focal group. In addition, all eight items demonstrated
significant item parameter drift (IPD) over time.
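The significance flags in Table 4.6 follow from referring each Wald χ2 value to a chi-square distribution with 4 degrees of freedom (one discrimination and three thresholds per item). A minimal sketch of that lookup is given below; it relies on the closed-form chi-square survival function for even degrees of freedom, and the function names are hypothetical rather than part of flexMIRT.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def significance_flag(chi2_value, df=4):
    """Return the asterisk convention used in Table 4.6."""
    p = chi2_sf_even_df(chi2_value, df)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# A few Wald chi-square values taken from Table 4.6
for value in (7.4, 9.5, 14.9, 19.9):
    print(value, round(chi2_sf_even_df(value, 4), 4), significance_flag(value))
```

With 4 degrees of freedom the .05 critical value is approximately 9.49, so the fifth-grade SDQ36 value of 7.4 for the White vs. Asian comparison is non-significant, whereas the third-grade value of 9.5 just reaches significance.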
Table 4.6
Summary of the DIF Results by Gender, Ethnicity, and Age
                              Lord's Wald Test χ2 (df = 4)
Item       Male vs.     White vs.            White vs.    White vs.    Item Parameter Drift
           Female       African American     Hispanic     Asian        (Third vs. Fifth)
Third Grade
SDQ6        86.1***      87.4***              56.5***      11.2*        189.7***
SDQ12      101.4***     151.9***             115.6***      10.2*        255.5***
SDQ16       15.4**       88.6***              87.8***      43.2***      102.5***
SDQ22       25.7***     322.1***              75.4***       2.1         191.5***
SDQ26      100.0***     196.0***              28.4***      26.4***      449.3***
SDQ30       20.0***      97.2***              43.8***       1.8         138.6***
SDQ36       53.8***     186.4***              76.9***       9.5*        139.5***
SDQ41       47.7***     128.4***              46.3***      19.9***      133.8***
Fifth Grade
SDQ6        98.3***      73.3***              25.4***       3.3
SDQ12       77.5***      60.9***              53.0***      14.9**
SDQ16        6.1         79.3***             124.4***      30.5***
SDQ22        6.9         91.4***              43.0***       1.2
SDQ26      115.7***     123.2***              36.7***      10.7*
SDQ30        0.2        115.0***              13.5**        0.8
SDQ36        9.8*       139.6***              28.3***       7.4
SDQ41       36.4***      60.2***              44.1***      16.0**
*p < .05. **p < .01. ***p < .001.
DIF across Gender
To detect DIF across gender, the full samples of third- and fifth-grade data were
each divided into two groups according to gender. For third graders, no item qualified
as the anchor, so all items were constrained equal between groups. Table 4.7
summarizes the IRT estimates of each group, and Figure 4.6 illustrates the DIF across
gender by item. All eight items showed significant DIF that varied in direction: SDQ6,
16, 26, and 41 were in favor of the reference group while SDQ12, 22, and 36 were in
favor of the focal group. In other words, boys were more likely than girls to endorse
higher-order options of items such as SDQ6 (“Work in math is easy for me”) even
given the same level of the latent trait, while girls were more likely to endorse higher-
order options of items such as SDQ12 (“I cannot wait to do math each day”). The DIF
of SDQ30 appears to be minimal between groups because the DIF in b1 and b2 tended
to cancel out. Interestingly, these DIF results are domain-specific: cognitive (i.e.,
perceived competence) items were consistently in favor of boys, while affective (i.e.,
interest, liking, and positive emotion) items were consistently in favor of girls. On
average, these items showed higher discrimination for girls (Ma = 3.00, SDa = 1.33)
than boys (Ma = 2.75, SDa = 1.14). As shown in Figure 4.6, the DIF of some items
(e.g., SDQ12) was nonuniform because it appeared to be a function of non-invariance in
item discrimination (a) as well as item thresholds (b).
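The direction of DIF described above can be illustrated by comparing the expected item scores of boys and girls at the same latent trait level, using the third-grade estimates from Table 4.7 and the integer scoring function of Equation 6. This is a hypothetical sketch for two items only; the names are assumptions, and it complements rather than replaces the option characteristic curves in Figure 4.6.

```python
import math

def expected_item_score(theta, a, thresholds):
    """Expected score (0-3) on one item, per Equation 6."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)

# Third-grade estimates copied from Table 4.7: (a, [b1, b2, b3])
MALE = {"SDQ6": (2.01, [-1.99, -0.95, -0.06]),
        "SDQ12": (2.22, [-1.31, -0.37, 0.36])}
FEMALE = {"SDQ6": (2.19, [-1.82, -0.69, 0.12]),
          "SDQ12": (2.64, [-1.42, -0.51, 0.18])}

def score_gap(item, theta):
    """Expected score for boys minus expected score for girls at the same theta."""
    a_m, b_m = MALE[item]
    a_f, b_f = FEMALE[item]
    return expected_item_score(theta, a_m, b_m) - expected_item_score(theta, a_f, b_f)

for item in ("SDQ6", "SDQ12"):
    print(item, [round(score_gap(item, t), 3) for t in (-2, -1, 0, 1, 2)])
```

A positive gap means that boys are expected to score higher than girls with the same level of the latent trait (as for SDQ6 at θ = 0), and a negative gap means the reverse (as for SDQ12 at θ = 0). A gap whose size changes markedly along θ is the signature of nonuniform DIF.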
Table 4.7
Parameter Estimates of Multigroup (Male vs. Female) IRT in Third Grade
Item       ai   s.e.   b1,i   s.e.   b2,i   s.e.   b3,i   s.e.
Male
SDQ6     2.01   0.05  -1.99   0.04  -0.95   0.02  -0.06   0.02
SDQ12    2.22   0.05  -1.31   0.03  -0.37   0.02   0.36   0.02
SDQ16    1.82   0.04  -2.39   0.06  -1.22   0.03  -0.13   0.02
SDQ22    3.47   0.07  -1.64   0.03  -0.86   0.02  -0.17   0.02
SDQ26    1.05   0.03  -2.43   0.07  -1.04   0.04   0.25   0.03
SDQ30    4.64   0.11  -1.52   0.03  -0.98   0.02  -0.44   0.02
SDQ36    4.07   0.09  -1.58   0.03  -0.91   0.02  -0.27   0.02
SDQ41    2.69   0.06  -2.07   0.04  -1.16   0.02  -0.28   0.02
Female
SDQ6     2.19   0.05  -1.82   0.03  -0.69   0.02   0.12   0.02
SDQ12    2.64   0.05  -1.42   0.02  -0.51   0.02   0.18   0.02
SDQ16    1.77   0.04  -2.49   0.05  -1.16   0.03  -0.04   0.02
SDQ22    3.74   0.07  -1.72   0.03  -0.88   0.02  -0.21   0.02
SDQ26    1.04   0.03  -2.30   0.06  -0.71   0.03   0.64   0.04
SDQ30    5.17   0.12  -1.54   0.02  -0.96   0.02  -0.44   0.01
SDQ36    4.64   0.10  -1.68   0.02  -0.93   0.02  -0.31   0.01
SDQ41    2.82   0.06  -2.03   0.03  -1.05   0.02  -0.12   0.02
Note. -2loglikelihood = 218656.60, AIC = 218784.60, BIC = 219270.42, G2(5339) =
32825.73, p < .0001, F0hat = 2.24, RMSEA = .02.
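The fit indices in the table note can be cross-checked against one another. The sketch below assumes the conventional definition AIC = -2 log L + 2k, where k is the number of free parameters; the helper name is hypothetical.

```python
def n_free_parameters(neg2_loglik, aic):
    """Back out k from AIC = -2 log L + 2k (conventional definition, assumed here)."""
    return (aic - neg2_loglik) / 2.0

# Values from the note to Table 4.7 (third grade, male vs. female).
k = n_free_parameters(218656.60, 218784.60)

# Count expected if each of the 8 items has one a and three b parameters per group.
expected = 2 * 8 * (1 + 3)

print(round(k), expected)
```

Under that assumption, the gap between AIC and -2loglikelihood corresponds to one discrimination and three threshold parameters for each of the eight items in each of the two groups.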
Figure 4.6. Option characteristic curves of multigroup (male vs. female) IRT in third
grade.
For fifth grade, SDQ16, 22, and 30 were specified as the anchor items, and
other items were freely estimated in the multigroup IRT. Table 4.8 summarizes the
IRT estimates of both groups, and Figure 4.7 illustrates the DIF across gender by item.
Five of the eight items showed significant DIF that varied in direction: SDQ6, 26, and
41 were in favor of boys, while SDQ12 and 36 were in favor of girls. As in third
grade, the DIF still appeared to be domain-specific in that cognitive items were
consistently in favor of boys, while affective items were consistently in favor of girls.
On the other hand, SDQ16 (“I get good grades in math”), 22 (“I am interested in
math”), and 30 (“I like math”) became invariant across gender in fifth grade.
Table 4.8
Parameter Estimates of Multigroup (Male vs. Female) IRT in Fifth Grade
Item       ai   s.e.   b1,i   s.e.   b2,i   s.e.   b3,i   s.e.
Male
SDQ6     2.32   0.05  -1.82   0.04  -0.74   0.02   0.32   0.02
SDQ12    2.37   0.06  -1.08   0.03   0.11   0.02   1.01   0.03
SDQ16    1.92   0.05  -2.34   0.06  -0.97   0.03   0.29   0.02
SDQ22    4.32   0.10  -1.54   0.03  -0.61   0.02   0.21   0.02
SDQ26    1.53   0.04  -2.07   0.05  -0.64   0.03   0.67   0.03
SDQ30    5.10   0.12  -1.41   0.03  -0.68   0.02   0.01   0.02
SDQ36    4.63   0.10  -1.46   0.03  -0.60   0.02   0.19   0.02
SDQ41    3.03   0.08  -1.95   0.04  -0.88   0.02   0.16   0.02
Female
SDQ6     2.32   0.05  -1.69   0.04  -0.47   0.02   0.58   0.02
SDQ12    2.80   0.06  -1.19   0.03  -0.05   0.02   0.76   0.02
SDQ16    1.90   0.05  -2.26   0.05  -0.91   0.03   0.36   0.03
SDQ22    4.48   0.10  -1.57   0.03  -0.60   0.02   0.18   0.02
SDQ26    1.53   0.04  -1.81   0.04  -0.30   0.03   1.03   0.04
SDQ30    5.09   0.12  -1.41   0.02  -0.69   0.02   0.01   0.02
SDQ36    4.93   0.11  -1.51   0.03  -0.61   0.02   0.16   0.02
SDQ41    3.14   0.07  -1.89   0.04  -0.75   0.02   0.32   0.02
Note. -2loglikelihood = 171225.31, AIC = 171353.31, BIC = 171839.13, G2(3951) =
32165.94, p < .0001, F0hat = 2.20, RMSEA = .02.
Figure 4.7. Option characteristic curves of multigroup (male vs. female) IRT in fifth
grade.
Finally, Figure 4.8 illustrates the test characteristic curves for the two groups in
each grade level.
Figure 4.8. Test characteristic function curves of multigroup (male vs. female) IRT.
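For readers who want to see how a test characteristic curve is assembled from item parameters, here is a minimal sketch using the third-grade estimates in Table 4.7. It assumes a graded response model with logistic boundary curves, no scaling constant, and response options scored 0 to 3; these modeling details and all function names are assumptions made for illustration, not a reproduction of the software used in the study.

```python
import math

# (a, b1, b2, b3) per item, third grade, from Table 4.7.
MALE = {
    "SDQ6": (2.01, -1.99, -0.95, -0.06), "SDQ12": (2.22, -1.31, -0.37, 0.36),
    "SDQ16": (1.82, -2.39, -1.22, -0.13), "SDQ22": (3.47, -1.64, -0.86, -0.17),
    "SDQ26": (1.05, -2.43, -1.04, 0.25), "SDQ30": (4.64, -1.52, -0.98, -0.44),
    "SDQ36": (4.07, -1.58, -0.91, -0.27), "SDQ41": (2.69, -2.07, -1.16, -0.28),
}
FEMALE = {
    "SDQ6": (2.19, -1.82, -0.69, 0.12), "SDQ12": (2.64, -1.42, -0.51, 0.18),
    "SDQ16": (1.77, -2.49, -1.16, -0.04), "SDQ22": (3.74, -1.72, -0.88, -0.21),
    "SDQ26": (1.04, -2.30, -0.71, 0.64), "SDQ30": (5.17, -1.54, -0.96, -0.44),
    "SDQ36": (4.64, -1.68, -0.93, -0.31), "SDQ41": (2.82, -2.03, -1.05, -0.12),
}

def category_probs(theta, a, thresholds):
    """Category probabilities as differences of adjacent boundary curves."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_total(theta, items):
    """Test characteristic curve: expected summed score with options scored 0-3."""
    total = 0.0
    for a, *bs in items.values():
        total += sum(k * p for k, p in enumerate(category_probs(theta, a, bs)))
    return total

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(expected_total(theta, MALE), 2),
          round(expected_total(theta, FEMALE), 2))
```

Because item-level DIF in opposite directions can offset when summed, the two group curves may lie closer together than the individual option characteristic curves suggest.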
DIF across Ethnicity
Table 4.9 summarizes the IRT estimates of each ethnic group in third grade.
For the third-grade White versus African American comparison (see Figure 4.9), no item
qualified as the anchor, so all items were constrained to be equal between groups. As
shown in Table 4.5, all items demonstrated significant DIF. In particular, the item
thresholds (i.e., b1, b2, and b3) of SDQ12, 26, 30, and 36 were consistently lower for
the focal group than the reference group, indicating that African American children
were more likely than their White counterparts to endorse higher-order options of
these items, while the item thresholds of SDQ16 were consistently lower for the White
group. For SDQ6, 22, and 41, the White group showed lower b1 and b2 but higher b3,
indicating that these items encompassed wider ranges along the θ continuum for White
respondents than African American ones. Adding to the complexity is that the SDQI
items on average showed much lower discrimination (Ma = 2.32, SDa = 0.87) among
African American respondents than among their White counterparts (Ma = 3.06, SDa =
1.38), resulting in the mostly nonuniform DIF among items (see Figure 4.9).
For the third-grade White versus Hispanic comparison (see Figure 4.10), no item
qualified as the anchor, so all items were constrained to be equal between groups. Again,
all eight items showed significant DIF. In particular, the item thresholds of SDQ12,
22, 30, and 36 were consistently lower for Hispanic respondents than for White
respondents, while the item thresholds of SDQ16, 26, and 41 were consistently lower
for the White group. For SDQ6, the White group had lower b1 and b2 but higher b3.
As in the White versus African American comparison, the SDQI items on average had
lower discrimination (Ma = 2.66, SDa = 1.02) among Hispanic respondents than
among their White counterparts (Ma = 3.06, SDa = 1.38). As shown in Figure 4.10, the
DIF of items was mostly nonuniform.
For the third-grade White versus Asian American comparison (see Figure 4.11),
SDQ6, 22, and 30 were specified as the anchor items, and other items were freely
estimated between groups. Six of the eight items demonstrated significant DIF. In
particular, the item thresholds of SDQ12 and 36 were consistently lower for Asian
American respondents, while the item thresholds of SDQ16, 26, and 41 were
consistently lower for White respondents. For SDQ6, the White group had lower b1
and b2 but higher b3. Unlike in the previous comparisons, the items on average had a
similar level of discrimination (Ma = 3.14, SDa = 1.18) among Asian American
respondents as compared with their White counterparts (Ma = 3.06, SDa = 1.38). The
DIF of some items (e.g., SDQ26) still appeared to be nonuniform because of the joint
influence of DIF in a and b parameters.
Table 4.9
Parameter Estimates of Multigroup (White vs. African American, White vs. Hispanic,
and White vs. Asian American) IRT in Third Grade
Item       ai   s.e.   b1,i   s.e.   b2,i   s.e.   b3,i   s.e.
White
SDQ6     2.14   0.04  -1.77   0.03  -0.67   0.02   0.26   0.02
SDQ12    2.57   0.05  -1.14   0.02  -0.17   0.02   0.55   0.02
SDQ16    1.77   0.04  -2.53   0.06  -1.17   0.03   0.04   0.02
SDQ22    4.15   0.08  -1.50   0.02  -0.68   0.02   0.01   0.01
SDQ26    1.13   0.03  -2.16   0.06  -0.70   0.03   0.62   0.03
SDQ30    5.22   0.11  -1.34   0.02  -0.77   0.02  -0.21   0.01
SDQ36    4.76   0.09  -1.45   0.02  -0.72   0.02  -0.05   0.01
SDQ41    2.76   0.06  -1.98   0.04  -1.01   0.02  -0.01   0.02
African Americanᵃ
SDQ6     1.96   0.09  -1.50   0.07  -0.54   0.05   0.14   0.04
SDQ12    1.97   0.09  -1.21   0.06  -0.37   0.04   0.29   0.04
SDQ16    1.83   0.08  -1.85   0.09  -0.78   0.05   0.09   0.04
SDQ22    2.46   0.10  -1.47   0.07  -0.67   0.04  -0.03   0.04
SDQ26    0.71   0.05  -2.42   0.19  -0.92   0.10   0.50   0.08
SDQ30    3.80   0.19  -1.37   0.06  -0.79   0.04  -0.33   0.04
SDQ36    3.10   0.14  -1.52   0.07  -0.82   0.04  -0.21   0.04
SDQ41    2.70   0.13  -1.53   0.07  -0.74   0.05  -0.10   0.04
Hispanicᵇ
SDQ6     2.08   0.08  -1.68   0.06  -0.52   0.03   0.19   0.03
SDQ12    2.05   0.08  -1.31   0.05  -0.40   0.03   0.35   0.03
SDQ16    1.92   0.08  -1.94   0.08  -0.73   0.04   0.23   0.03
SDQ22    3.20   0.12  -1.52   0.05  -0.71   0.03  -0.02   0.03
SDQ26    1.01   0.05  -2.12   0.12  -0.62   0.06   0.64   0.05
SDQ30    4.31   0.18  -1.41   0.05  -0.81   0.03  -0.30   0.03
SDQ36    3.80   0.14  -1.46   0.05  -0.75   0.03  -0.14   0.03
SDQ41    2.90   0.11  -1.75   0.06  -0.75   0.03   0.06   0.03
Asian Americanᶜ
SDQ6     2.34   0.15  -1.64   0.10  -0.50   0.05   0.25   0.05
SDQ12    2.40   0.14  -1.31   0.08  -0.32   0.05   0.48   0.05
SDQ16    2.21   0.14  -1.77   0.12  -0.68   0.06   0.28   0.05
SDQ22    3.88   0.23  -1.55   0.08  -0.67   0.05   0.02   0.04
SDQ26    1.48   0.10  -1.80   0.14  -0.37   0.06   0.67   0.07
SDQ30    5.23   0.38  -1.40   0.07  -0.78   0.05  -0.24   0.04
SDQ36    4.32   0.27  -1.46   0.08  -0.76   0.05  -0.12   0.04
SDQ41    3.27   0.20  -1.70   0.10  -0.73   0.05   0.06   0.04
ᵃ-2loglikelihood = 150540.10, AIC = 150668.10, BIC = 151130.48, G2(3676) = 23806.61, p < .0001, F0hat = 2.35, RMSEA = .02.
ᵇ-2loglikelihood = 162351.31, AIC = 162479.31, BIC = 162946.26, G2(4014) = 25221.42, p < .0001, F0hat = 2.32, RMSEA = .02.
ᶜ-2loglikelihood = 137068.92, AIC = 137196.92, BIC = 137653.07, G2(3268) = 20830.12, p < .0001, F0hat = 2.26, RMSEA = .02.
Figure 4.9. Option characteristic curves of multigroup (White vs. African American
[AA]) IRT in third grade.
Figure 4.10. Option characteristic curves of multigroup (White vs. Hispanic) IRT in
third grade.
Figure 4.11. Option characteristic curves of multigroup (White vs. Asian American
[Asian]) IRT in third grade.
Table 4.10 summarizes the IRT estimates of each ethnic group in fifth grade.
For the fifth-grade White versus African American comparison (see Figure 4.12), no item
qualified as the anchor, and all items showed significant DIF. In particular, the item
thresholds (i.e., b1, b2, and b3) of SDQ12, 22, and 36 were consistently lower for the
African American respondents, while the item thresholds of SDQ6 and 16 were
consistently lower for the White group. On the other hand, SDQ26, 30, and 41 had
lower b1 (and b2) but higher (b2 and) b3 for the reference group, indicating that these
items encompassed wider ranges of the θ continuum for White than African American
respondents. In addition, the eight items on average showed much lower
discrimination (Ma = 2.54, SDa = 0.89) among African American respondents as
compared with their White counterparts (Ma = 3.27, SDa = 1.37).
For the fifth-grade White versus Hispanic comparison (see Figure 4.13), no item
qualified as the anchor, and again all items showed significant DIF. In particular, the
item thresholds (b1, b2, b3) of SDQ12, 22, 30, and 36 were consistently lower for
Hispanic than White respondents, while the item thresholds of SDQ6, 16, 26, and 41
were consistently lower for the White group. In terms of item discrimination, the
SDQI items on average showed a similar level of discrimination (Ma = 3.10, SDa =
1.18) among Hispanic respondents as compared with their White counterparts (Ma =
3.27, SDa = 1.37).
For the fifth-grade White versus Asian American comparison (see Figure 4.14),
SDQ6, 22, 26, 30, and 36 were specified as the anchor items, and four items showed
significant DIF. In particular, the item thresholds (b1, b2, b3) of SDQ12 were
consistently lower for Asian than White respondents, while the item thresholds of
SDQ16, 26, and 41 were consistently lower for the White group. In terms of item
discrimination, the items on average showed similar levels of discrimination (Ma = 3.33,
SDa = 1.22) among Asian respondents as compared with their White counterparts (Ma
= 3.27, SDa = 1.37).
Table 4.10
Parameter Estimates of Multigroup (White vs. African American, White vs. Hispanic,
and White vs. Asian American) IRT in Fifth Grade
Item       ai   s.e.   b1,i   s.e.   b2,i   s.e.   b3,i   s.e.
White
SDQ6     2.29   0.05  -1.76   0.04  -0.57   0.02   0.56   0.02
SDQ12    2.62   0.06  -0.98   0.02   0.23   0.02   1.12   0.02
SDQ16    1.87   0.04  -2.46   0.06  -1.04   0.03   0.36   0.02
SDQ22    4.63   0.10  -1.46   0.03  -0.49   0.02   0.35   0.02
SDQ26    1.58   0.04  -1.90   0.05  -0.43   0.02   0.95   0.03
SDQ30    5.22   0.11  -1.32   0.02  -0.56   0.02   0.18   0.02
SDQ36    5.03   0.11  -1.36   0.02  -0.47   0.02   0.36   0.02
SDQ41    2.99   0.07  -1.97   0.04  -0.82   0.02   0.34   0.02
African Americanᵃ
SDQ6     2.05   0.10  -1.44   0.08  -0.28   0.05   0.59   0.05
SDQ12    1.96   0.10  -1.12   0.07   0.10   0.05   1.01   0.06
SDQ16    1.72   0.09  -2.06   0.11  -0.61   0.06   0.48   0.06
SDQ22    3.37   0.18  -1.52   0.07  -0.51   0.05   0.21   0.05
SDQ26    1.14   0.06  -1.90   0.12  -0.43   0.07   0.86   0.08
SDQ30    3.76   0.19  -1.30   0.06  -0.62   0.05   0.04   0.04
SDQ36    3.46   0.17  -1.43   0.07  -0.54   0.05   0.19   0.05
SDQ41    2.91   0.15  -1.63   0.08  -0.55   0.05   0.31   0.05
Hispanicᵇ
SDQ6     2.29   0.09  -1.56   0.06  -0.37   0.04   0.64   0.04
SDQ12    2.36   0.09  -1.20   0.05   0.00   0.03   0.92   0.04
SDQ16    2.11   0.08  -1.81   0.07  -0.50   0.04   0.68   0.04
SDQ22    3.70   0.13  -1.56   0.05  -0.54   0.03   0.31   0.03
SDQ26    1.47   0.06  -1.73   0.08  -0.18   0.04   1.12   0.06
SDQ30    5.07   0.20  -1.38   0.05  -0.60   0.03   0.08   0.03
SDQ36    4.54   0.17  -1.53   0.05  -0.56   0.03   0.23   0.03
SDQ41    3.27   0.13  -1.69   0.06  -0.54   0.03   0.48   0.03
Asian Americanᶜ
SDQ6     2.50   0.16  -1.65   0.12  -0.47   0.06   0.61   0.06
SDQ12    2.43   0.16  -1.21   0.09   0.12   0.05   0.96   0.06
SDQ16    2.34   0.16  -1.89   0.14  -0.58   0.06   0.52   0.06
SDQ22    4.72   0.29  -1.47   0.09  -0.45   0.05   0.34   0.04
SDQ26    1.61   0.11  -1.64   0.14  -0.21   0.07   1.05   0.08
SDQ30    4.93   0.32  -1.35   0.08  -0.57   0.05   0.18   0.04
SDQ36    4.75   0.31  -1.48   0.09  -0.48   0.05   0.29   0.04
SDQ41    3.35   0.23  -1.74   0.12  -0.57   0.06   0.48   0.05
ᵃ-2loglikelihood = 116548.76, AIC = 116676.76, BIC = 117139.13, G2(2725) = 23693.87, p < .0001, F0hat = 2.34, RMSEA = .03.
ᵇ-2loglikelihood = 128842.10, AIC = 128970.10, BIC = 129437.05, G2(3017) = 24021.21, p < .0001, F0hat = 2.20, RMSEA = .03.
ᶜ-2loglikelihood = 108189.86, AIC = 108317.86, BIC = 108774.01, G2(2426) = 19911.69, p < .0001, F0hat = 2.16, RMSEA = .03.
Figure 4.12. Option characteristic curves of multigroup (White vs. African American)
IRT in fifth grade.
Figure 4.13. Option characteristic curves of multigroup (White vs. Hispanic) IRT in
fifth grade.
Figure 4.14. Option characteristic curves of multigroup (White vs. Asian American
[Asian]) IRT in fifth grade.
Finally, Figure 4.15 illustrates the test characteristic curves for the two-group
comparisons in each grade level.
Figure 4.15. Test characteristic function curves of multigroup (White vs. African
American, White vs. Hispanic, and White vs. Asian American) IRT.
Item Parameter Drift
To detect item parameter drift (IPD) from third to fifth grade, the fifth-grade
full sample data were contrasted with the third-grade full sample data using the
sandwich estimator for standard errors. Table 4.11 summarizes the IRT estimates of
each wave of measurement.
Table 4.11
Parameter Estimates of Item Parameter Drift Analysis
Item       ai   s.e.   b1,i   s.e.   b2,i   s.e.   b3,i   s.e.
Third Grade
SDQ6     2.07   0.04  -1.80   0.04  -0.69   0.03   0.16   0.02
SDQ12    2.33   0.04  -1.27   0.03  -0.32   0.02   0.41   0.02
SDQ16    1.77   0.03  -2.36   0.04  -1.08   0.03   0.04   0.02
SDQ22    3.51   0.07  -1.59   0.03  -0.76   0.03  -0.07   0.02
SDQ26    1.04   0.02  -2.25   0.05  -0.75   0.03   0.57   0.03
SDQ30    4.81   0.13  -1.44   0.03  -0.86   0.03  -0.32   0.02
SDQ36    4.23   0.10  -1.55   0.03  -0.81   0.03  -0.16   0.02
SDQ41    2.71   0.05  -1.96   0.03  -0.99   0.03  -0.08   0.02
Fifth Grade
SDQ6     2.50   0.62  -1.85   0.17  -0.79   0.06   0.18   0.26
SDQ12    2.72   0.72  -1.30   0.05  -0.20   0.18   0.59   0.33
SDQ16    2.07   0.50  -2.36   0.29  -1.11   0.02   0.07   0.24
SDQ22    4.74   1.46  -1.68   0.12  -0.79   0.07  -0.05   0.23
SDQ26    1.65   0.34  -2.01   0.20  -0.66   0.08   0.54   0.30
SDQ30    5.50   2.08  -1.54   0.10  -0.87   0.06  -0.22   0.20
SDQ36    5.13   1.71  -1.61   0.11  -0.79   0.07  -0.07   0.23
SDQ41    3.33   0.63  -2.01   0.19  -0.98   0.04  -0.01   0.22
Note. -2loglikelihood = 391583.97, AIC = 391711.97, BIC = 392242.15, G2(7490) =
54235.48, p < .0001, F0hat = 1.85, RMSEA = .01.
For the IPD analysis (see Figures 4.16 and 4.17), no item qualified as the
anchor, and all items showed significant DIF (see Table 4.5). All items showed higher
discrimination in fifth grade, resulting in a drift of the a parameter from Ma = 2.81
(SDa = 1.20) in third grade to Ma = 3.46 (SDa = 1.38) in fifth grade. In terms of item
thresholds, all items except SDQ26 had lower b1 (and b2) and higher (b2 and) b3 in
fifth grade, indicating that most of the items encompassed wider ranges of the θ
continuum for fifth graders than for third graders. SDQ26 stood out because it actually covered a
narrower range of the continuum when administered in fifth grade. On average, b1
drifted from -1.78 (SDb1 = 0.36) in third grade to -1.80 (SDb1 = 0.31) in fifth grade, b2
from -0.78 (SDb2 = 0.21) to -0.77 (SDb2 = 0.25), and b3 from 0.07 (SDb3 = 0.28) to
0.13 (SDb3 = 0.27). Clearly, the nonuniform drifts of b parameters together with the
drift of a parameter resulted in nonuniform DIF in all items (see Figure 4.16).
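The summary statistics for the a parameter can be recomputed from Table 4.11. The short sketch below does so with the Python standard library. The reported SDs appear to agree with the divide-by-n (population) formula rather than the n - 1 formula; that is an inference from the numbers, not something stated in the text.

```python
from statistics import mean, pstdev

# Discrimination (a) estimates from Table 4.11, in table order (SDQ6 ... SDQ41).
third_a = [2.07, 2.33, 1.77, 3.51, 1.04, 4.81, 4.23, 2.71]
fifth_a = [2.50, 2.72, 2.07, 4.74, 1.65, 5.50, 5.13, 3.33]

def summarize(values):
    """Return (mean, population SD); pstdev divides by n rather than n - 1."""
    return mean(values), pstdev(values)

for label, values in (("third grade", third_a), ("fifth grade", fifth_a)):
    m, sd = summarize(values)
    print(f"{label}: M = {m:.3f}, SD = {sd:.3f}")

# Every item has a larger a estimate in fifth grade than in third grade.
print(all(f > t for t, f in zip(third_a, fifth_a)))
```

The results agree with the reported Ma and SDa values to within rounding.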
Figure 4.16. Option characteristic curves of item parameter drift (third- vs. fifth-
grade).
Figure 4.17. Test characteristic function curves of item parameter drift.
CHAPTER V
DISCUSSION
The Psychometric Properties of the SDQI Mathematics Subscale
The Classical Test Theory Perspective
In this dissertation, the psychometric properties of the SDQI mathematics
subscale were examined from the classical test theory (CTT), the factor analytic, and
the item response theory (IRT) perspectives to enhance our understanding of
academic interest and affect, particularly in the mathematics domain. To address this,
an important consideration was how the SDQI, an instrument initially developed based
on the framework of academic self-concept (Shavelson & Bolus, 1981; Shavelson et
al., 1976), could be evaluated as a measure of interest. As discussed in Chapter 3,
self-concept is a broad concept that refers to organized, multifaceted perception of
one’s self (Shavelson et al., 1976). This multifaceted nature of self-concept allows
researchers to examine a single aspect of it (e.g., interest/affect) without disregarding
the conceptualization of self-concept. The ECLS-K adapted SDQI mathematics scale
consists of eight items, four of which are affective. In line with the
operationalization of academic interest as discussed in Chapter 2, the four affective
items either assess interest directly (SDQ22—“I am interested in math”) or reflect
some certain aspects of interest such as motivation (SDQ12—“I cannot wait to do
math each day”), liking (SDQ30—“I like math”), and positive emotion (SDQ36—“I
enjoy doing work in math”). In search of a good measure of academic interest, the
SDQI affective items appear to offer adequate face validity for relevant investigations.
Another important consideration was, therefore, whether the four cognitive
items (i.e., SDQ6, 16, 26, and 41) should also be included in the investigation. In
terms of face validity, these cognitive items actually focus on perceived competence
(e.g., SDQ6—“Work in math is easy for me”) which plays a central role in the
formation of academic self-concept (Bong & Skaalvik, 2003). The inclusion of these
items could have confounded the investigation if they happened to underlie a
distinguishable dimension other than interest and affect. Because there has been
considerable debate in the literature as to whether the affective domain is separable
from the cognitive domain (Bong & Skaalvik, 2003), a widely acknowledged theoretical
basis for the investigations of this study was unlikely to be found. Hence, factor
analysis was conducted first to provide an empirical basis for the subsequent CTT and
IRT analyses.
The results of factor analysis indicate a relatively loose one-factor structure of
the eight SDQI items. Using the eigenvalue > 1 criterion, one single factor was
extracted as a result of exploratory factor analysis (EFA). This one-factor structure,
however, was not strongly supported by the confirmatory factor analysis (CFA).
Although EFA and CFA often rely on the same estimation methods, they also differ in
the manner by which cross-loadings are handled. More importantly, the CFA
framework offers researchers the ability to specify the nature of relationships among
the measurement errors of the items (Brown, 2006). With regard to the current
investigation, these features of CFA offer at least two explanations for the relatively
poor fit of the one-factor structure. First, it is possible that the eight items tended to
load on multiple factors (e.g., affective vs. cognitive). In this case, poor fit might have
emerged in CFA because the identification restrictions associated with CFA are
achieved in part by fixing the cross-loadings to zero (Brown, 2006). Second, the poor
fit might be caused by correlated errors among items, and the model fit could have
been notably improved once these errors were allowed to correlate. In fact, the
modification indices did suggest very large correlated errors among the SDQI items.
However, correlated errors were not modeled in the current study to avoid model
overfitting in the absence of a theoretical basis. Likewise, multiple-factor solutions
were not tested in CFA due to a lack of empirical basis (i.e., EFA).
From a theoretical perspective, this loose one-factor solution may be a function
of the conceptual fuzziness surrounding the concepts of interest, affect, and perceived
competence. Specifically, perceived competence is closely related to academic
interest and even considered to have a reciprocal relationship with it (Marsh et al.,
2005; Wigfield & Eccles, 2002). As such, perceived competence may act as the
indicator of interest and affect, and vice versa. This may bring challenges to the
current investigation because the scale may be interpreted as either a measure of
interest with perceived competence items as its cognitive indicators, or a measure of
perceived competence with interest, liking, and positive affect as its affective
indicators. The criterion-related validity of the scale, as established in this study, does
not seem to address this issue either, given that both perceived competence/self-
efficacy and interest have been identified as positive predictors of academic
performance (Hidi, 2000; Hidi & Harackiewicz, 2000; Klassen & Usher, 2010).
Hence, the item-level characteristics can be particularly helpful for us to gain a better
understanding of the scale.
The item-level characteristics as revealed by the CTT method indicate that the
eight items are of similarly good quality. Among them, SDQ26 (“I can do very
difficult problems in math”) appears to be weaker than others as assessed by the item-
total correlations as well as Cronbach’s α if item deleted (see Table 4.4). Four items
demonstrate higher item-total correlations across grade levels: SDQ22 (“I am
interested in math”), SDQ30 (“I like math”), SDQ36 (“I enjoy doing work in math”),
and SDQ41 (“I am good at math”). In the factor analytic framework, the four items
also show higher factor loadings (.84 to .93 in third grade; .88 to .93 in fifth grade)
than others (.51 to .79 in third grade; .66 to .81 in fifth grade). It would be premature,
though, to conclude that this set of items represents the key concept of the SDQI, as they
do not surpass others (i.e., SDQ6, SDQ12, and SDQ16) much in their CTT-based
psychometric qualities (see Table 4.4).
From the CTT perspective, the SDQI mathematics subscale demonstrates
sufficient internal reliability, predictive validity, and item-level qualities; however, the
CTT-based evaluation does not adequately address the dimensionality issue of the
scale, particularly whether the scale is primarily measuring perceived competence or
interest. The item response theory (IRT) based evaluation was thus critical.
The Item Response Theory Perspective
With many advantages over the CTT, the IRT-based evaluation largely
strengthens the item-level findings revealed by the CTT-based method. In the IRT
framework, the same set of items (i.e., SDQ22, 30, 36, and 41) tends to demonstrate
better psychometric qualities than others as indicated by their item discriminations (a).
Specifically, SDQ22, 30, and 36 all have a values above 3 in third grade and a values
above 4 in fifth grade. Because the item information function is proportional to item
discrimination (see Equation 3), these three items offer relatively rich information in
assessing the individual’s latent trait. As shown in Figure 4.4, the areas under the
curve (AUCs) of the other items (i.e., SDQ6, 12, 16, and 26) appear to be limited,
while the AUC of SDQ41 is somewhere between the two sets. Hence, the items
offering the most information in the scale involve explicit expression of interest (“I am
interested in math”), liking (“I like math”), and positive affect (“I enjoy doing work in
math”), adding to the evidence that the scale is primarily measuring interest and
associated affect while including perceived competence items as cognitive indicators
of the interest. Although perceived competence or self-efficacy is theorized to play a
central role in self-concept (Bong & Skaalvik, 2003; Marsh, 1992a), findings of the
current investigation actually suggest that it is the other way around. Arguably, more
attention needs to be directed at the role of interest and affect in the formation of
academic self-concept or other similar constructs.
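To illustrate how discrimination drives item information, the sketch below compares a high-a item (SDQ30) with a low-a item (SDQ26) using the third-grade estimates in Table 4.11. It assumes the standard graded-response form, in which information is the sum over categories of the squared derivative of each category probability divided by that probability, so that information scales with the square of a. Equation 3 is not reproduced here, and the function names are illustrative.

```python
import math

def boundaries(theta, a, bs):
    """Boundary curves P*_0 = 1, P*_1..P*_m, P*_{m+1} = 0 for one graded item."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]

def item_information(theta, a, bs):
    """Graded-item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = boundaries(theta, a, bs)
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        slope = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += slope ** 2 / p_k
    return info

# (a, (b1, b2, b3)) from the third-grade rows of Table 4.11.
SDQ30 = (4.81, (-1.44, -0.86, -0.32))
SDQ26 = (1.04, (-2.25, -0.75, 0.57))

for theta in (-2.0, -0.86, 0.0, 1.0, 2.0):
    print(f"theta = {theta:5.2f}  SDQ30: {item_information(theta, *SDQ30):6.3f}"
          f"  SDQ26: {item_information(theta, *SDQ26):6.3f}")
```

Under these assumptions, the high-a item is far more informative near its thresholds but contributes almost nothing once θ moves well above them, whereas the low-a item provides a small amount of information over a wider range.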
While the SDQI items in general show “high” to “very high” item
discrimination (Baker, 1985, 2001), their item thresholds (b) actually reflect the
inadequacy of the scale, particularly for younger children. It is worth mentioning that
there are three versions of the SDQ: SDQI for pre-adolescents (age 5-12; Marsh,
1992a), SDQII for early adolescents (age 13-17; Marsh, 1992b) and SDQIII for late
adolescents and adults (age 16 and over; Marsh, 1992c). That is, the development of
the SDQ has taken into account children’s developmental stages, and the SDQI was
designed especially for elementary school age children. With regard to the
mathematics domain, the wording of the SDQI items is simpler and more
straightforward (e.g., “I like mathematics”) as compared with items in SDQII (e.g., “I
look forward to mathematics classes”) and SDQIII (e.g., “I find many mathematical
problems interesting and challenging”). Moreover, respondents are given a 5-point
scale in the original SDQI (1-false to 5-true), a 6-point scale in the SDQII (1-false to
6-true), and an 8-point scale in the SDQIII (1-definitely false to 8-definitely true). The
ECLS-K adapted SDQI further reduced the number of ordered options to 4 (1-not at
all true to 4-very true), but the current investigation reveals that this adapted version
may still be age-inappropriate among elementary school children, particularly third
graders. The items tend to have very low average thresholds among third graders,
meaning that only very low levels of mathematics interest are necessary for a third
grader to endorse response option k or higher rather than the lower-order options
(0 through k – 1). For instance, a randomly selected third grader with θ = -0.32 has a
probability of 0.5 of endorsing very true for SDQ30 (“I like math”), meaning that an
individual does not even need to have an above-average level of interest to respond
that he or she likes math.
In the third-grade calibration, four of the eight items have b3 values that are below zero
(i.e., SDQ22, 30, 36, and 41), which illustrates the inadequacy of the SDQI items in
assessing third graders’ interest levels.
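The threshold interpretation can be checked numerically. The sketch below computes option probabilities for SDQ30 under an assumed graded response model with logistic boundary curves, using the third-grade values a = 4.81 and b = (-1.44, -0.86, -0.32) from Table 4.11. The a value is borrowed for illustration only; the probability at θ = b3 does not depend on it.

```python
import math

def category_probs(theta, a, bs):
    """Probabilities of options 1..4 (1 = not at all true, 4 = very true)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

a, bs = 4.81, (-1.44, -0.86, -0.32)  # SDQ30, third grade (Table 4.11)

at_b3 = category_probs(-0.32, a, bs)    # respondent located exactly at b3
at_mean = category_probs(0.0, a, bs)    # respondent at the latent mean

print([round(p, 3) for p in at_b3])
print([round(p, 3) for p in at_mean])
```

At θ = b3 the top option has probability 0.5 by construction, and at the latent mean (θ = 0) the same option is already the most likely response by a wide margin, which is the sense in which the item is too easy to endorse.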
The adapted SDQI tends to perform better among fifth graders in that the average
thresholds (i.e., b1, b2, and b3) are higher than those among third graders. Specifically,
it now takes a θ = 0.09 in fifth grade, rather than -0.32 in third grade, for a randomly
selected child to have a probability of 0.5 of endorsing very true for SDQ30 (“I like
math”). Nonetheless, the b3 values of SDQ22, 30, 36, and 41 still appear to be
lower than those of other items. Furthermore, none of the eight items has a b3 that goes
beyond 1. Although the SDQI performs better among fifth graders, the scale still appears
to be inadequate in capturing much information about children who are located at the
medium-to-far right side (θ > 1) of the continuum (see Figure 4.5).
Another important finding is that the affective items (i.e., SDQ22, 30, and 36),
though offering the most information across all SDQI items, appear to have the lowest
thresholds among all items as well. That is, the three affective items may perform
well in distinguishing those who are of higher interest levels from those of lower
levels, but such high discrimination only functions within a limited range of the θ
continuum (i.e., θ < 0 for third grade; θ < 0.3 for fifth grade). Comparatively,
cognitive, perceived competence items such as SDQ6 (“Work in math is easy for me”)
are of lower discrimination but higher item thresholds. It is not counterintuitive
though that perceived competence items in general entail higher thresholds than
affective items, because even very young children (e.g., first graders) may have
developed domain-specific ability beliefs from their school experiences and peer
comparisons (Eccles et al., 1983, 1989).
Summary
The first research question of this dissertation concerns the psychometric
properties of the ECLS-K adapted SDQI in measuring mathematics interest and affect
from both the CTT and IRT perspectives. It was hypothesized that the SDQI would
show sufficient psychometric properties for the sample as a whole across two
measurement occasions (i.e., third- and fifth-grade). This hypothesis is mostly
supported by the CTT results, but only partly supported by the IRT results. From the
CTT perspective, the scale shows good (α = .89) to excellent (α = .92) internal
consistency reliability, meaning that the items are consistent among themselves and
with the scale as a whole (Gay et al., 2008). Although SDQ26 is of relatively low
item-total correlations across waves, these correlations are still moderate to large in
magnitude (.42 and .59). Thus, the CTT analysis concludes that the SDQI
mathematics subscale possesses sufficient psychometric properties for the sample as a
whole across waves. The IRT-based evaluation, however, reveals that the SDQI needs
to be reconsidered for its age appropriateness for elementary school children. The
findings of the current investigation suggest that the SDQI performs poorly among
third graders, and tends to perform better when these children advance to fifth grade,
but still not sufficiently well because the item thresholds are lower than we expect
them to be. Future research may investigate specific reasons for this age
inappropriateness of the SDQI items, especially the cultural influence as the SDQI was
initially normed based on an Australian sample back in the 1990s (Marsh, 1992a).
Measurement Bias of the SDQI Mathematics Subscale across Gender
The second research question of this dissertation concerns the measurement
bias of the SDQI items across gender, and it was hypothesized that gender differences
would emerge at item level as revealed by differential item functioning (DIF) analyses
in an IRT framework. Results of the IRT-based DIF analyses in general support this
hypothesis, but the findings also indicate some developmental changes in such
measurement bias. Specifically, all items (100%) in the scale demonstrate
measurement bias across gender in third grade but only five of them (62.5%) show
significant DIF in fifth grade.
A key finding is that the item-level measurement bias tends to be domain-
specific across waves: cognitive, perceived competence items tend to be in favor of
boys while affective, interest-related items tend to be in favor of girls. That is, a boy
is more likely to endorse higher-order options of perceived competence items than a
girl, even given the same level of mathematics interest. Conversely, a girl is more
likely to endorse higher-order options of affective items than a boy. Thus, the DIF of
the eight items appears to be balanced in the sense that half of them are in favor of one
group while the other half are in favor of the other. This domain specificity of DIF is evident in fifth
grade as well, except that three of the items have become invariant across gender by
this age. The three items include SDQ16 (“I get good grades in math”), SDQ22 (“I
am interested in math”), and SDQ30 (“I like math”). In other words, a fifth-grade boy
and a fifth-grade girl tend to respond to the three items in the same manner, despite the
fact that they might have responded to them differently in third grade.
Such developmental changes also underlie the invariance in item
discrimination (a). In general, the SDQI items show higher discrimination for girls
than boys. For instance, the average item discrimination is 3.00 (SD = 1.33) for girls
and 2.75 (SD = 1.14) for boys in third grade. In fifth grade, this gap becomes
narrower: the average item discrimination is 3.27 (SD = 1.30) for girls
and 3.15 (SD = 1.27) for boys. A relevant finding is that nonuniform DIF is observed
in more items in third grade than in fifth grade. The fifth-grade DIF is mostly uniform,
being a function of DIF in the b parameters alone. Taken together, it can be concluded
that the SDQI mathematics subscale tends to become more invariant across gender
over the elementary school years (i.e., from third to fifth grade).
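The uniform versus nonuniform distinction can be illustrated with a minimal sketch, assuming a graded response model in which each boundary curve is a two-parameter logistic function of θ. The function names and all parameter values below are hypothetical and are not estimates from this study. When two groups share the same a, their boundary curves are parallel and never cross (uniform DIF); when a differs, the curves cross at a single θ, so the direction of the DIF reverses along the continuum (nonuniform DIF).

```python
import math


def boundary_prob(theta, a, b):
    """P(X >= k | theta) for one threshold b, assuming a graded response model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def crossing_point(a_ref, b_ref, a_foc, b_foc):
    """Theta at which the two groups' boundary curves intersect.

    Returns None when the discriminations are equal, because the curves are
    then parallel (or identical) and never cross.
    """
    if math.isclose(a_ref, a_foc):
        return None
    # Solve a_ref * (theta - b_ref) = a_foc * (theta - b_foc) for theta.
    return (a_ref * b_ref - a_foc * b_foc) / (a_ref - a_foc)


# Hypothetical parameters for one threshold of one item.
uniform = crossing_point(3.0, -1.0, 3.0, -0.7)      # same a, shifted b
nonuniform = crossing_point(3.0, -1.0, 2.0, -0.7)   # different a and b

print("uniform DIF crossing point:", uniform)
print("nonuniform DIF crossing point:", nonuniform)
```

Under these assumed values, the first comparison has no crossing point, whereas the second has one: below that θ one group is more likely to pass the threshold, and above it the other group is.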
Some of these findings may be explained by the existing literature. For
example, the domain-specific measurement bias may be accounted for by expectancy-
value theory-based studies (e.g., Eccles et al., 1983, 1989; Wigfield & Eccles, 2000).
Eccles and colleagues’ work has focused on the development of expectancy and value
beliefs in children and adolescents. They found that children begin to form clearly distinct
ability-expectancy beliefs as early as first grade within the domains of mathematics,
reading, music, and sports. More importantly, their research has shown that boys’ and
girls’ beliefs and values differ in gender stereotypic ways. Such gender stereotypic
beliefs plausibly serve as the underlying mechanism for the measurement bias of the
SDQI cognitive items, especially given the domain of mathematics (Bhana, 2005). On
the other hand, it is unclear why three of the items showing bias in third grade (i.e.,
SDQ16, 22, and 30) would become invariant in the fifth grade. It seems impossible to
summarize the common characteristics of this set of items by examining the wording
of them. For instance, it is difficult to explain why fifth-grade boys and girls respond
to SDQ16 (“I get good grades in math”) in the same manner, but respond to SDQ41
(“I am good at math”) differently. Furthermore, limited by only two waves of
measurement, the current investigation is incapable of extending its inferences beyond
elementary school. For example, will measurement bias across gender keep
diminishing as children advance to middle school? Future research may extend such
investigations to adolescence or even adulthood from a longitudinal perspective.
Measurement Bias of the SDQI Mathematics Subscale across Ethnicity
The third research question of this dissertation concerns the measurement bias
of the SDQI items across ethnicity, and it was hypothesized that ethnic differences
would emerge at item level as revealed by differential item functioning (DIF) analyses
in an IRT framework. Results of the IRT-based DIF analyses in general support this
hypothesis, but the findings are fairly complicated because three comparisons (i.e.,
White vs. African American, White vs. Hispanic, and White vs. Asian) were
conducted in each wave. These comparisons do not necessarily tell a single story
about this research question.
The DIF between White and African American respondents appears to be the
most substantial among all comparisons (see Table 4.6), and the pattern of the DIF is
also the most complicated. Overall, the SDQI items do not have as high
discrimination for African American children as for White children, so the OCCs of
most items appear to be flatter for African American respondents (see Figures 4.9 and
4.12). In terms of the invariance in item thresholds, no clear pattern can be identified
across waves. In third grade, African American children are more likely to endorse
higher-order options of SDQ12 (“I cannot wait to do math each day”), SDQ26 (“I can
do very difficult problems in math”), SDQ30 (“I like math”), and SDQ36 (“I enjoy
doing work in math”), whereas White children are more likely to endorse higher-order
options of SDQ16 (“I get good grades in math”). The DIF does not follow the
cognitive-affective distinction as identified in the DIF analysis across gender. To
further complicate the situation, the pattern of item-level DIF changes in fifth grade:
SDQ12, 22, and 30 are in favor of the African American group, while SDQ6 and 16
are in favor of the White group. The underlying mechanism is that even the DIF
across b parameters (i.e., b1, b2, and b3) of the same item is inconsistent, meaning that
some items may have lower b1 (and b2) as well as higher (b2 and) b3 for White
respondents. In general, the items encompass narrower ranges of the θ continuum for
African American children. These findings call for caution in future use of the SDQI among
the African American population because, in addition to showing measurement bias
between White and African American children, the items also appear to have poorer
psychometric qualities among African Americans. For instance, the item
discrimination is as low as 0.71 for SDQ26 for African Americans in third grade,
causing this item to be quite incapable of distinguishing those with high interest levels
from those with low levels.
The DIF between White and Hispanic respondents is less substantial and less
complicated. Although all items show significant DIF between groups, the DIF
clearly follows the cognitive-affective distinction: Hispanic children are more
likely to endorse higher-order options of affective items, while White children are
more likely to endorse higher-order options of cognitive, perceived competence items.
As compared with their White counterparts, Hispanic students are more likely to
express their interest in mathematics by endorsing direct, affective statements such
as “I like math,” and less likely to self-report their interest levels through perceived
competence items. From a developmental perspective, it is also worth noting that the
DIF between groups tends to become even less substantial in fifth grade, particularly
in the DIF of item discrimination (a). The gap in average item
discrimination between the White and Hispanic groups tends to close in fifth grade, and as
a result, the DIF turns from mostly nonuniform to mostly uniform from third to fifth
grade (see Figures 4.10 and 4.13).
The DIF between White and Asian American respondents is the least
substantial among all comparisons. In particular, SDQ22 (“I am interested in math”)
and 30 (“I like math”) are invariant across grade levels, and SDQ36 (“I enjoy doing
work in math”) also becomes invariant in fifth grade. Thus, the items offering the most
information appear to measure White and Asian children’s mathematics interest
equivalently. For the remaining items with measurement bias, the DIF also appears to
follow the affective-cognitive distinction. As in the White versus Hispanic
comparison, Asian American children are more likely to endorse affective items for
representing their mathematics interest. In addition, the average item discrimination
for Asian American children is similar to, and even slightly higher than, that for White
children across grade levels.
Although results of these comparisons are mixed, some key findings may still
be summarized with respect to the measurement bias of the SDQI items across ethnic
groups. First, most of the SDQI mathematics items (50% to 100%) show certain
levels of measurement bias according to the ethnic background of the respondent. Mostly,
the DIF reflects the joint influence of noninvariance in item discrimination (a)
and item thresholds (b1, b2, and b3), and is thus nonuniform. Second, although there
exists substantial DIF between White and any of the ethnic minorities, assuming
measurement equivalence can be most problematic for any group comparison that
involves the African American population. When the investigations involve the
Hispanic or Asian population, researchers need to bear in mind that they tend to
respond to perceived competence items and affective items differently as compared
with White people. Third, the affective-cognitive distinction seems to play an
important role in the White versus Hispanic and White versus Asian comparisons,
though the DIF of the former appears to be more substantial. It is unclear whether
stereotype threat (Ogbu, 2003; Steele & Aronson, 1995) is a contributing factor of this
phenomenon because, unlike African American and Hispanic students, Asian
American students tend to be stereotyped for their excellence in STEM fields (General
Accounting Office, 2005; Goyette & Xie, 1999; Herrara & Hurtado, 2011). Fourth,
there seems to be a trend for the DIF to diminish over time in all comparisons, which
requires further investigation from a longitudinal perspective. Last but not least,
as indicated by the test characteristic curves of each comparison, item-level DIF in
opposite directions tends to cancel out, so that the test characteristic curves in most
cases appear to coincide (see Figures 4.8, 4.15, and 4.17). As such, the true scores
remain close enough between the two groups to be trusted for most analyses of group
differences.
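How item-level DIF in opposite directions can cancel at the test level may be sketched as follows, assuming a graded response model with four response options (0 to 3). The two items, the size of the threshold shift, and every parameter value are hypothetical and chosen only for illustration; they are not the estimates reported in Chapter 4. The expected item score is the sum of the boundary probabilities, and the test characteristic curve is the sum of the expected item scores.

```python
import math


def expected_item_score(theta, a, thresholds):
    """Expected score on a 0..K item: sum of P(X >= k | theta) over thresholds."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def expected_test_score(theta, items):
    """Test characteristic curve value: sum of expected item scores."""
    return sum(expected_item_score(theta, a, bs) for a, bs in items)


# Hypothetical parameters, not estimates from the dissertation.
shift = 0.3
base = (3.0, [-1.5, -1.0, -0.5])
reference = [base, base]
focal = [
    (3.0, [b + shift for b in base[1]]),  # harder to endorse for the focal group
    (3.0, [b - shift for b in base[1]]),  # easier to endorse for the focal group
]

grid = [i / 10.0 for i in range(-40, 41)]

# Largest gap in expected score for any single item across the theta grid.
item_gap = max(
    abs(expected_item_score(t, *f) - expected_item_score(t, *r))
    for t in grid
    for f, r in zip(focal, reference)
)

# Largest gap between the two groups' test characteristic curves.
test_gap = max(
    abs(expected_test_score(t, focal) - expected_test_score(t, reference))
    for t in grid
)

print(f"largest item-level gap: {item_gap:.3f}")
print(f"largest test-level gap: {test_gap:.3f}")
```

In this sketch the test-level gap is smaller than the largest item-level gap because the two shifts offset each other, although the cancellation is not exact at every θ.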
Item Parameter Drift of the SDQI Mathematics Subscale
The fourth research question of this dissertation concerns the measurement
bias of the SDQI items across age groups, and it was hypothesized that item
differences by age group would emerge at the item level as revealed by differential
item functioning analyses in an IRT framework. The DIF results support this
hypothesis: all items demonstrate significant DIF across age groups. As discussed
earlier, the scale has better psychometric qualities for assessing fifth graders’ than
third graders’ mathematics interest. The overall improvement of the scale may partly
be accounted for by the IPD findings of this study. As children transition from third
to fifth grade, they respond more sensitively to the option anchors (e.g., not at all true,
very true) of the SDQI items. For instance, it takes a randomly selected fifth grader a
lower θ than a third grader to have the same probability of endorsing response Option 0
(not at all true), indicating that this anchor performs better in identifying children of
truly low interest levels in fifth grade. Likewise, it takes a randomly selected fifth
grader a higher θ than a third grader to have the same probability of endorsing response
Option 3 (very true) of an item, indicating that this anchor performs better in fifth
grade in identifying children of relatively high interest levels. This average pattern of
drift in b parameters, however, is not large enough in magnitude to result in any
notable improvement of the SDQI at the item level or as a whole. As shown in Figures
4.16 and 4.17, most of the OCCs still intersect at some point, showing nonuniform DIF
at the item level, and the test characteristic curves of third- and fifth-grade data nearly
coincide with each other.
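The direction of the threshold drift described above can be made explicit with a short derivation, assuming a graded response model with options 0 to 3, discrimination a, and thresholds b1 < b2 < b3. The notation θ0(p) and θ3(p), denoting the θ at which the lowest and highest options are endorsed with a fixed probability p, is introduced here only for illustration.

```latex
\begin{align*}
P(X = 0 \mid \theta) &= 1 - \frac{1}{1 + e^{-a(\theta - b_1)}}
                      = \frac{1}{1 + e^{a(\theta - b_1)}}, \\
P(X = 3 \mid \theta) &= \frac{1}{1 + e^{-a(\theta - b_3)}}.
\end{align*}
Setting each probability equal to a fixed value $p$ and solving for $\theta$ gives
\begin{align*}
\theta_{0}(p) &= b_1 + \frac{1}{a}\ln\frac{1 - p}{p}, &
\theta_{3}(p) &= b_3 + \frac{1}{a}\ln\frac{p}{1 - p}.
\end{align*}
```

With a held constant, a lower b1 lowers θ0(p) and a higher b3 raises θ3(p), which corresponds to the pattern of a wider threshold range in fifth grade. Because a also differs between waves, this holds only as an approximation.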
Conclusion
The fifth research question of this dissertation entails a summary of the DIF
findings as a function of gender, ethnicity, and age. It was hypothesized that item-level
responses would change over time as a whole, and as a function of group
membership according to gender and ethnic group. This hypothesis is certainly
supported by the DIF findings. In the meantime, several issues emerged to note
limitations of this study, as well as to provide recommendations for future research.
In mentioning the overlap of the test characteristic curves in all group
comparisons, the multifaceted nature of academic self-concept, based on which the
SDQ was constructed, needs to be revisited. This dissertation concludes that, although
item-level DIF emerged as a function of gender, ethnicity, and age, the scale scores
can be trusted because the DIF tends to cancel out. Results of factor analysis, however,
indicate a loose one-factor structure of the scale. While multiple-factor solutions were
not tested in the current study due to the lack of empirical basis, it is possible that a 2-
factor (i.e., cognitive vs. affective) structure is valid in some subpopulations. That is,
perceived competence may correlate so weakly with academic interest in a given
subpopulation that the perceived competence items load on a separate factor instead. If this is the
case, then DIF of the SDQI will need to be examined at the subscale level. Given a
cognitive versus affective distinction, for example, DIF analysis will need to be
conducted on the cognitive and affective items separately. Such results cannot be
predicted from the current investigation because the latent construct will be different.
Future research may consider adopting growth mixture modeling to better account for
the weak unidimensionality before any DIF analysis.
From a developmental perspective, future research may also examine the
respondent-item interaction of school-age children and adolescents. Findings of the
current study seem to suggest that a 4-point scale still presents too many response
options for elementary school children, particularly younger ones. In particular, the
item thresholds appear to be so close to each other that the scale functions more like a
dichotomous item (i.e., 1 to 3-false, 4-true). This finding is consistent with
previous research showing that younger children tend to respond in an extreme manner to
Likert-type rating scales (Chambers & Craig, 1998; Goodenough et al., 1997; von
Baeyer, Carlson & Webb, 1997). Underlying this is the Piagetian theory that young
children characteristically engage in dichotomous thinking and may therefore focus on
either extreme of the response options (i.e., not at all true and very true). Interestingly,
younger children (e.g., 7- to 9-year-olds as compared with 10- to 12-year-olds) were
found to engage in such dichotomous thinking particularly when rating emotion-based
statements (Chambers & Johnston, 2002). In the same study, it was also reported that
children’s extreme scores were not a function of the number of options: children
who used the 3-point options had extreme scores similar to those of children who used the
5-point ones. As such, a dichotomous format (i.e., true or false) may be the most
appropriate and economical response format for the current SDQI items, particularly
the affective, emotion-based ones, because children are still likely to endorse extreme
scores when the scale is tailored for 3-point options. A limitation to note here is that
the aforementioned dichotomous thinking of young children does not account for the
limited and unbalanced coverage of the θ continuum of the SDQI items from the IRT
perspective. In addition, the current investigation has also observed a trend for the
thresholds to expand over the θ continuum as children grow older, but more data over
multiple occasions are needed to test this trend.
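Why closely spaced thresholds make a 4-point item behave like a dichotomous one can be illustrated with a minimal sketch, again assuming a graded response model. The function names and both sets of parameter values are hypothetical. The sketch computes, over a grid of θ values, the highest probability that either middle option (1 or 2) ever reaches.

```python
import math


def category_probs(theta, a, thresholds):
    """Category probabilities for options 0..K, assuming a graded response model."""
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K + 1).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def peak_middle_prob(a, thresholds):
    """Largest probability any middle option reaches over a theta grid."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(max(category_probs(t, a, thresholds)[1:-1]) for t in grid)


# Hypothetical parameters: thresholds bunched together versus spread apart.
close = peak_middle_prob(3.0, [-1.0, -0.9, -0.8])
spread = peak_middle_prob(3.0, [-1.5, 0.0, 1.5])

print(f"peak middle-option probability, close thresholds:  {close:.3f}")
print(f"peak middle-option probability, spread thresholds: {spread:.3f}")
```

Under these assumed values, the middle options are almost never the likely response when the thresholds are bunched together, so responses concentrate in the two extreme options.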
Relatedly, cultural influences must not be neglected, especially since the
original SDQI was normed in Australian samples (Marsh, 1992a). That the average
item thresholds are very low among U.S. children may also be a function of the
culture in which individuals are encouraged to express their feelings and thoughts in a
positive way. While this notion is preliminary and certainly requires further
exploration, cultural differences in how people experience, regulate, and express their
emotions are actually well-documented in the literature (e.g., Goetz, Spencer-Rodgers,
& Peng, 2008; Scollon, Diener, Oishi, & Biswas-Diener, 2004, 2005). In particular,
research has shown that different cultures socialize children to regulate their emotions
differently. For example, U.S. mothers are more likely to up-regulate children’s
positive emotions by highlighting children’s success, while Chinese mothers are more
likely to down-regulate children’s positive emotions by not highlighting their success
(Miller, Wang, Sandel, & Cho, 2002; Ng, Pomerantz, & Lam, 2007). Such findings
may help us understand why U.S. children tend to endorse the high extreme of the
options, and further, why White children are more likely to represent their academic
interest by highlighting their competence and success. To raise the average item
thresholds of the SDQI among U.S. children, items may be revised to present more
challenges and elicit more thinking of children. For instance, “I like math” may be
revised to “I like math even when I cannot work out the problems.” Of course, such
revisions may lead to other issues such as double-barreled items and low readability
for young children. Future research must carefully examine this trade-off in such
measures.
In terms of content relevance of the items, future measures of interest may also
consider including other aspects discussed in the interest literature such as curiosity,
attitude, and attention. Among these aspects of interest, educational research needs to
pay special attention to the role of attitude/value in developing and maintaining
academic interest. In general, attitude or value refers to an individual’s evaluation or
appraisal of the activity (Ajzen, 1988, 1991; Wigfield & Eccles, 2000, 2002). The
social cognitive perspective postulates that a learner’s motivation is jointly influenced
by his self-efficacy/perceived competence and his outcome expectancy/attitude/value,
and it was theorized earlier in this dissertation (see Chapter 2) that the social cognitive
perspective may well apply to the approaches to interest research. The current SDQI
items seem to address the affective and perceived competence aspects of interest well,
but largely neglect the role of value in defining interest.
According to Renninger and Hidi (2002), value plays an important role in
developing one’s unstable situational interest into personal interest, a more well-
developed type of interest which helps maintain and deepen the student’s motivation.
To be more specific, it requires certain levels of stored prior knowledge and positive
value within an individual for his situational interest to evolve into personal interest.
As such, the assessment of one’s positive value associated with the task may be a good
indicator of how well-developed and stable his academic interest is, and it is
recommended that future instruments of interest also include the measure of value. It
is worth noting that the construct of value is termed and operationalized differently in
various theories, and test developers must carefully select or design value items which
best serve the measure of interest.
A final limitation of this study is that it does not address the
distinction between personal interest and situational interest. The SDQI items were not
developed based on this theoretical classification, so the affective statements are
generally straightforward with little context information to be identified as
representing personal interest or situational interest. The IRT-based evaluation was
expected to yield results that might help distinguish situational interest from personal
interest, but it turned out that all affective items demonstrated similar psychometric
properties. In addition to the inclusion of value in defining interest, future measures of
interest may also address the personal-situational distinction of interest. Very few
studies have been conducted with regard to this distinction, but the systematic review
of situational interest by Schraw and Lehman (2001) may offer some guidelines for
future assessments of situational interest. They concluded that one could distinguish
among different types of situational interest, and organized empirical research on
interest into three main categories of situational interest: text-based, task-based, and
knowledge-based. In Mitchell’s (1993) interest survey, for example, two general areas
of interest (i.e., personal and situational) and five specific components of situational
interest (i.e., meaningfulness, involvement, computers, groups, and puzzles) are
assessed to present the multifaceted structure of interest. This survey includes items
such as “mathematics is enjoyable to me” and “I have always enjoyed studying
mathematics in school” to assess high school students’ personal interest, and items
such as “our class is fun” and “this year I like math” to assess their situational interest.
As compared with these items, the SDQI ones seem to mostly assess personal interest
due to a lack of context- or time-specificity.
In summary, findings of this study provide several directions for future
research on academic interest and affect, particularly for instrument development and
validation in the field. To develop a good self-report measure of children’s academic
interest, test developers will need to go through a list of considerations including
children’s cognitive abilities, their special interactions with emotion-based items, the
cultural scripts of the society and certain subpopulations, the wording and content
relevance of the items, and the personal-situational distinction and multifaceted
structure of interest.
BIBLIOGRAPHY
Ai, X. (2002). Gender differences in growth in mathematics achievement: Three-level longitudinal and multilevel analyses of individual, home, and school influences. Mathematical Thinking and Learning, 4(1), 1-22.
Ajzen, I. (1988). Attitudes, personality, and behavior. Milton Keynes, England: Open University Press.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179-211.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bainbridge, W. L., & Lasley, T. J. (2002). Demographics, diversity, and K-12 accountability. The challenge of closing the achievement gap. Education and Urban Society, 34(4), 422-437.
Baker, F. B. (1985). The basics of item response theory. Portsmouth, NH: Heinemann.
Baker, F. B. (2001). The basics of item response theory (2nd ed.). College Park, MD: ERIC Clearinghouse on Assessment and Evaluation, University of Maryland.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York, NY: Freeman.
Barak, A. (1981). Vocational interests. Journal of Vocational Behavior, 19, 1-14.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. New York, NY: Cambridge University Press.
Bhana, D. (2005). “I’m the best in maths. Boys rule, girls drool.” Masculinities, mathematics, and primary schooling. Perspectives in Education, 23, 1-10.
Blechman, E. A. (1990). Moods, affect, and emotions. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15(2), 113-141.
Bong, M., & Clark, R. E. (1999). Comparison between self-concept and self-efficacy in academic motivation research. Educational Psychologist, 34, 139-154.
Bong, M., & Skaalvik, E. M. (2003). Academic self-concept and self-efficacy: How different are they really? Educational Psychology Review, 15(1), 1-40.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford.
Bussey, K., & Bandura, A. (1992). Self-regulatory mechanisms governing gender development. Child Development, 63, 1236-1250.
Bussey, K., & Bandura, A. (2004). Social cognitive theory of gender development and differentiation. In A. H. Eagly, A. E. Beall, & R. J. Sternberg (Eds.), The psychology of gender (2nd ed., pp. 92-119). New York, NY: Guilford Press.
Byrnes, J. P. (2003). Factors predictive of mathematics achievement in White, Black, and Hispanic 12th graders. Journal of Educational Psychology, 95(2), 316-326.
Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309-329.
Cai, L. (2012). flexMIRT: Flexible multilevel item factor analysis and test scoring [Computer Software]. Seattle, WA: Vector Psychometric Group, LLC.
Cai, L., Thissen, D., & du Toit, S. H. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling [Computer software]. Lincolnwood, IL: Scientific Software International.
Casey, M. B., Nuttall, R. L., & Pezaris, E. (1997). Mediators of gender differences in mathematics college entrance test scores: A comparison of spatial skills with internalized beliefs and anxieties. Developmental Psychology, 33, 669-680.
Catsambis, S. (1994). The path to math: Gender and racial-ethnic differences in mathematics participation from middle school to high school. Sociology of Education, 67, 199-215.
Chambers, C. T., & Craig, K. D. (1998). An intrusive impact of anchors in children’s faces pain scales. Pain, 78, 27-37.
Chambers, C. T., & Johnston, C. (2002). Developmental differences in children’s use of rating scales. Journal of Pediatric Psychology, 27(1), 27-36.
Church, A. T. (2001). Personality measurement in cross-cultural perspective. Journal of Personality, 69(6), 979-1006.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York, NY: Chapman & Hall.
de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: The Guilford Press.
DeCuir-Gunby, J. T., Aultman, L. P., & Schutz, P. A. (2009). Investigating transactions among motives, emotional regulation related to testing, and test emotions. The Journal of Experimental Education, 77(4), 409-436.
DeJarnette, N. K. (2012). America's children: Providing early exposure to STEM (Science, Technology, Engineering and Math) initiatives. Education, 133(1), 77-84.
Denes-Raj, V., & Epstein, S. (1994). Conflict between intuitive and rational processing: When people behave against their better judgment. Journal of Personality and Social Psychology, 66(5), 819-829.
Dewey, J. (1913). Interest and effort in education. Boston, MA: Riverside Press.
Dweck, C. S. (1999). Self-theories: Their role in motivation. Philadelphia, PA: Taylor & Francis.
Dweck, C. S., & Elliott, E. S. (1983). Achievement motivation. In P. H. Mussen, & E. M. Hetherington (Eds.), Handbook of child psychology: Vol 4. Socialization, personality, and social development (pp. 643-691). New York, NY: Wiley.
Dweck, C. S., & Leggett, E. L. (1988). A social-cognitive approach to motivation and personality. Psychological Review, 95, 256-273.
Eccles, J. S., & Midgley, C. (1989). Stage/environment fit: Developmentally appropriate classrooms for adolescents. In R. Ames, & C. Ames (Eds.), Research on motivation in education (Vol. III, pp. 139-181). New York, NY: Academic Press.
Eccles, J. S., & Wigfield, A. (1995). In the mind of the actor: The structure of adolescents' achievement task values and expectancy-related beliefs. Personality and Social Psychology Bulletin, 21, 215-225.
Eccles, J. S., Adler, T. F., Futterman, R., Goff, S. B., Kaczala, C. M., & Meece, J. L. (1983). Expectations, values, and academic behaviors. In J. T. Spence (Ed.), Achievement and achievement motivation (pp. 75-146). San Francisco, CA: W. H. Freeman.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In N. Eisenberg (Ed.), Social, emotional, and personality development in handbook of child psychology (Vol. III, pp. 1017-1096). New York, NY: Wiley.
Eccles, J. S., Wigfield, A., Flanagan, C. A., Miller, C., Reuman, D. A., & Yee, D. (1989). Self-concepts, domain values, and self-esteem: Relations and changes at early adolescence. Journal of Personality, 57(2), 283-310.
Edelen, M. O., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of differential item functioning using item response theory and the likelihood-based model comparison approach: Application to the Mini-Mental State Examination. Medical Care, 44(11), S134-S142.
Else-Quest, N. M., Hyde, J. S., & Hejmadi, A. (2008). Mother and child emotions during mathematics homework. Mathematical Thinking and Learning, 10, 5-35.
Else-Quest, N. M., Hyde, J. S., & Linn, M. C. (2010). Cross-national patterns of gender differences in mathematics: A meta-analysis. Psychological Bulletin, 136(1), 103-127.
Else-Quest, N. M., Mineo, C. C., & Higgins, A. (2013). Math and science attitudes and achievement at the intersection of gender and ethnicity. Psychology of Women Quarterly, 37(3), 293-309.
Epstein, S. (1990). Cognitive-experiential Self-theory. In L. Pervin (Ed.), Handbook of personality theory and research: Theory and research (pp. 165-192). New York, NY: Guilford Publications, Inc.
Evans, E. M., Schweingruber, H., & Stevenson, H. W. (2002). Gender differences in interest and knowledge acquisition: The United States, Taiwan, and Japan. Sex Roles, 37(3/4), 153-167.
Evans, K. M. (1971). Attitudes and interests in education (2nd ed.). London, UK: Routledge & Kegan.
Eysenck, M. E. (1982). Attention and arousal. New York, NY: Springer-Verlag.
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). Thousand Oaks, CA: Sage Publications.
Flavell, J. (1987). Speculations about the nature and development of metacognition. In F. E. Weinert, & R. H. Kluwe (Eds.), Metacognition, motivation, and understanding (pp. 21-64). Hillsdale, NJ: Erlbaum.
Fleischman, H. L., Hopstock, P. J., Pelczar, M. P., & Shelley, B. E. (2010). Highlights from PISA 2009: Performance of U.S. 15-year old students in reading, mathematics, and science literacy in an international context (NCES 2011-004). Washington, DC: U.S. Government Printing Office.
Forgas, J. (2000). The role of affect in social cognition. In J. Forgas (Ed.), Feeling and thinking (pp. 1-28). New York, NY: Cambridge University Press.
Freud, S. (1961). Some psychical consequences of the anatomical distinction between the sexes. In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud (J. Strachey, Trans., Vol. 19, pp. 241-258). London, UK: Hogarth Press.
Frijda, N. (1994). Varieties of affect: Emotions and episodes, moods, and sentiments. In P. Ekman, & R. J. Davidson (Eds.), The nature of emotion (pp. 59-67). New York, NY: Oxford University Press.
Gall, M. D., Gall, J. P., & Borg, W. R. (2006). Educational research: An introduction (8th ed.). Boston, MA: Pearson.
Gay, L. R., Mills, G. E., & Airasian, P. (2008). Educational research: Competencies for analysis and applications (9th ed.). Upper Saddle River, NJ: Pearson.
General Accounting Office. (2005). Higher education: Federal science, technology, engineering, and mathematics programs and related trends (Publication No. GAO-06-114). Retrieved from U.S. Government Accountability Office website: http://www.gao.gov/products/GAO-06-702T.
Goetz, J. L., Spencer-Rodgers, J., & Peng, K. (2008). Dialectical emotions: How cultural epistemologies influence the experience and regulation of emotional complexity. In R. Sorrentino, & S. Yamaguchi (Eds.), Handbook of motivation and cognition across cultures (pp. 517-539). Amsterdam, The Netherlands: Elsevier.
Goldstein, H. (1983). Measuring changes in educational attainment over time: Problems and possibilities. Journal of Educational Measurement, 33, 315-332.
Goodenough, B., Kampel, L., Champion, G. D., Laubreaux, L., Nicholas, M. K., Ziegler, J. B., & McInerney, M. (1997). An investigation of the placebo effect and age-related factors in the report of needle pain from venepuncture in children. Pain, 72, 383-391.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68(3), 532-560.
Goyette, K., & Xie, Y. (1999). Educational expectations of Asian American youths: Determinants and ethnic differences. Sociology of Education, 72, 22-36.
Guiso, L., Monte, F., Sapienza, P., & Zingales, L. (2008). Culture, gender, and math. Science, 320, 1164-1165.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.
Harter, S. (1981). A model of mastery motivation in children: Individual differences and developmental change. In W. A. Collins (Ed.), Aspects on the development of competence: The Minnesota symposia on child psychology (Vol. 14, pp. 215-255). Hillsdale, NJ: Erlbaum.
Harter, S. (1996). Teacher and classmate influences on scholastic motivation, self-esteem, and level of voice in adolescents. In J. Juvonen, & K. R. Wentzel (Eds.), Social motivation: Understanding children's school adjustment (pp. 11-42). Cambridge, England: Cambridge University Press.
Herbart, J. F. (1806). Allgemeine Pädagogik, aus dem Zweck der Erziehung abgeleitet. In J. F. Herbart (Ed.), Pädagogische schriften (Vol. II). Düsseldorf, Germany: Kupper.
Herrera, F. A., & Hurtado, S. (2011). Maintaining initial interests: Developing science, technology, engineering, and mathematics (STEM) career aspirations among underrepresented racial minority students. Retrieved from Higher Education Research Institute website: http://www.heri.ucla.edu/publications-main.php.
Herrera, A., & Gomez, J. (2008). Influence of equal or unequal comparison group sample sizes on the detection of differential item functioning using the Mantel-Haenszel and logistic regression techniques. Quality and Quantity, 42(6), 739-755.
Hidi, S. (1990). Interest and its contribution as a mental resource for learning. Review of Educational Research, 60, 323-350.
Hidi, S. (2000). An interest researcher's perspective: The effects of extrinsic and intrinsic factors on motivation. In C. Sansone, & J. Harackiewicz (Eds.), Intrinsic and extrinsic motivation (pp. 309-339). San Diego, CA: Academic Press.
Hidi, S., & Anderson, V. (1992). Situational interest and its impact on reading and expository writing. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 215-238). Hillsdale, NJ: Erlbaum.
Hidi, S., & Harackiewicz, J. (2000). Motivating the academically unmotivated: A critical issue for the 21st century. Review of Educational Research, 70, 151-179.
Hidi, S., Baird, W., & Hildyard, A. (1982). That's important but is it interesting? Two factors in text processing. In A. Flammer, & W. Kintsch (Eds.), Discourse processing (pp. 63-75). Amsterdam, The Netherlands: North-Holland.
Hoffmann, L. (2002). Promoting girls' interest and achievement in physics classes for beginners. Learning and Instruction, 12(4), 447-465.
Holland, J. L. (1985). Making vocational choices: A theory of vocational personalities and work environments (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer, & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hossain, M. M., & Robinson, M. G. (2012). How to motivate US students to pursue STEM (Science, Technology, Engineering and Mathematics) careers. US-China Education Review A, 4, 442-451.
Houts, C. R., & Cai, L. (2013). flexMIRT(R) user's manual version 2: Flexible multilevel multidimensional item analysis and test scoring. Chapel Hill, NC: Vector Psychometric Group.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 221-233.
Hutcheson, G., & Sofroniou, N. (1999). The multivariate social scientist. London, UK: Sage.
Hyde, J. S., Fennema, E., Ryan, M., Frost, L. A., & Hopp, C. (1990). Gender comparisons of mathematics attitudes and affect. Psychology of Women Quarterly, 14, 299-324.
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A., & Williams, C. (2008). Gender similarities characterize math performance. Science, 321, 494-495.
Iran-Nejad, A. (1987). Cognitive and affective causes of interest and liking. Journal of Educational Psychology, 79(2), 120-130.
Izard, C. E. (1977). Human emotions. New York, NY: Plenum Press.
Izard, C. E., & Ackerman, B. P. (2000). Motivational, organizational, and regulatory functions of discrete emotions. In M. Lewis, & J. Haviland-Jones (Eds.), Handbook of emotions (2nd ed., pp. 253-264). New York, NY: Guilford Press.
Jacobs, J. E., & Simpkins, S. D. (2005). Mapping the leaks in the math, science, and technology pipeline. New Directions for Child and Adolescent Development, 110, 3-6.
Kahle, J., Parker, L., Rennie, L., & Riley, D. (1993). Gender differences in science education: Building a model. Educational Psychologist, 28, 379-404.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151.
Kao, G., & Thompson, J. (2003). Racial and ethnic stratification in educational achievement and attainment. Annual Review of Sociology, 29, 417-442.
Kaplan, A., & Midgley, C. (1999). The relationship between perceptions of the classroom goal structure and early adolescents’ affect in school: The mediating role of coping strategies. Learning and Individual Differences, 11, 187-212.
Kelley, H. H., & Michela, J. (1980). Attribution theory and research. Annual Review of Psychology, 31, 457-501.
Kim, J., & Oshima, T. C. (2013). Effect of multiple testing adjustment in differential item functioning detection. Educational and Psychological Measurement, 73, 458-470.
Klassen, R. M., & Usher, E. L. (2010). Self-efficacy in educational settings: Recent research and emerging directions. In T. C. Urdan, & S. A. Karabenick (Eds.), Advances in motivation and achievement: Vol. 16A. The decade ahead: Theoretical perspectives on motivation and achievement (pp. 1-33). Bingley, UK: Emerald Publishing Group.
Kline, P. (2000). The handbook of psychological testing (2nd ed.). New York, NY: Routledge.
Köller, O., Baumert, J., & Schnabel, K. (2001). Does interest matter? The relationship between academic interest and achievement in mathematics. Journal for Research in Mathematics Education, 32(5), 448-470.
Korpershoek, H., Kuyper, H., Bosker, R., & van der Werf, G. (2013). Students leaving the STEM pipeline: An investigation of their attitudes and the influence of significant others on their study choice. Research Papers in Education, 28(4), 483-505.
Krapp, A. (1999). Interest, motivation, and learning: An educational-psychological perspective. Learning and Instruction, 14(1), 23-40.
Krapp, A., Hidi, S., & Renninger, K. A. (1992). Interest, learning, and development. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 3-25). Hillsdale, NJ: Erlbaum.
Lacey, T. A., & Wright, B. (2009). Occupational employment projections to 2018. Monthly Labor Review, 132(11), 82-123.
Langer, M. (2008). A reexamination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation (Doctoral dissertation). Retrieved from https://cdr.lib.unc.edu/
Larsen, R. J., & Fredrickson, B. L. (1999). Measurement issues in emotion research. In D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-being: Foundations of hedonic psychology (pp. 40-60). New York, NY: Russell Sage.
Leahey, E., & Guo, G. (2001). Gender differences in mathematical trajectories. Social Forces, 80(2), 713-732.
Lee, V. E., & Burkam, D. T. (2003). Inequality at the starting gate: Social background differences in achievement as children begin school. Washington, DC: Economic Policy Institute.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children's intrinsic interest with extrinsic reward: A test of the "overjustification" hypothesis. Journal of Personality and Social Psychology, 28, 129-137.
Lipsitz, S., & Fitzmaurice, G. (2009). Generalized estimating equations for longitudinal data analysis. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 43-78). Boca Raton, FL: Chapman and Hall/CRC.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice Hall.
Lord, F. M. (1977). A study of item bias, using item characteristic curve theory. In Y. H. Poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Amsterdam, The Netherlands: Swets and Zeitlinger.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Marsh, H. W. (1992a). Self Description Questionnaire (SDQ) I: A theoretical and empirical basis for the measurement of multiple dimensions of preadolescent self-concept. An interim test manual and research monograph. Macarthur, New South Wales, Australia: University of Western Sydney, Faculty of Education.
Marsh, H. W. (1992b). Self Description Questionnaire (SDQ) II: A theoretical and empirical basis for the measurement of multiple dimensions of adolescent self-concept. A test manual and research monograph. Macarthur, New South Wales, Australia: University of Western Sydney, Faculty of Education.
Marsh, H. W. (1992c). Self Description Questionnaire (SDQ) III: A theoretical and empirical basis for the measurement of multiple dimensions of late adolescent self-concept. An interim test manual and research monograph. Macarthur, New South Wales, Australia: University of Western Sydney, Faculty of Education.
Marsh, H. W., & O'Neill, R. (1984). Self Description Questionnaire III: The construct validity of multidimensional self-concept ratings by late adolescents. Journal of Educational Measurement, 21(2), 153-174.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.
Marsh, H. W., Relich, J. D., & Smith, I. D. (1983). Self-concept: The construct validity of interpretations based upon the SDQ. Journal of Personality and Social Psychology, 45, 173-187.
Marsh, H. W., Smith, I. D., & Barnes, J. (1983). Multitrait-multimethod analysis of interpretations based upon the SDQ: Student-teacher agreement on multidimensional ratings of student self-concept. American Educational Research Journal, 20, 333-357.
Marsh, H. W., Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2005). Academic self-concept, interest, grades, and standardized test scores: Reciprocal effects models of causal ordering. Child Development, 76, 397-416.
Martinez, S., & Guzman, S. (2013). Gender and racial/ethnic differences in self-reported levels of engagement in high school math and science courses. Hispanic Journal of Behavioral Sciences, 35(3), 407-427.
McGraw, R., Lubienski, S. T., & Strutchens, M. E. (2006). A closer look at gender in NAEP mathematics achievement and affect data: Intersections with achievement, race/ethnicity, and socioeconomic status. Journal for Research in Mathematics Education, 37, 129-150.
Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-118.
Miller, P., Wang, S., Sandel, T., & Cho, G. (2002). Self-esteem as folk theory: A comparison of European American and Taiwanese mothers’ beliefs. Parenting: Science and Practice, 2, 209-239.
Mitchell, M. (1993). Situational interest: Its multifaceted structure in the secondary school mathematics classroom. Journal of Educational Psychology, 85(3), 424-436.
Muthén, L. K., & Muthén, B. O. (2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
National Academy of Sciences, National Academy of Engineering, & Institute of Medicine. (2007). Rising above the gathering storm: Energizing and employing America for a brighter economic future. Washington, DC: The National Academies Press.
National Research Council. (2011). Successful K-12 STEM education: Identifying effective approaches in science, technology, engineering, and mathematics. Washington, DC: The National Academies Press.
National Science Board. (2010). Science and engineering indicators. Arlington, VA: National Science Foundation.
National Science Foundation, National Center for Science and Engineering Statistics. (2010). Survey of doctorate recipients. Retrieved from http://www.nsf.gov/statistics/wmpd/2013/pdf/tab9-22.pdf
National Science Foundation, National Center for Science and Engineering Statistics. (2013). Women, minorities, and persons with disabilities in science and engineering: 2013 (Special Report NSF 13-304). Retrieved from http://www.nsf.gov/statistics/wmpd/2013/pdf/nsf13304_digest.pdf
Ng, F., Pomerantz, E., & Lam, S. (2007). European American and Chinese parents’ responses to children’s success and failure: Implications for children’s responses. Developmental Psychology, 43, 1239-1255.
Oakes, J. (1990). Opportunities, achievement, and choice: Women and minority students in science and mathematics. Review of Research in Education, 16, 153-222.
Ogbu, J. (2003). Black American students in an affluent suburb: A study of academic disengagement. Mahwah, NJ: Erlbaum.
Organization for Economic Cooperation and Development. (2003). Education at a glance. Paris, France: OECD.
Pekrun, R. (1992). The impact of emotions on learning and achievement: Toward a theory of cognitive/motivational mediators. Applied Psychology: An International Review, 41(4), 359-376.
Pekrun, R., Elliot, A. J., & Maier, M. A. (2006). Achievement goals and discrete achievement emotions: A theoretical model and prospective test. Journal of Educational Psychology, 98(3), 583-597.
Pekrun, R., Elliot, A. J., & Maier, M. A. (2009). Achievement goals and achievement emotions: Testing a model of their joint relations with academic performance. Journal of Educational Psychology, 101(1), 115-135.
Pekrun, R., Goetz, T., Titz, W., & Perry, R. P. (2002). Academic emotions in students’ self-regulated learning and achievement: A program of qualitative and quantitative research. Educational Psychologist, 37(2), 91-105.
Piaget, J. (1985). The equilibration of cognitive structures. Chicago, IL: University of Chicago Press.
Potenza, M., & Dorans, N. J. (1995). DIF assessment of polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19(1), 23-37.
President's Council of Advisors on Science and Technology. (2010). Prepare and inspire: K-12 education in STEM (science, technology, engineering and math) for America’s future. Retrieved from http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-stemed-report.pdf
Provasnik, S., Kastberg, D., Ferraro, D., Lemanski, N., Roey, S., & Jenkins, F. (2012). Highlights from TIMSS 2011: Mathematics and science achievement of U.S. fourth- and eighth-grade students in an international context (NCES 2013-009). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.
Ramirez, M., Teresi, J. A., Holmes, D., Gurland, B., & Lantigua, R. (2006). Differential item functioning (DIF) and the Mini-Mental State Examination (MMSE): Overview, sample, and issues of translation. Medical Care, 44(Suppl 3), S95-S106.
Reise, S. P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144.
Renninger, K. A. (1984). Object-child relations: Implications for both learning and teaching. Children's Environment Quarterly, 1, 3-6.
Renninger, K. A. (1989). Individual patterns in children's play interests. In L. T. Winegar (Ed.), Social interaction and the development of children's understanding (pp. 147-172). Norwood, NJ: Ablex.
Renninger, K. A. (1990). Children's play interests, representation, and activity. In R. Fivush, & J. Hudson (Eds.), Knowing and remembering in young children (pp. 127-165). Cambridge, England: Cambridge University Press.
Renninger, K. A. (1992). Individual interest and development: Implications for theory and practice. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 361-395). Hillsdale, NJ: Erlbaum.
Renninger, K. A., & Hidi, S. (2002). Student interest and achievement: Developmental issues raised by a case study. In A. Wigfield, & J. S. Eccles (Eds.), Development of achievement motivation (pp. 173-195). New York, NY: Academic Press.
Riegle-Crumb, C. (2005). The cross-national context of the gender gap in math and science. In L. Hedges, & B. Schneider (Eds.), The social organization of schooling (pp. 227-243). New York, NY: Russell Sage Foundation.
Riegle-Crumb, C., & Grodsky, E. (2010). Racial-ethnic differences at the intersection of math course-taking and achievement. Sociology of Education, 83(3), 248-270.
Riegle-Crumb, C., Moore, C., & Ramos-Wada, A. (2011). Who wants to have a career in science or math? Exploring adolescents’ future aspirations by gender and race/ethnicity. Science Education, 95(3), 458-476.
Rivers, C., & Barnett, R. C. (2011). The truth about boys and girls: Challenging toxic stereotypes about our children. New York, NY: Columbia University Press.
Rogosa, D. R. (1995). Myth and methods: “Myths about longitudinal research” plus supplemental questions. In J. M. Gottman (Ed.), The analysis of change (pp. 3-65). Hillsdale, NJ: Erlbaum.
Russell, S. H., Hancock, M. P., & McCullough, J. (2007). Benefits of undergraduate research experiences. Science, 316, 548-549.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 19, 86-100.
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111-121.
Sanders, T. (2004, October). No time to waste: The vital role of college and university leaders in improving science and mathematics education. Paper presented at the Invitational Conference on Teacher Preparation and Institutions of Higher Education, Washington, DC.
Scheirer, M. A., & Kraut, R. E. (1979). Increasing educational achievement via self concept. Review of Educational Research, 49(1), 131-150.
Schiefele, U. (1991). Interest, learning, and motivation. Educational Psychologist, 26, 299-323.
Schiefele, U., Krapp, A., & Winteler, A. (1992). Interest as a predictor of academic achievement: A meta-analysis of research. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 183-212). Hillsdale, NJ: Erlbaum.
Schiefele, U., Winteler, A., & Krapp, A. (1988). Studieninteresse und fachbezogene Wissensstruktur. Psychologie in Erziehung und Unterricht, 35, 106-118.
Schmidt, W. H. (2011, May). STEM reform: Which way to go? Paper presented at the National Research Council Workshop on Successful STEM Education in K-12 Schools. Retrieved from http://sites.nationalacademies.org/dbasse/bose/dbasse_080128#.UgEMEFPkDDn
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304-321.
Schraw, G., & Lehman, S. (2001). Situational interest: A review of the literature and directions for future research. Educational Psychology Review, 13(1), 23-52.
Schunk, D. H. (1995). Self-efficacy and education and instruction. In J. E. Maddux (Ed.), Self-efficacy, adaptation, and adjustment (pp. 281-303). New York, NY: Plenum Press.
Schunk, D. H., Pintrich, P. R., & Meece, J. L. (2008). Motivation in education (3rd ed.). Columbus, OH: Merrill.
Scollon, C. N., Diener, E., Oishi, S., & Biswas-Diener, R. (2004). Emotions across cultures and methods. Journal of Cross-Cultural Psychology, 35, 304-326.
Scollon, C. N., Diener, E., Oishi, S., & Biswas-Diener, R. (2005). An experience sampling and cross-cultural investigation of the relation between pleasant and unpleasant emotion. Cognition and Emotion, 19, 27-52.
Seifert, T. L. (1995). Academic goals and emotions: A test of two models. The Journal of Psychology, 129(5), 543-552.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation and construct interpretations. Review of Educational Research, 46, 407-441.
Shavelson, R., & Bolus, R. (1981). Self-concept: The interplay of theory and methods. Journal of Educational Psychology, 74(1), 3-17.
Silvia, P. J. (2005). What is interesting? Exploring the appraisal structure of interest. Emotion, 5(1), 89-102.
Silvia, P. J. (2008). Interest—The curious emotion. Current Directions in Psychological Science, 17(1), 57-60.
Simpkins, S. D., & Davis-Kean, P. E. (2005). The intersection between self-concepts and values: Links between beliefs and choices in high school. New Directions for Child and Adolescent Development, 110, 31-47.
Skaalvik, E. M. (1997). Issues in research on self-concept. In M. Maehr, & P. R. Pintrich (Eds.), Advances in motivation and achievement (Vol. 10, pp. 51-97). New York, NY: JAI Press.
Skaalvik, E. M., & Rankin, R. J. (1996). Studies of academic self-concept using a Norwegian modification of the SDQ. Paper presented at the XXVI International Congress of Psychology, Montreal, Canada.
Skaalvik, S., & Skaalvik, E. M. (2004). Gender differences in math and verbal self-concept, performance expectations, and motivation. Sex Roles, 50(3/4), 241-252.
Skinner, B. F. (1953). Science and human behavior. New York, NY: Free Press.
Sörbom, D. (1989). Model modification. Psychometrika, 54, 371-384.
Spelke, E. S. (2005). Sex differences in intrinsic aptitude for mathematics and science. American Psychologist, 60(9), 950-958.
Steele, C., & Aronson, J. (1995). Stereotype threat and the intellectual performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811.
Super, D. E. (1990). A life-span, life-space approach to career development. In D. Brown, L. Brooks, & Associates (Eds.), Career choice and development (pp. 197-261). San Francisco, CA: Jossey-Bass.
Swaminathan, H., & Rogers, J. A. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
Tanzer, N. K. (1996). Interest and competence as components of academic self-concepts. Paper presented at the XXVI International Congress of Psychology, Montreal, Canada.
Teresi, J. A. (2006). Overview of quantitative measurement methods: Equivalence, invariance, and differential item functioning in health applications. Medical Care, 44(11), S39-S49.
Thissen, D. J., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer, & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Erlbaum.
Thissen, D. J., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland, & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Erlbaum.
Thorndike, E. L. (1935). The fundamentals of learning. New York, NY: Teachers College Press.
Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., & Najarian, M. (2009). Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K), combined user's manual for the ECLS-K eighth-grade and K-8 full sample data files and electronic codebooks (NCES 2009-004). Washington, DC: National Center for Education Statistics, Institute of Education Sciences.
Tracey, T. J. (2002). Development of interests and competency beliefs: A 1-year longitudinal study of fifth- to eighth-grade students using the ICA-R and structural equation modeling. Journal of Counseling Psychology, 49, 148-163.
Tracey, T. J., & Ward, C. C. (1998). The structure of children's interests and competence perceptions. Journal of Counseling Psychology, 45, 290-303.
Tsui, M. (2007). Gender and mathematics achievement in China and the United States. Gender Issues, 24, 1-11.
U.S. Department of Education, National Center for Education Statistics. (2010). Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) kindergarten through fifth grade Approaches to Learning and Self-Description Questionnaire (SDQ) items and public-use data files (NCES 2010-070) [Data file]. Washington, DC: Author.
Valsiner, J. (1992). Interest: A metatheoretical perspective. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 27-41). Hillsdale, NJ: Erlbaum.
van Langen, A., & Dekkers, H. (2005). Cross-national differences in participating in tertiary science, technology, engineering, and mathematics education. Comparative Education, 41(3), 329-350.
Vandenberg, R. J. (2002). Towards a further understanding of an improvement in measurement invariance methods and procedures. Organizational Research Methods, 5(2), 139-158.
von Baeyer, C., Carlson, G., & Webb, L. (1997). Underprediction of pain in children undergoing ear piercing. Behaviour Research and Therapy, 35, 399-404.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Watt, H. M., Shapka, J. D., Morris, Z. A., Durik, A. M., Keating, D. P., & Eccles, J. S. (2012). Gendered motivational processes affecting high school mathematics participation, educational aspirations, and career plans: A comparison of samples from Australia, Canada, and the United States. Developmental Psychology, 48(6), 1594-1611.
Weiner, B. (1986). An attributional theory of motivation and emotion. New York, NY: Springer-Verlag.
Weiner, B. (1992). Human motivation: Metaphors, theories, and research. Newbury Park, CA: Sage.
Weiner, B. (1994). Integrating social and personal theories of achievement striving. Review of Educational Research, 64(4), 557-573.
Weisberg, H. F. (2005). The total survey error approach: A guide to the new science of survey research. Chicago, IL: University of Chicago Press.
Wigfield, A. (1994). Expectancy-value theory of achievement motivation: A developmental perspective. Educational Psychology Review, 6(1), 49-78.
Wigfield, A., & Eccles, J. S. (1992). The development of achievement task values: A theoretical analysis. Developmental Review, 12, 265-310.
Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68-81.
Wigfield, A., & Eccles, J. S. (2002). The development of achievement motivation. San Diego, CA: Academic Press.
Wigfield, A., Eccles, J. S., Schiefele, U., Roeser, R. W., & Davis-Kean, P. (2006). The development of achievement motivation. In N. Eisenberg (Ed.), Handbook of child psychology (6th ed., Vol. III). New York, NY: Wiley.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33(1), 42-57.
Woods, C. M., Cai, L., & Wang, M. (2012). The Langer-improved Wald test for DIF testing with multiple groups: Evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73(3), 532-547.
Xie, Y., & Shauman, K. A. (2003). Women in science: Career processes and outcomes. Cambridge, MA: Harvard University Press.
Yu, C. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes (Doctoral dissertation, University of California, Los Angeles). Retrieved from http://statmodel2.com/download/Yudissertation.pdf
Zimmerman, B. J., & Schunk, D. H. (2003). Albert Bandura: The scholar and his contributions to educational psychology. In B. J. Zimmerman, & D. H. Schunk (Eds.), Educational psychology: A century of contributions (pp. 431-457). Mahwah, NJ: Lawrence Erlbaum Associates.