NOMINATION AND IDENTIFICATION OF TRADITIONALLY UNDERREPRESENTED
STUDENTS FOR GIFTED PROGRAMS: INSIGHTS FROM A POPULATION DATASET
by
MATTHEW T. MCBEE
(Under the Direction of Thomas P. Hébert)
ABSTRACT
A set of studies was performed using a large population dataset obtained from the
Georgia Department of Education. The studies focused on uncovering the causes of the
underrepresentation of Black and Hispanic students as well as students from low socioeconomic
status backgrounds in gifted and talented education programs. The first study examined the
performance of the referral sources used in Georgia and concluded that automatic and teacher
referrals have the best performance. The study also uncovered evidence that the majority of
underrepresentation takes place at the nomination stage of the gifted assessment process. The
second study quantified the impact of various individual- and school-level variables on the
probability that a student will be identified. It found evidence that race and socioeconomic status
make large, independent contributions to the probability of identification. The final study used a
sample dataset to introduce hierarchical linear modeling and multilevel structural equation
modeling.
INDEX WORDS: Gifted, talented, Georgia, identification, underrepresentation, assessment,
minority, Black, Hispanic, Asian, African-American, socioeconomic, context, nomination, teacher, ability, multilevel, structural equation modeling
NOMINATION AND IDENTIFICATION OF TRADITIONALLY UNDERREPRESENTED
STUDENTS FOR GIFTED PROGRAMS: INSIGHTS FROM A POPULATION DATASET
by
MATTHEW T. MCBEE
B.S., Tennessee Technological University, 2002
M.Ed., University of Georgia, 2004
A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2006
© 2006
Matthew T. McBee
All Rights Reserved
Major Professor: Thomas P. Hébert
Committee: Deborah Bandalos Martha Carr Bonnie Cramond
Electronic Version Approved: Maureen Grasso Dean of the Graduate School The University of Georgia May 2006
DEDICATION
This work is dedicated to my parents, Dennis and Mary Kay McBee, whose lifetimes of
hard work made it possible for me to pursue my educational and professional goals.
ACKNOWLEDGEMENTS
Thanks to my soon-to-be wife, Kristin Pierce, for the generous and sometimes firm
support she gave me during the process of writing this dissertation. She also helped with some
rather intractable page numbering issues in the word processing software. Thanks to my major
professor, mentor, and lifelong friend, Dr. Thomas Hébert, who not only contributed a great deal
to the quality of this work through his editing but also provided me with encouragement when it
was sorely needed. Both Dr. Deborah Bandalos and Dr. Linda Muthén were extremely helpful
and prompt with their advice when I had questions related to statistics or the use of the MPlus
program. Scott Fowler provided editing support to me during the final stages of the writing
process.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 Introduction and Literature Review
2 A Descriptive Analysis of Referral Sources for Gifted Identification Screening by Race and Socioeconomic Status
3 Examining the Probability of Identification of Students for Gifted Programs in Georgia Elementary Schools: A Multilevel Structural Equation Modeling Study
4 Multilevel Analysis in Gifted Education
5 Summary and Future Directions
APPENDICES
A MPlus code for regression analysis
B MPlus code for regression accounting for clustering
C MPlus code for random intercept hierarchical linear model
D MPlus code for random slope and intercept hierarchical linear model
E MPlus code for single-level structural equation model
F MPlus code for SEM accounting for clustering
G MPlus code for ML-SEM with random intercept model
H MPlus code for ML-SEM with random slope and intercept models
LIST OF TABLES
Table 2.1: Identified elementary students by race and SES
Table 2.2: Overall comparison of referral sources
Table 2.3: Comparison of referral sources by SES
Table 2.4: Comparison of referral sources by race
Table 2.5: Comparison of referral sources by race and SES
Table 3.1: Variable descriptions
Table 3.2: Descriptive statistics
Table 3.3: Variable intercorrelations
Table 3.4: Model fit information for each analysis step
Table 3.5: Model summary for within-schools component of random slope model
Table 3.6: Model summary for between-schools component of random slope model
Table 3.7: Model summary for “Lunch” slope portion of random slope model
Table 3.8: Model summary for “Asian” slope portion of random slope model
Table 3.9: Model-implied probabilities of identification
Table 4.1: Variable descriptions for sample dataset
Table 4.2: Regression and HLM model results
Table 4.3: SEM and ML-SEM model results
LIST OF FIGURES
Figure 1.1: Relationship between student numerical intelligence and sensitivity to school mean numerical intelligence per SES-group
Figure 3.1: Within-schools path model
Figure 3.2: Between-schools structural model
Figure 3.3: Random slope model for both “lunch” and “race (Asian)” to “probability of being identified as gifted”
Figure 3.4: Path values and standard errors for within-schools portion of random slope model
Figure 3.5: Path values for between-schools (intercept) component of random slope model
Figure 3.6: Path values for slope portion of random slope model 1A (from “lunch” to “probability of being identified gifted”)
Figure 3.7: Path values for slope portion of random slope model 1B (from “Asian” to “probability of being identified gifted”)
Figure 4.1: Coefficients in the random intercept model
Figure 4.2: Coefficients in the random slope model
Figure 4.3: Example structural equation model
Figure 4.4: Regression model specified in a SEM context
Figure 4.5: Results for single-level SEM
Figure 4.6: Results for ML-SEM random intercept model
Figure 4.7: Results for ML-SEM random slope and intercept models
CHAPTER 1
INTRODUCTION AND LITERATURE REVIEW
The numerical underrepresentation of African-American, Hispanic, and Native students
in gifted education programs has been frequently cited in the gifted education literature (Ford,
1998; Reid, Romanoff, Algozzine, & Udall, 2000; Sarouphim, 1999; Scott, Perou, Urbano,
Hogan, et al., 1992). Though many publications have addressed this problem, relatively few
have attempted to quantify the severity of the issue. Ford (1998) and Brown (1997) both cited
information gathered by the Office of Civil Rights indicating that Black and Hispanic students
were underrepresented by 41% and 42%, respectively. Most publications in gifted education that
address the issue simply begin by stating that underrepresentation has been and continues to be a
problem. Many proposed explanations for the underrepresentation issue have been cited in the
literature and will be examined later in this chapter.
Factors that Reduce Students’ Probability of Identification for Gifted Programs
Disadvantaged Ethnic Group Membership
The most critical issue related to understanding the disparity in gifted program enrollment
across racial groups is: to what extent does this disparity represent actual differences of
developed or potential capability across groups, and to what extent does it indicate the presence
of some serious flaws in the methods by which we screen, identify, and serve gifted students?
Most scholars who have examined the issue have agreed with Frasier’s belief that “There is no
logical reason to expect that the number of minority students in gifted programs would not be
proportional to their representation in the general population” (1997, p. 498). Though some
early psychologists studying human intelligence believed that non-White populations had lower
IQs by virtue of inferior genetics (e.g., “Spearman’s hypothesis” as described by Naglieri and
Jensen, 1987), this belief has been widely and rightly dismissed as a racist and shameful legacy.
For these reasons, most scholars within gifted education have assumed that the practices and
procedures commonly utilized by schools to find and serve students from underrepresented
groups have failed, and that there are a great number of undiscovered gifted children out there.
As Torrance (1977) observed, “there is a great deal of giftedness among the culturally different
and the waste or underuse of those resources is tragic” (p. 3).
Ford (1998) pointed out that there are at least three classes of contributing factors to the
underrepresentation issue: recruitment and identification problems, personnel training problems,
and retention problems. Currently, the first of these has received the most attention in the
literature.
Most gifted programs rely at least to some degree on standardized measures of ability or
achievement during the assessment process. It has been widely noted that minority and
economically disadvantaged youth significantly underperform on these tests relative to their
more advantaged peers (e.g., Ford, Harris, Tyson, & Trotman, 2002; Entwisle & Alexander,
1992; Maker, 1996; Naglieri & Jensen, 1987). The performance gap between Black and White
students tends to be about one standard deviation. In Mills and Tissot’s (1995) study of two
identification instruments, they found that mean scores on the School and College Ability Test
(SCAT), the more traditional of the two instruments studied, were 21.03 for Black students and
28.42 for White students on the verbal subscale, with a pooled standard deviation of about 8.4
points. About the same gap was observed for the math subscale scores, where Black students
had a mean of 18.9 compared with 25.54 for White students, with a pooled standard deviation of
about eight points. More shocking still is that the scores of students receiving free or reduced
price lunch were 9.85 for verbal and 11.29 math, which barely exceeded the scores of students in
special education. Again, these results are fairly typical of similar studies.
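The "about one standard deviation" characterization can be checked against the SCAT figures quoted above. The following is a minimal sketch, assuming the pooled standard deviations are the approximate values given in the text (8.4 and 8 points); the function name is illustrative and is not taken from Mills and Tissot (1995).

```python
def standardized_gap(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Return the difference between two group means in pooled-SD units."""
    return (mean_a - mean_b) / pooled_sd


# Means as reported in the text for Mills and Tissot (1995).
# The pooled SDs are the approximate figures quoted there (an assumption).
verbal_gap = standardized_gap(28.42, 21.03, 8.4)
math_gap = standardized_gap(25.54, 18.9, 8.0)

print(f"verbal: {verbal_gap:.2f} SD, math: {math_gap:.2f} SD")
```

Under these assumptions both gaps come out slightly below one standard deviation (roughly 0.9 and 0.8), which is consistent with the rounded description in the text.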
If the population test score gap between Black and White students is about one standard
deviation, and the most traditional cutoff score of two standard deviations above the mean on a
measure of mental ability is used to determine gifted program placement, it can easily be shown
from examining the normal curve that the cutoff score would allow about 2.3% of White children
to qualify (at two standard deviations above their group mean) while the same cutoff would
allow only 0.13% of Black children to qualify, because they would need to score about three
standard deviations above their group mean to meet the same criterion. This statistical
examination is extremely oversimplified; however, it does illustrate just how severely group
mean differences can affect the proportion of members found in the tails. Ford (1998) rightly
pointed out that many school districts continue to use such outdated definitions of giftedness, and
that the common cutoffs on mental tests are arbitrary.
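The tail proportions in this illustration can be reproduced directly from the normal curve. The sketch below assumes exactly what the simplified argument assumes: normally distributed scores, equal standard deviations in both groups, and a gap of exactly one standard deviation between group means. The function and variable names are illustrative only.

```python
from statistics import NormalDist


def share_above(cutoff: float, group_mean: float, sd: float = 1.0) -> float:
    """Proportion of a normally distributed group scoring above the cutoff."""
    return 1.0 - NormalDist(mu=group_mean, sigma=sd).cdf(cutoff)


# Scores are expressed in SD units relative to the higher-scoring group's mean.
cutoff = 2.0
share_at_2sd = share_above(cutoff, group_mean=0.0)   # cutoff is 2 SD above this group's mean
share_at_3sd = share_above(cutoff, group_mean=-1.0)  # cutoff is 3 SD above this group's mean

print(f"{share_at_2sd:.2%} versus {share_at_3sd:.2%}")
print(f"ratio: {share_at_2sd / share_at_3sd:.1f} to 1")
```

Under these assumptions the two proportions are roughly 2.3% and 0.13%, so a one standard deviation shift in the group mean changes the proportion above the cutoff by a factor of about 17.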
A great deal of literature has examined this test score differential. Many critics have
argued that minority students tend to do poorly on such tests because the tests themselves are
flawed by being biased in favor of students from the dominant culture (Ford et al., 2002). In
other words, standardized tests might unfairly penalize minority students by assigning them
lower scores for the same level of underlying ability or achievement. The exact nature of this
bias is unknown. Ford argued that verbally loaded tests tend to penalize minority students.
There is some support for this in the literature. For example, Mills and Tissot’s (1995)
previously-mentioned study compared the scores of 347 students from a wide variety of ethnic
backgrounds on the School and College Ability Test (SCAT) to their scores on Raven’s
Advanced Progressive Matrices (APM). The performance gap between verbal and math
subscale scores on the SCAT for Black students was about the same as the gap for White
students. Hispanic students performed better relative to the White group on the math items than
on the verbal items. However, because the authors provided only a brief description of the math
items, one cannot assume that they were not also verbally loaded. When the students were
compared on the APM, the magnitude of the score discrepancy was significantly reduced to
about half a standard deviation, as opposed to a full standard deviation on the SCAT. On the
basis of this and other studies (e.g., Shaunessy, Karnes, & Cobb, 2004), many scholars in gifted
education have advocated for using nonverbal instruments such as Raven’s Advanced
Progressive Matrices and the Naglieri Non-Verbal Assessment test for use with minority, low
SES, or ESL students. Other scholars, such as Pyryt (1996), have cautioned against abandoning
traditional IQ tests too quickly, pointing out that detailed statistical analysis has not revealed
evidence that specific items are biased against minority students. He noted that culturally-loaded
or biased items should be missed more frequently by students from backgrounds that would not
allow them access to this cultural knowledge. When examining the test results of students from
various backgrounds, this pattern is not observed. Minority students and White students tend to
do poorly on the same items; the former simply answer those items incorrectly more often.
Another contributing factor to the test score differential could be that minority students
face psychological, social, or cultural barriers not experienced by students from the majority
culture. The literature on this subject is limited, and most of it has focused on issues affecting
Black children. Fordham and Ogbu (1986) argued that Black students in particular experience
intense social pressure to underachieve and disengage from school because they understand
education to be the domain of the White middle class. Therefore, to strive for school
achievement is to strive for entrance into this culture and thus betray the African American
culture. Evidence supporting this claim comes primarily from qualitative case studies. Fordham
(1988) went on to characterize students who achieve academically at the price of perceived
betrayal of their cultures as having experienced a Pyrrhic victory. Valenzuela’s (1999)
ethnographic study of a predominantly Hispanic high school in Texas found that a similar
process existed for second-generation Hispanic immigrants. Another promising area of
investigation deals with stereotype threat (Steele, 1997), which is the finding that performance
on cognitive tasks is significantly depressed whenever individuals fear that their poor
performance might reinforce a negative stereotype about a group to which they belong.
Stereotype threat was originally envisioned to apply to Black-White differences in test
performance. Subsequent research has shown that it operates in female math performance
(Schmader, Johns, & Barquissau, 2004), memory performance tasks in older adults (Chasteen,
Bhattacharyya, Horhota, Tam, & Hasher, 2005), and social cue decoding in men (Koenig &
Eagly, 2005). Yopyk and Prentice (2005) found that student athletes performed more poorly on
a math test when primed with their athlete identity than when primed with their student identity.
Though this finding has been widely replicated, it is poorly understood (see a recent
review by Smith, 2004). For example, Marx and Goff (2005) found that experimenter race has a
salient effect in studies of stereotype threat involving race. Black experimenters were unable to
replicate the performance detriment due to stereotype threat that White experimenters created.
Student-Level Socioeconomic Status (SES)
Students from low socioeconomic status backgrounds are another group that has been
widely described as being underrepresented in gifted education programs. Socioeconomic status
itself is a broad concept whose definition is subject to some debate. The literature examining the
effects of SES on educational outcomes can usually be classified into two major categories with
respect to the operationalization of SES. The first set of studies operationalizes SES at the
individual level as a dichotomous variable indicating whether or not a student is eligible for the
federal free lunch program, or at the school level as the percentage of the student body eligible
for such assistance (e.g., Brosnan, 1983; Entwisle & Alexander, 1992; Ryan & French, 1976). In
this case, eligibility for free or reduced-price lunch is a proxy for annual family income. The
second set of studies operationalizes SES at the individual level as a composite variable usually
including family income, parental educational attainment, and occupational status (e.g., Portes &
MacLeod, 1996; Rumberger, 1995). A third category of studies makes use of a single indicator
variable that is not related to the free lunch program, such as occupational prestige alone (Quay,
1989). These have generally been based on Hollingshead and Redlich’s (1958) occupational
scale.
Literature on the representation of low-SES students in gifted education is less
voluminous than research on race, probably because the dearth of poor children in gifted
programs is not nearly as visible as the lack of Black and Hispanic children. There is no federal
office similar to the Office of Civil Rights for the poor, and issues of class do not have the same
urgency and history as issues of race in this country. Previous studies of race and gifted program
admittance are seriously flawed because race and SES are very highly related. Studies that have
examined race without controlling for SES either statistically or experimentally are confounded
and thus very difficult to interpret.
Descriptive data are even less accessible on the numbers describing the degree to which
low SES students are underrepresented in gifted programs. There are, however, innumerable
studies examining the impact of SES on school achievement. The impact of SES on school
achievement has been shown to be very powerful in almost every study including it as a
predictor (Steinberg, Blinde, & Chan, 1984). To examine this issue, the following discussion has
been organized around common variables chosen as outcomes in the literature.
School readiness. Mills (1983) examined school readiness as a function of parental
socioeconomic status in a sample of 49 predominantly middle class kindergartners. School
readiness was assessed via the Test of Basic Experiences, General Concepts, Level K. The study
examined the explanatory power of three measures of SES, which included annual family
income, mother’s level of education, and father’s level of education. Results indicated that the
father’s level of education was significantly related to school readiness, explaining 10.2% of the
variance. The other two measures of SES were not significantly related to school readiness.
This is a somewhat surprising finding and may be due to range restriction on the SES variable in
the sample. Nonetheless, the results are clear. Parental SES may have some impact on school
readiness. Other studies, such as Entwisle and Alexander (1992), Garibaldi (1997), and West
(1985) have also confirmed a small but significant reduction in school readiness for low SES
students as compared to other students – a gap that grows larger during each successive year of
schooling.
Cognitive development. Numerous studies have confirmed that low SES students do not
perform as well as other students on tests of mental ability. Ryan and French (1976) performed a
study of the impact of SES on measured intelligence and school achievement in 209 elementary
school students selected from schools serving homogeneously low, middle, and high SES
populations. Their results provided evidence that low SES students lag behind their more
advantaged agemates in both verbal and nonverbal IQ. Specifically, the mean verbal IQ for the
low SES group was 95.6, compared to 104.8 for the middle SES group and 109.6 in the high SES
group, as measured by the Lorge-Thorndike Intelligence Tests. The mean nonverbal IQ for the
low SES group was 94.8, compared to 108.2 and 113.9 for the middle and high SES groups,
respectively. It is important to note that the gap between the low and middle SES groups was
roughly twice the size of the gap between the middle and high groups.
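The relative size of the two gaps follows from the group means quoted above. A minimal sketch, using only the means reported in the text for Ryan and French (1976); the dictionary layout and function name are illustrative.

```python
# Mean IQ by SES group, as quoted in the text for Ryan and French (1976).
verbal = {"low": 95.6, "middle": 104.8, "high": 109.6}
nonverbal = {"low": 94.8, "middle": 108.2, "high": 113.9}


def gap_ratio(means: dict) -> float:
    """Ratio of the low-to-middle gap to the middle-to-high gap."""
    low_to_middle = means["middle"] - means["low"]
    middle_to_high = means["high"] - means["middle"]
    return low_to_middle / middle_to_high


verbal_ratio = gap_ratio(verbal)
nonverbal_ratio = gap_ratio(nonverbal)

print(f"verbal: {verbal_ratio:.1f}x, nonverbal: {nonverbal_ratio:.1f}x")
```

The ratio is a little under two for verbal IQ and somewhat above two for nonverbal IQ, so "roughly twice" summarizes both scales.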
Perhaps more interesting is Quay’s (1989) study of SES in Piagetian task performance.
This study utilized a sample of 144 first, second, and third grade children. These children were
then classified as being from low, middle, or high SES backgrounds based on parental
occupation. The study hypothesized that one cause of the observed difference in Piagetian task
performance between SES groups (Overton, Wagner, & Dolinsky, 1971) is that low SES
students experienced less congruence between their home and school environments (Laosa,
1983) and therefore would have less skill with school-like materials and tasks. Therefore,
stimulus material was included as an independent variable. In this case, the stimulus material
could be either cardboard cutouts (school-like) or food. The three Piagetian tasks examined were
classification, conservation of substance, and conservation of number.
In the classification task, the performance of the third grade low SES group was inferior
to the performance of the first grade high SES group, indicating the presence of a significant
developmental delay in the low SES group. The gap between the low and middle SES groups
was much larger than the gap between the middle and high SES groups. As predicted, low SES
children of all ages performed better on tasks involving food as a stimulus material. Similar
patterns were found for the conservation of substance task. A major ceiling effect for the high
SES group was found in the conservation of number task. The results indicated that experience
with school materials may explain a fraction of the performance differential across SES groups
and reinforced previous findings that low SES children may experience slower cognitive
development.
Domain specific achievement tests. A number of studies of SES have achievement test
scores as an outcome. One obvious advantage of this approach is that it creates a common scale
of measurement across teachers and schools. Among these studies are Entwisle and Alexander
(1992), Mills and Tissot (1995), Portes and MacLeod (1996), Ryan and French (1976), and
Tyler-Wood and Carri (1993). Entwisle and Alexander’s (1992) study examined the scores of
Black and White students on the first, second, and third grade versions of the math section of the
California Achievement Test. They found that by third grade, the mean math achievement score
for the low SES group was about one half of a standard deviation lower than that of the rest of the
sample. Furthermore, even though the racial differences were the focus of the study, the authors
concluded that racial effects were minimal when SES was controlled.
Portes and MacLeod (1996) defined achievement as reading and math scores for a
Stanford Achievement Test in their study of advantaged and disadvantaged ethnic communities
in California and Florida. They found that parental SES had a strong effect on achievement. A
one standard deviation change in parental SES would result in a 10 percentile gain in math and
an 11 percentile gain in reading, controlling for the other factors.
Ryan and French’s (1976) study found that the low, middle, and high SES groups had Iowa Test of
Basic Skills (ITBS) composite raw scores of 28.4, 34.8, and 40.4, respectively. This corresponds
to a .85 standard deviation difference between the low and middle SES groups and a 1.51
standard deviation difference between the low and high SES groups.
Returning to Mills and Tissot’s (1995) study, the performances of the group of students
receiving free or reduced price lunch on the SCAT were three and two standard deviations below
the scores of the White students in verbal and math achievement respectively. As mentioned
previously, the scores of the group receiving free or reduced-price lunch (FRL) barely exceeded
those of students enrolled in special education. The gap was reduced to a little more than half a
standard deviation when comparing scores on Raven’s APM. This pattern was also noted by
Tyler-Wood and Carri (1993), who compared the scores of low and average SES groups of
students nominated for gifted programs on a variety of measures, including the CogAT, the
OLSAT, the Stanford-Binet 4, the Slosson Intelligence Test – Revised, and the Matrix Analogies
Test. They found that the gap between SES groups was much bigger on verbal tasks, such as
the verbal section of the CogAT and the verbal section of the Stanford-Binet, than on the
Matrix Analogies Test.
The results of these studies and others have established that low SES is consistently associated
with lowered performance on standardized achievement measures. Unfortunately, the results are not
reported in such a way as to allow the calculation of effect sizes across studies, so the size of the
effect cannot be directly computed.
Global measures of academic performance. The use of teacher-assigned grades as
outcomes is less common in the literature on SES and school achievement. This is probably due
to the variation in grading practices, procedures, and standards across teachers and classes that
greatly reduces the reliability of grades as a measure. Ryan and French’s (1976) study examined
GPAs for third, fourth, and fifth grade in addition to achievement test and IQ scores. The
students attending low SES schools had consistently lower GPAs than the students attending
middle or high SES schools. However, the reported standard deviations for the mean GPAs are
quite high in comparison with the mean differences across schools, so it is probable that some of
the comparisons would not have differed significantly if significance testing had been performed.
School Socioeconomic Status
A number of studies have examined the impact of the SES composition of schools on
student achievement (Everson & Millsap, 2004; Griffith, 1996; Kennedy, 1992; Maggi,
Hertzman, Kohen, & D'Angiulli, 2004; Opdenakker & Van Damme, 2001; Raudenbush & Bryk,
1986; Taylor & Harris, 2003; West, 1985). Most of these studies relied on multilevel analysis
schemes. By contrast, Taylor and Harris’s (2003) study of race segregation, Griffith’s (1996) study of
parental empowerment, Maggi et al.’s (2004) study of neighborhood SES composition on high
achieving children, and West’s (1985) study of school-level factors in reading and math
achievement relied on ordinary regression analyses conducted at the school level. Though the
studies have been conducted in the United States, Canada, and Belgium, and have used various
operational definitions of SES and achievement, the results have been remarkably consistent
across studies. Most studies have found that students from all backgrounds do better in schools
with high SES student bodies.
Everson and Millsap (2004) fitted a set of multilevel structural equation models with
latent means to a large data set examining the impact of individual and school level predictors of
SAT performance. The results indicated that school SES had very powerful effects on both SAT
math and SAT verbal scores. Not only did SES exert large direct effects on SAT scores, it also
exerted strong indirect effects through school achievement (grades) and extracurricular activities.
A one standard deviation change in SES would be expected to increase SAT math scores by 60
points and SAT verbal scores by 54.6 points directly. The change in SES would cause
extracurricular activities to increase by .88 standard deviations and would also increase school
achievement by .46 standard deviations. These changes in achievement and extracurricular
activities would then contribute an additional 22.4 points to SAT math and an additional 31.6
points to SAT verbal.
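The direct and indirect contributions described above can be combined into a total expected change. The sketch below assumes the usual path-analytic decomposition, in which the total effect is the sum of the direct effect and the indirect effects carried through the mediators. It uses only the point values quoted in this summary, and the names are illustrative rather than drawn from Everson and Millsap (2004).

```python
# SAT points per one-standard-deviation increase in school SES, as summarized
# in the text. "indirect" is the combined contribution through school
# achievement and extracurricular activities.
effects = {
    "math": {"direct": 60.0, "indirect": 22.4},
    "verbal": {"direct": 54.6, "indirect": 31.6},
}


def total_effect(parts: dict) -> float:
    """Total effect = direct effect + summed indirect effect (assumed additive)."""
    return parts["direct"] + parts["indirect"]


totals = {section: total_effect(parts) for section, parts in effects.items()}

for section, total in totals.items():
    print(f"SAT {section}: about {total:.1f} points per SD of school SES")
```

Under this additive assumption the implied total effects are about 82 points for SAT math and about 86 points for SAT verbal.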
Griffith (1996) examined the impact of parental involvement on academic performance.
His sample included 41 elementary schools, and all variables were measured at the school level.
Academic achievement was operationalized as aggregate scores on the criterion-referenced test
(CRT) while parent involvement was operationalized as the score on a 30-item survey of parental
participation in school activities. The data were analyzed via a flat regression model, with
school racial composition and SES entered at step one as covariates. Interestingly, neither race
nor SES had a significant relationship with academic performance. These results are not typical
for studies of this type, and may reflect problems with biased parameter estimates and low power
resulting from the data aggregation.
Kennedy (1992) analyzed the performances of Black and White male third graders on a
shortened form of the Educational Development Series (EDS) tests used in Louisiana.
Achievement was operationalized as a composite of reading, mathematics, and language
sections. Separate hierarchical linear models (HLMs) were fitted to each group of students.
Results indicated that the school SES was the strongest predictor of achievement at the school
level for both the Black and White children. However, the effect was approximately twice as
strong for White students. The authors concluded that all students are affected by the
composition of their classrooms, but White students appeared to be most sensitive to this
composition.
Taylor and Harris (2003) conducted a similar study. They examined the effects
of relative integration and segregation on Black and White students’ Stanford 9 scores in third,
fifth, and eighth grades via simple bivariate correlations. The percentage of students within
schools receiving free or reduced price lunch served as a proxy for SES. The achievement of
Black students in eighth grade was negatively correlated with the percentage of the enrollment
that is Black (r = -.688) and the percentage of the enrollment receiving free or reduced-price
lunch (r = -.821), while it was positively correlated with the percentage of students that are White
(r = .698). White students’ achievement was not significantly affected by the Black enrollment
or the overall percentage of students receiving free or reduced-price lunch. It was, however,
negatively correlated with the percentage of White students receiving free or reduced-price
lunch. Taken together, this study and the previous study seem only to create confusion regarding
the relative impact of student SES background on the achievement of Black and White students.
However, both of them clearly indicated that SES of the student body was related to school
achievement.
West (1985) examined achievement at the school level, defining achievement as the
number of students within a school at grade level or above in reading and math on the New Jersey
Minimum Basic Skills test. The use of the stepwise method for determining the order of entry
for her predictors resulted in SES being entered first in her analysis of math achievement and
second (behind percentage of the school population that is Black) in the reading achievement
analysis. This complicates the interpretation in the reading achievement case due to collinearity
between race and SES. Her results indicated that each percentage point of increase in the low
SES school population resulted in a .35 percent reduction in the number of students within that
school who were on grade level or above for math.
Stefania Maggi and colleagues (2004) studied the effect of organization-level SES on
the proportion of high achieving students within schools. This study also used separate flat
regression models for predicting reading and math achievement, measured in fourth grade and
seventh grade. Neighborhood SES was entered in the first step of all four analyses. It was
highly significant in all of them, with R squared values ranging from .358 to .442. Moreover, the
magnitude of the predictor increased from fourth to seventh grade in all of the analyses. The
authors reiterated Hertzman, McLean, Kohen, Dunn, and Evans’ (2002) findings that a teacher in
a classroom serving thirty low SES students can expect to encounter ten with developmental
delays and still more with specific learning disabilities, whereas a teacher serving thirty students
in a high SES neighborhood can expect only three or four students to have similar issues. Maggi
et al. proposed that it is this uneven distribution of learning difficulties across SES groups that
might explain their findings. As they stated,
“The learning experiences of the highly competent children may be compromised by the
less stimulating academic climate created by a high proportion of children who face
learning difficulties and by the lack of attention from a teacher who is focused on
children who require additional support” (p. 110).
Opdenakker and Van Damme’s (2001) study is perhaps the most interesting,
comprehensive, and methodologically sound of all the research reported here. The study was
conducted in Belgium, so its generalization to American schooling may not be warranted. They
studied the impact of school SES composition and math ability on individual students’ math
achievement using a series of three-level hierarchical linear models. Their study is situated in the
context of school effectiveness research, which has generally concluded that schools do have the
power to affect the learning of their students. The authors pointed out that student composition,
particularly with respect to ability, has been ignored in the previous work on school effectiveness
and may considerably “muddy the waters.” The simple bivariate correlation between the father’s
educational attainment and math achievement was r = .66. In their model of school achievement
with student background characteristics entered, father’s educational attainment (the SES proxy)
was highly significant, though slightly less powerful than student’s numerical intelligence. The
standardized beta value of .338 indicated that math achievement could be expected to go up by
about a third of a standard deviation for each standard deviation increase in father’s education,
even after controlling for the student’s mathematical ability. Further results from this study will
be discussed later in this chapter.
Portes and MacLeod’s (1996) study of the factors affecting the academic performances of
students in four immigrant communities provided evidence reinforcing the findings of other
studies regarding individual and aggregate SES. They extended the efforts of previous work by
also testing for the effect of a cross-level SES interaction. Portes and MacLeod used average
school SES as a predictor of the slope coefficient relating individual SES to mathematics
achievement. This predictor was positive and significant such that the slope relating individual
SES to math achievement was steeper in high SES schools than the same relationship in a low
SES school. Students from high SES backgrounds had higher math achievement when they were
situated within high SES schools. Students from low SES backgrounds were doubly penalized
by attending high SES schools and performed better when they attended low SES schools.
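A cross-level interaction of this kind is often written as a "slopes-as-outcomes" model, in which the within-school slope for individual SES is itself a linear function of school mean SES. The sketch below illustrates only that structure. The coefficient values (0.30 and 0.10) are hypothetical placeholders chosen for illustration and are not estimates reported by Portes and MacLeod.

```python
def individual_ses_slope(school_ses, base_slope=0.30, cross_level=0.10):
    """Slope relating individual SES to math achievement within a school.

    school_ses  : school mean SES in standard-deviation units (0 = average school)
    base_slope  : HYPOTHETICAL slope in an average-SES school
    cross_level : HYPOTHETICAL change in slope per SD of school SES
    """
    return base_slope + cross_level * school_ses


low_school = individual_ses_slope(-1.0)   # school one SD below the mean
high_school = individual_ses_slope(1.0)   # school one SD above the mean

print(f"slope in low SES school:  {low_school:.2f}")
print(f"slope in high SES school: {high_school:.2f}")
```

A positive cross-level coefficient is what produces the steeper individual-SES slope in high SES schools described above.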
Based on the findings from the reviewed studies, there is extensive evidence supporting
the strong role of the SES in school achievement, at the individual and aggregate levels as well
as an interaction between these across levels. These results provide a strong rationale for
considering these three effects of SES in future examinations of academic performance.
Proposed Mechanisms
Reminded of Rumberger’s (1995) call to focus attention on the processes by which SES
affects educational outcomes over the traditional examination of SES as a structural variable, we
turn now to an examination of proposed mechanisms through which race and poverty may
hamper achievement.
Home and School Environments
Peer influences. Due to such factors as “White flight” and other types of racial and class
stratification, families tend to live in areas populated by other families with similar
characteristics. We have already examined a number of studies showing that poverty has a
strong association with poor educational outcomes, both at the individual and aggregate levels.
Furthermore, schools tend to be relatively homogeneous with respect to their socioeconomic
makeup due to the aforementioned stratification. Students surrounded by peers who are
performing poorly in school are more likely to accept this condition as normative and perform
poorly themselves (Bennett, 1995). A possible explanation for this might lie in an extension of
social comparison theory (Festinger, 1954), which argues that when objective information is
lacking or untrustworthy, people judge their performances and abilities against those of their
peers. Marsh and Parker (1984) referred to this as the frame of reference model. This process
could cause children who outperform many of their classmates to conclude that they are doing
“well enough”, when in fact their performance does not compare favorably with students in more
advantaged schools.
Motivation and academic disidentification as cultural phenomena. As discussed in the
introduction, race and SES are confounded variables. Therefore, a great deal of the research
examining the impact of SES on achievement does so from the perspective of examining the
performances of White students versus those of Black or Hispanic students. Fordham and Ogbu
(1986) proposed that Black students might resist achieving high levels of school success due to
negative social sanctioning from their peers—by being accused of “acting White.” Fordham
(1988) argued that Black students must adopt a raceless identity in order to be successful in
schools. She went on to characterize this as a meaningless victory where the rewards may not be
worth the sacrifice. Most Black students are unwilling to sacrifice their racial and cultural
identities in order to achieve in schools, and more importantly, they should not have to. Many
Black students may perceive school achievement and being Black as incompatible identity
structures. A similar scenario may occur among Hispanic students. Valenzuela’s (1999)
ethnography of a primarily Hispanic high school provided evidence that first-generation
Hispanic immigrants exhibited higher levels of school achievement than Hispanic students born
in the United States, in spite of the increased language difficulties that often accompany recent
immigration. She argued that those students born in the United States had assimilated beliefs
that Hispanic students cannot do well in school without denying their home culture.
Summer setback. Entwisle and Alexander (1992) examined math achievement in a
sample of elementary school students. As described earlier in this review, they found that high
and low SES students entered first grade with only a small achievement differential. This gap
grew to about half a standard deviation by the time the students reached third grade. To better
understand the nature of this development, data were examined from multiple testing dates. The
findings were quite striking. Both groups of students made progress during the school year. In
fact, low SES children made larger gains than high SES children during the school year.
However, during the summers, low SES children lost ground while high SES children continued
to gain. The cumulative effect of these “summer setbacks” was responsible for the growing gap
between the SES groups. The authors concluded that schools may be doing a better job with
educating poor children than they are typically credited for, and that the culprit may lie in the
home environments of economically disadvantaged children.
Maternal communication. Low and high SES families tend to utilize different child
rearing strategies. Middle- and upper-class mothers spend more time talking to their children,
help children understand the causes and consequences of events, and provide more active
scaffolding for their children’s attempts at problem-solving. Research examining the
communication between mothers and children in problem solving situations has shown that low
SES mothers tend to use brief and highly directive statements when assisting their children in
problem solving (Hess & McDevitt, 1984). High SES mothers, on the other hand, tend to take a
less directive approach, issuing leading questions that help children solve the problems on their
own. This style of teaching rather than telling has been shown to result in higher achievement
test scores.
Lack of resources. Low-SES mothers may lack access to quality prenatal health care and
nutrition. The risk of premature delivery is higher for poor mothers, which increases the risk for
the child to have cognitive deficits or learning disabilities. Low-SES mothers are more likely to
use drugs (both legal and illegal) during pregnancy (McLoyd, 1998). Poor families also tend to
experience higher levels of emotional, physical, and financial stress, leading to more conflicts
between parents and children (Duncan & Brooks-Gunn, 2000). The educational attainment of
parents in poor homes is typically lower than in middle- and upper-class homes, the
homes contain fewer books and other educational materials, and less disposable income is
available for learning aids like computers, library trips, and museum visitation.
Psychological Issues
Self-concept. Though it is reasonable to assume that low SES children would have low
global self-concepts, two classic studies have suggested that low SES children may actually have
higher global self-concepts (Soares & Soares, 1969; Trowbridge, 1972). Soares and Soares
(1969) compared the self-perceptions of 229 low SES and 285 high SES students in fourth and
eighth grade. The low SES students reported more positive self-perceptions than the high SES
students. Trowbridge’s (1972) study compared the scores of 3,798 children from low, middle,
and high SES backgrounds on the Coopersmith self-esteem inventory and found that low SES
children outscored middle SES children on general self-concept, school-academic self-concept,
and social self-concept. However, Wylie (1979) pointed out that these classic studies confused
racial group membership with SES (a common flaw in SES research), and that the research up to
that point had failed to yield replicable findings. More recent research has focused on the
academic self-concept rather than global self-concept. Marsh (1984) employed a path analytic
methodology and found that general and academic self-concept were only modestly correlated (r
= .20), and that academic ability had a positive effect on academic self-concept, while the
average ability of one’s classmates had a negative effect on academic self-concept. The author
concluded that his results support the frame-of-reference theory, which argues that individuals
judge their abilities relative to those of their peers instead of against some absolute criteria.
However, high SES students had higher academic self-concepts than low SES students
regardless of the academic context. Marsh, Relich, and Smith (1983) and Shavelson and Bolus
(1982) found evidence that academic self-concept was moderately correlated with measured
academic achievement.
Relative impact of race and SES. One of the questions that has barely been addressed in
the gifted education literature is the relative importance of race and SES in gifted program
identification. Portes and MacLeod (1996) mentioned that race dropped out of their models
when SES was controlled. To address this question, McBee (in press) performed a study using
publicly available school-level data for almost 15,000 schools in Georgia, Louisiana, Texas,
Arkansas, and Florida. The study began by examining the bivariate correlations between the
proportion of students within schools that were White, Black, and Hispanic with the proportion
receiving free or reduced-price lunch and the proportion identified gifted. In agreement with
previous research, negative correlations (approximate r = -.23) were found between the number
of Black students within schools and the number of gifted students across the three states with
sizable Black populations. More sizable negative correlations were found between the number
of students receiving free or reduced-price lunch and the number of gifted students within
schools (approximate r = -.30). However, when the relationship between the percentages of
Black or Hispanic students and the number of gifted students was examined with the proportion
of students receiving free or reduced-price lunch (FRL) partialed out, all the correlations
dropped to nonsignificance. This is especially impressive given the huge sample size and ample
power of the study. When the opposite relationship was examined (i.e., the relationship between
the proportion of students receiving FRL and the number of gifted students with the racial
composition of the schools controlled), the correlations remained significant. This analysis
provided evidence that race is not the underlying cause for underrepresentation. However, these
results measured at the group level cannot be assumed to apply to the individual level due to the
ecological fallacy (Robinson, 1950).
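The logic of partialing out a third variable can be illustrated with the standard first-order partial correlation formula. In the sketch below, the two correlations with gifted identification are the approximate values quoted above. The correlation between the proportion of Black students and the proportion receiving FRL is not reported in this review, so the value of .70 is a hypothetical placeholder used only to show the mechanics.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


r_black_gifted = -0.23  # approximate value quoted in the text
r_frl_gifted = -0.30    # approximate value quoted in the text
r_black_frl = 0.70      # HYPOTHETICAL: not reported in the text

# Race and gifted identification, controlling for FRL.
race_given_frl = partial_corr(r_black_gifted, r_black_frl, r_frl_gifted)

# FRL and gifted identification, controlling for racial composition.
frl_given_race = partial_corr(r_frl_gifted, r_black_frl, r_black_gifted)

print(f"race | FRL:  {race_given_frl:.3f}")
print(f"FRL  | race: {frl_given_race:.3f}")
```

With these illustrative inputs, the race correlation shrinks toward zero once FRL is controlled, while the FRL correlation remains clearly negative once race is controlled. This is the same qualitative pattern described above, though the actual magnitudes depend on the unreported intercorrelation.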
Realistic Expectations for Gifted Program Representation
Perhaps the largest question confronting scholars who study the underrepresentation issue
is the following: Just what is meant by proper representation? Most scholars who have examined
this issue have insisted, implicitly or explicitly, that gifted programs will not be just until their
enrollments mirror those of the larger society. Whether or not this is a reasonable belief has
scarcely been addressed in the literature, though it is a topic that needs attention. Indeed, many
scholars who insist that only equal representation is fair are self-contradictory to some extent.
These scholars have rejected previous notions regarding the genetic heritability of intelligence,
which would cast the underrepresentation issue as a reflection of the inferior genetic
endowments of certain racial groups, and have taken the position that giftedness is a
developmental quality – and rightly so. However, to argue that only equal representation is fair
from this position is to ignore voluminous amounts of research describing the impact of
environmental stimulation on development, such as Quay’s (1989) study of the role of SES in
Piagetian conservation tasks, McLoyd’s (1998) work on the effects of economic deprivation, and
Entwisle and Alexander’s (1992) study of the “summer setback” phenomenon reflecting the
effects of non-stimulating home environments. It is not racist or classist to believe that though
children from all racial, cultural, and socioeconomic backgrounds have the potential to be gifted,
children from these groups grow up in very different environments which go on to drive or
hamper the development and expression of this trait we call giftedness.
Proposed Methods to Identify Traditionally Underrepresented Students
In recent years, a number of scholars and researchers in gifted education have attempted
to find methods of identifying gifted students that would lessen or eliminate the
underrepresentation of students from minority or low-SES backgrounds. These methods have
included the creation of more culturally inclusive descriptions of the traits of giftedness (Frasier
& Passow, 1994), non-verbal psychometric assessment of ability (Naglieri & Ford, 2003;
Tyler-Wood & Carri, 1993), dynamic assessment (VanTassel-Baska et al., 2002), creativity testing
(Torrance, 1977), and alternative assessments based on Gardner’s theory of multiple
intelligences (Sarouphim, 1999).
Traits, Attitudes, and Behaviors. In most school districts, students must be nominated for
screening before they are officially evaluated for gifted program qualification. The most
common source of these nominations is the classroom teacher (Gagné, 1994; Siegle, 2001), and
most teachers in the United States are from White, middle-class backgrounds. A common
concern expressed in the literature is that teachers may not recognize the signs of giftedness
when they are expressed by a student from a culture with which the teacher is not familiar.
Frasier and Passow (1994) addressed this concern by compiling a list of ten traits, attitudes, and
behaviors (TABs) that are universal indicators of giftedness, which could be used by teachers to
improve the quality of their nominations. The authors note that the expression of these TABs
will vary across environmental and cultural backgrounds, and that gifted students rarely express
all of these to the same degree. The traits are 1) motivation, which may be manifested as unusual
persistence, 2) intense interests that are advanced and consuming, 3) advanced communication
skills that may manifest themselves verbally, physically, artistically, or symbolically, 4) high
problem-solving ability which may be reflected in the spontaneous creation of effective and
creative strategies, 5) extensive memory indicated by a large knowledge base and rapid
acquisition of new information, 6) persistent inquiry and curiosity, 7) insight into deeper
meanings, evidenced by the ability to integrate knowledge across disciplines, 8) reasoning, the
ability to think logically and critically, 9) creativity, evidenced by the production of many new
and original ideas, and 10) a keen sense of humor that may be expressed gently or aggressively.
Of these ten characteristics, problem-solving ability, memory, and reasoning are qualities that are
assessed on modern tests of cognitive ability such as the WAIS-III and the Stanford-Binet 5.
Intense interests, insight, humor, motivation, and communication skills are qualities that, if
noticed by knowledgeable adults, are theoretically likely to result in a nomination for further
screening. If one assumes that these tests are relatively unbiased against poor students, one must
conclude that an impoverished background depresses at least those three abilities because, as
discussed previously in this review, students from low-SES backgrounds consistently perform
more poorly on these tests than their more advantaged peers.
Nonverbal tasks. Tyler-Wood and Carri (1993) compared the scores of low and average
SES groups of students nominated for gifted programs on a variety of measures, including the
CogAT, the OLSAT, the Stanford-Binet 4, the Slosson Intelligence Test – Revised, and the
Matrix Analogies Test. They found that the gap between SES groups was much bigger on verbal
tasks, such as the verbal section of the CogAT and the verbal section of the Stanford-Binet. Verbal
tasks are usually considered to reflect crystallized abilities, which are largely determined by an
individual’s prior knowledge and experiences (VanTassel-Baska, Johnson, & Avery, 2002). This
evidence led Tyler-Wood and Carri to call verbal tasks the low-SES gifted student’s “albatross,”
and caused scholars in the field to seek out identification instruments that are primarily non-
verbal. The Naglieri Non-Verbal Ability Test (NNAT) was designed for this purpose, and some
evidence suggests that it identifies similar proportions of White, Black, and Hispanic students at
the upper end of the performance range (Naglieri & Ford, 2003). However, Lohman (2005)
criticized Naglieri and Ford’s work, arguing that their results were based on a highly
nonrepresentative sample of Black and Hispanic children who were from relatively affluent
backgrounds.
Dynamic assessment. Dynamic assessment is a new form of measuring learning ability
that is less dependent on prior knowledge and experiences than traditional forms of testing
(Babaeva, 1999; Bolig & Day, 1993; VanTassel-Baska et al., 2002). These tests are designed to
measure learning speed through a test-train-test format. Examiners present examinees with a
pretest, give examinees targeted instruction based on their performance on the pretest, then
administer a posttest (Kirschenbaum, 1998). Previous research on dynamic assessment has
demonstrated that traditionally measured intelligence is strongly correlated with learning speed
(Ferretti & Butterfield, 1992), and that dynamic assessments more accurately predicted later
school success than the WISC-R for ESOL students (Luther & Wyatt, 1989). Lidz and Macrine
(2001) found that dynamic assessments were effective at identifying culturally diverse learners.
Dynamic assessment appears to be a promising method for identifying students from
underrepresented groups and is worthy of future study.
Creativity. Creativity may be the one reliably measurable ability that is not depressed in low
SES populations. Cicerelli (1966) found that the only significant difference between the
creativity scores of low SES and high SES students was that the low SES students had higher
scores in nonverbal elaboration. Similarly, Rogers (1968) found that low SES children were
better in figural fluency. He also noted that the low SES children were more spontaneous and
less conforming, traits which are often thought to be indicative of a creative style. Kaltsounis
(1974) found that low SES Black students outperformed middle class White students on fluency
and originality on the 1966 version of the TTCT figural. However, these findings are not
universal. Forman (1979) found that high-SES children scored higher on a measure of creativity
than their low-SES counterparts. However, when ability and achievement test differences were
partialed out, these differences became non-significant. Haley (1984) compared samples of
middle- and working-class Black children and found that the middle-class students were more
creative in verbal fluency while the working-class students were more creative in kinetic fluency.
Torrance (1977) compiled a list of creative strengths that he believed to be exhibited by
Black students at advanced levels based on his previous research. These included such skills as
the ability to express emotions freely, the ability to improvise with materials, expressive and
dramatic speech, responsiveness to the concrete, enjoyment of movement, rich imagery
language, originality, social facility in small groups, and problem-centeredness. Torrance argued
that searching for evidence of the creative strengths in Black students would help educators
discover and identify them as gifted.
Assessments based on MI theory. A number of researchers are currently investigating
assessments based on Gardner’s (1993) theory of multiple intelligences. Perhaps the most
thoroughly developed and researched assessment scheme of this type is the DISCOVER project
(Maker, Nielson, & Rogers, 1994; Sarouphim, 2002). DISCOVER, which stands for
Discovering Individual Strengths and Capabilities through Observation while Allowing for
Varied Ethnic Responses, is grounded in Gardner’s (1993) theory of multiple intelligences and is
based on notions of “intelligence-fair” testing and authentic assessment as alternatives to
traditional pen-and-paper tests. The DISCOVER assessment asks students to participate in five
tasks, designed to require linguistic, logical / mathematical, spatial, and interpersonal
intelligences. A panel of judges observes the students in these activities and rates the
performance in each domain on a four-point scale. In general, students receiving the maximum
score in two or more categories are considered to be gifted.
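The general decision rule just described can be sketched in a few lines. This is only an illustration of the "maximum score in two or more categories" criterion. It assumes that the four-point scale is coded numerically from 1 to 4, and the function name and parameters are hypothetical; none of this comes from the DISCOVER materials themselves.

```python
MAX_RATING = 4  # ASSUMED numeric code for the top category of the four-point scale


def meets_gifted_rule(ratings, required=2):
    """Return True if at least `required` domain ratings are at the maximum.

    ratings  : iterable of judges' ratings, one per assessed domain
    required : number of maximum ratings needed (two, per the general rule)
    """
    return sum(1 for rating in ratings if rating == MAX_RATING) >= required


# Hypothetical rating profiles for two students.
print(meets_gifted_rule([4, 4, 2, 1, 3]))  # two maximum ratings
print(meets_gifted_rule([4, 3, 3, 3, 3]))  # only one maximum rating
```

Because the rule counts top ratings rather than averaging across domains, a student with an uneven profile can qualify on the basis of two strong areas alone.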
Sarouphim (2002) studied the DISCOVER protocol as applied to a sample of Hispanic
and Native American high school students, most of whom were from low-SES backgrounds.
The average correlation between scores on the five activities was quite small (r = .278). She
concluded that 29.3% of the sample would be classified as gifted according to this approach.
This finding is consistent with previous research on DISCOVER, which identified 22.9% of
students in the sample as gifted (Sarouphim, 2001). Though this and other non-traditional
approaches are clearly able to identify low-SES and minority students for inclusion in gifted
programs, they are not without problems. First, the identification of almost 30% of students in
the sample as gifted is extremely liberal. It is possible that traditional instruments would yield
similar results with lowered cutoffs. Second, the low correlations between tasks indicate that
these students would require highly individualized programming because they could not be
assumed to have common strengths or skills. Third, the study leaves open the question of differential
identification across SES levels because it did not investigate the DISCOVER model with a
high-SES sample. Based on identification patterns from more traditional instruments, it seems
likely that most or all students from advantaged backgrounds might be identified as gifted via
this approach.
School Correlates that May Increase the Probability of Identification
Counseling availability. Grantham and Ford (2003) laid out a theoretical rationale
describing how the availability of counseling services within schools could be related to the
identification and retention of minority students in gifted programs. They argued that
multicultural counseling specifically addressing issues of race could help support minority
students through the interpersonal challenges that can discourage their participation. One of
these issues is outright racism as described in Harmon’s (2002) case study of a group of inner-
city gifted youth who were bussed to a predominantly White school for gifted services. The
students reported numerous instances of racist insults and remarks from other students in the
school as well as inferior treatment from teachers. At the conclusion of the article, none of the
students were willing to participate in more gifted education if it meant being bussed to another
school. This outcome is quite disturbing for advocates of gifted education.
A major issue that multicultural counseling may address is Fordham and Ogbu’s (1986)
contention that when intelligent African American students enroll in advanced academic
programs and put obvious effort into academic achievement, they may receive negative social
sanctioning from their peers, who may accuse them of abandoning their own culture and “acting
White.” Students are then placed in an uncomfortable situation in which they must choose social
acceptance or academic effort. Students choosing to achieve may be viewed as cultural traitors
(Fordham, 1988). Similar conflicts have been reported for students of the majority culture, such
as Gross’s (1989) description of the “forced choice dilemma” that may force gifted students to
choose between actualizing their abilities or experiencing intimacy and solidarity with their
peers. However, “acting White” appears to carry with it an accusation of deeper and more
profound betrayal and thus is probably more potent. Cordeiro’s (1991) ethnography of at-risk
Hispanic students attending inner city high schools discussed the importance of positive role
models and significant others that assisted in the development and maintenance of an achieving
identity. This finding was repeated in Hébert and Reis’s (1999) ethnography of high achieving
students in an inner-city high school.
Grantham and Ford (2003) proposed a series of ways that counseling personnel can
support gifted Black students. They advocated for the creation of mentoring programs for Black
students in which successful mentors can serve as discussion partners on such issues as social
injustice, motivation, and persistence. Counselors can give talks on anger management which
may be especially useful for students in the stages of racial identity development that are
accompanied by a great deal of anger. Counselors could provide conflict resolution training to
students to help them cope with the social negotiation aspect of being Black and gifted. Another
significant role for counselors would be to provide support via prescribed fictional readings, a
strategy discussed extensively by Ford and Harris (1999).
Given such difficulties, it is perhaps easy to see how counseling might support students in
negotiating a middle way between such extremes. Indeed, there is some indirect evidence to
support this link. Wilson (1986) reviewed and synthesized the literature on the effects of
counseling interventions on underachievers. For the purpose of her review, only studies
conceptualizing underachievement as a disparity between earned grades and achievement test
scores were considered. The review concluded that counseling was sometimes effective in
bringing about long-term change in behavior. Counseling interventions that were most effective
were long term (defined as lasting six months to one year), group-based, and directive in nature.
The case for considering the availability of counseling services within schools in any
model of gifted identification is tentative. Existing literature provides a rationale for how this
availability might be advantageous to underrepresented populations, but empirical studies in this
area are needed.
Orderly learning environment. Few research studies have examined the association
between school characteristics and achievement while controlling for the composition variables
that frequently confound these studies. One example of such a study is Opdenakker and Van
Damme’s (2001) study of mathematics achievement, which was previously introduced. These
researchers examined the effect of school composition and school process variables on
mathematics achievement. One such process variable included in their study was what they
called “orderliness,” a composite of classroom management effectiveness and time spent on
learning. The orderliness variable was moderately correlated with math achievement (r = .48).
In the multilevel analysis, orderliness itself was not entered due to high multicollinearity with
composition variables. However, an interaction term of orderliness and student numerical ability
was created and entered into the model. This interaction was found to have a significant positive
effect on the regression slope of numerical ability on math achievement. All the slopes were
positive, but the slopes became more positive in schools with high orderliness. This indicated
that students were able to actualize their math abilities more effectively in schools with high
orderliness.
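The slope effect described above can be written compactly. What follows is only a simplified two-level sketch, not the three-level specification the authors actually fitted; the variable names (ABILITY, ORDER), the gamma coefficients, and the random terms are notation assumed here for illustration.

```latex
\[
\begin{aligned}
\text{Level 1 (student $i$ in school $j$):}\quad
  & Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{ABILITY}_{ij} + r_{ij} \\
\text{Level 2 (school $j$):}\quad
  & \beta_{0j} = \gamma_{00} + u_{0j} \\
  & \beta_{1j} = \gamma_{10} + \gamma_{11}\,\mathrm{ORDER}_{j} + u_{1j}
\end{aligned}
\]
```

Substituting the level 2 equations into level 1 yields the product term \gamma_{11} ORDER_j x ABILITY_ij with no separate main effect for orderliness, which mirrors the description above. Under this sketch, a positive \gamma_{11} corresponds to the reported finding that the ability slope is steeper in more orderly schools.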
Instructional quality. West (1985) examined the effects of composition and process
variables on the educational effectiveness of urban schools. She analyzed school level data using
a standard regression model, entering the compositional variables (SES and race variables) first
to serve as covariates, then adding in school process variables. Separate analyses were
conducted for reading and math achievement. Teaching experience, a school level variable
operationalized as a composite of years of teaching experience per teacher, percentage of the
faculty with tenure, and amount of professional preparation, was entered into the model as a
predictor. The results indicated that teacher experience was significantly (p = .012) and
positively associated with reading achievement, even after controlling for student background
characteristics and other school process variables. Considering that the study had very low
power due to its small sample size of only 26 schools, the effect of teacher experience would
have to be very powerful in order to trigger statistical significance.
Miller, Ellsworth, and Howell (1986) studied schools that deviate from the traditional
SES – achievement relationship. Unfortunately, this study used a very unusual methodology
rendering the results rather untrustworthy. The authors began with a set of 73 Kansas elementary
schools. They sorted these into two rank ordered lists: one for the percentage of students within
schools receiving free or reduced price lunch and another for low school achievement, defined as
the proportion of students within the school below grade level on the comprehension subtest of
the Iowa Test of Basic Skills. They then subtracted each school’s SES rank from its
achievement rank. A positive difference score indicated that the school’s achievement was better
than expected given its SES composition, while a negative difference indicated the opposite.
The 22 schools with the most positive difference scores were compared against the 22 schools
with the most negative difference scores on a teacher knowledge of reading instruction scale,
years of teaching experience, 27 separate variables (one for each item of a scale measuring teacher
attitudes toward reading instruction), and another 38 variables. The data were analyzed via a
series of 67 unadjusted t-tests. The results indicated that the two groups of schools did not differ
significantly on either teacher knowledge of reading or teaching experience.
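To see why a series of 67 unadjusted tests is a concern, the chance of at least one spurious "significant" result can be worked out directly. The sketch below is illustrative only: it assumes a per-test alpha of .05 and independent tests, neither of which is stated in the study, and the function names are invented for this example.

```python
# Illustrative sketch: ALPHA and the independence of the tests are assumptions,
# not details reported by Miller, Ellsworth, and Howell (1986).
ALPHA = 0.05   # assumed per-test significance level
N_TESTS = 67   # number of unadjusted t-tests described in the study


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test alpha that holds the familywise rate at or below `alpha`."""
    return alpha / n_tests


fwer = familywise_error_rate(ALPHA, N_TESTS)
expected_false_positives = ALPHA * N_TESTS
adjusted_alpha = bonferroni_threshold(ALPHA, N_TESTS)

print(f"Familywise error rate:      {fwer:.3f}")
print(f"Expected false positives:   {expected_false_positives:.2f}")
print(f"Bonferroni per-test alpha:  {adjusted_alpha:.5f}")
```

Under these assumptions the familywise error rate is roughly .97, so a handful of significant differences among the 67 comparisons would be expected by chance alone even if the two groups of schools did not truly differ.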
Student body characteristics. Opdenakker and Van Damme’s (2001) three-level HLM
study of math achievement reached some interesting conclusions regarding the aggregate effects
of student ability on achievement. The first is that school level math ability was a significant,
powerful, and positive contributor toward individual achievement. Furthermore, a cross-level
interaction was detected between individual ability and school mean ability: students within schools
with higher average numerical ability gained more achievement from each unit increase in their
ability. The authors stated that “We found that all students benefit from belonging to a school
with a high ability composition, but the more able students benefited the most” (p. 423). Perhaps
these results can be understood in light of Kulik and Kulik’s (1992) meta-analysis of the
effectiveness of various grouping options for gifted students, which found in part that grouping
was effective when it allowed students to receive advanced instruction. Opdenakker and Van
Damme (2001) proposed some mechanisms through which school ability could affect individual
achievement. They argued that bright and motivated students may exert peer pressure on other
students to achieve, that the curriculum may be taught more demandingly and with higher
standards to groups with higher academic readiness, that teachers likely expect more from
classes that they consider to be highly able, and that bright students may benefit from enhanced
academic self-concept. Stated more informally, if teachers “teach to the average,” the level of
teaching goes up for everyone when the average student ability is increased.
The findings became even more interesting when Opdenakker and Van Damme (2001)
performed a similar analysis, but this time replacing student ability with a student ability by
student SES interaction which was interacted with school mean ability as before (see Figure 1.1).
[Figure 1.1: Relationship between student numerical intelligence and sensitivity to school mean numerical intelligence per SES-group (reproduced from Opdenakker & Van Damme, 2001, p. 432). Horizontal axis: numerical intelligence. Vertical axis: sensitivity to school ability. Separate lines are plotted for low, medium, and high SES.]
The results indicated that high ability students across SES strata are most sensitive to the ability
composition of their schools, and that this effect is strongest for high ability students from low
SES backgrounds. Indeed, these bright and economically disadvantaged students were about
twice as sensitive to school composition as students of similar ability from the most advantaged
backgrounds. These results support and extend previous findings that students from poor
backgrounds are the most sensitive to school effects (e.g., Entwisle & Alexander, 1992).
Critique of the Literature
The studies summarized in this report suffer from some common problems that weaken
their value and, ultimately, their trustworthiness to the research community. Understanding these
issues is important so that future studies in the area may address or avoid them. In some cases,
researchers simply made common mistakes in their analysis, such as not accounting for
confounding variables, relying upon univariate statistics when multivariate statistics were more
appropriate, categorizing continuous variables, and following poor reporting practices. In other studies, the
most appropriate statistical methods or computer programs were not available at the time of
publication.
Ignoring clustered data. Most research in education is guilty of ignoring the clustered
nature of the data. Studies in education most often examine students who are clustered within
classrooms, which are clustered within schools. Ordinary statistical procedures based on the
general linear model, such as analysis of variance and regression, assume that the units of
analysis are independent – that they cannot affect each other. When students are clustered within
classrooms, students in the same class are more similar to one another than they are
to students in other classes. Clustered data often violate the assumption of independence and
thus result in biased parameter estimates (Raudenbush & Bryk, 2002). Aggregation bias is
another common problem resulting from ignoring the clustered aspect of data. Aggregation bias
results when a predictor exerts different types of influence at different levels. An example
discussed earlier was that of SES, which exerts certain effects at the individual level. However,
organizational units such as classrooms are composed of individuals with their own SES.
Therefore, organizational units have SES properties of their own that may exert influence of a
different magnitude or direction than the effects of SES at the individual level. Flat regression
models ignore the higher-level effects of predictors (Raudenbush & Bryk, 2002). Statistical
techniques, such as hierarchical linear modeling and multilevel structural equation modeling, that
are capable of dealing with this type of clustered data are relatively new in education.
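One commonly used way to gauge how much clustering matters is the design effect, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. The sketch below is not drawn from any study reviewed here; the class size, ICC, and sample size are hypothetical values chosen only to show the arithmetic, and the function names are invented for this example.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical scenario: 1,000 students in classes of 25, ICC = .20.
N_STUDENTS = 1000
CLASS_SIZE = 25
ICC = 0.20

deff = design_effect(CLASS_SIZE, ICC)
n_eff = effective_sample_size(N_STUDENTS, CLASS_SIZE, ICC)

print(f"Design effect:         {deff:.2f}")
print(f"Effective sample size: {n_eff:.1f}")
```

In this hypothetical case the 1,000 students carry about as much information as 172 independent observations, so an analysis that treats them as independent would overstate its precision considerably.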
Confounding issues. When two predictors are highly correlated, research examining the
impact of one of these predictors must make an effort to control the other. Otherwise, the
individual variable in question cannot be isolated. An example that affects many of the studies
examined herein is the strong relationship that exists between SES and race. Ethnic and cultural
groups in the United States continue to lack equal access to sources of economic power, so
substantial differences in average SES persist across racial lines. Studies that include one of
these factors without controlling the other are impossible to interpret.
Categorizing continuous variables. It is very common for researchers in education and
other areas of psychology to inappropriately categorize continuous variables using median splits
or similar procedures. This is most frequently performed in order to force data to fit into an
ANOVA framework when a regression framework would be more appropriate. A great deal of
the research examined in this review divided participants into low, middle, and high SES groups
based on splitting up continuous information on family income, parental education, or parental
occupation status (e.g., Kaltsounis, 1974; Quay, 1989; Ryan & French, 1976). When continuous
data are artificially categorized, statistical power is reduced. Such treatments may also mask
nonlinear relationships.
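The loss of information from a median split can be illustrated with a small simulation. The sketch below uses entirely synthetic data: the variable names, the true correlation of .50, the sample size, and the random seed are all assumptions made for illustration and do not come from any study cited above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from first principles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_median_split(n=20000, rho=0.5, seed=0):
    """Return (r with continuous SES, r with median-split SES) for synthetic data."""
    rng = random.Random(seed)
    ses, achievement = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        ses.append(x)
        # Construct achievement so that its population correlation with SES is rho.
        achievement.append(rho * x + math.sqrt(1.0 - rho ** 2) * noise)
    median = sorted(ses)[n // 2]
    ses_high = [1.0 if x > median else 0.0 for x in ses]  # "high" vs. "low" SES
    return pearson_r(ses, achievement), pearson_r(ses_high, achievement)


r_continuous, r_split = simulate_median_split()
print(f"r using continuous SES:   {r_continuous:.3f}")
print(f"r using median-split SES: {r_split:.3f}")
```

For normally distributed data, dichotomizing a predictor at the median is expected to shrink its correlation with the outcome by a factor of about sqrt(2/pi), or roughly 0.80, which is the reduction in power the paragraph above refers to.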
Inappropriate reliance on univariate analysis. Several of the empirical research articles
in this review had access to multiple outcome variables, such as achievement test scores in
multiple subjects. When a researcher is interested in multiple outcomes and has reason to
believe that the outcomes are theoretically related, it is generally more appropriate to utilize a
multivariate analysis that treats the outcomes as a structured composite (Huberty, 1994). This
allows for increased statistical power as well as follow-up descriptive discriminant analysis
techniques to explore constructs within the outcome composite that may be differentially
affected by the predictors.
Ignoring the structure of predictors. The common regression model assumes that each
predictor causes the outcome directly, while being correlated with the other predictors. In other
words, the model assumes that the predictors themselves are not structured. Reality is usually
more complicated. Predictors may exert causative effects on each other as well as on the
outcome variable of interest. Therefore, a given predictor of interest may exert a direct effect on
an outcome while exerting other indirect effects on the outcome through the other predictors.
Path analysis allows the researcher to posit models of how predictors influence each other as
well as the outcome. Multiple path models can be compared against a data set, allowing the
researcher to compare their relative fit and discard theoretical models that do not account for
relations between variables. Marsh (1984) and Marsh and Parker (1984) were the only articles
reviewed to use path analysis, and they did not report model fit statistics.
Inappropriate use of stepwise procedures. Stepwise procedures are popular in multiple
regression and predictive discriminant analysis contexts for automating the selection of
predictors and ordering their entry into the model. The issue of ordering predictors is
particularly crucial because once a predictor is entered into a model its effect is controlled. In
other words, all of the covariance that that predictor shares with another predictor as well as the
outcome is attributed to the variable that is entered first. Unfortunately, stepwise procedures do
not perform either of their intended functions very well. Thompson (1995) showed that stepwise
procedures do not result in the selection of the best subset of predictors and often fail to yield
replicable results due to their capitalization on sampling error for determining variable entry
order. This was illustrated quite clearly in West’s (1985) analyses. In the first analysis, of
reading achievement, the percentage of the school population that was Black was entered first,
followed by SES. In the second analysis, this time of math achievement, SES was entered first.
The percentage of the school population that is Black did not account for any additional variance
once SES was in the model. Of course, this raises the question of whether the percentage of
students that are Black would have made it into the first model if SES had been entered first.
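The dependence of credited variance on entry order can be shown with a short calculation. The sketch below does not use West's (1985) data and does not implement a stepwise selection algorithm; it only applies the standard two-predictor R-squared formula to three hypothetical correlation magnitudes (the constants and function names are invented for illustration).

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R-squared for an outcome regressed on two correlated predictors."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def credited_variance(r_y_first: float, r_y_second: float, r_12: float):
    """Variance credited to the first-entered predictor and the increment for the second."""
    total = r_squared_two_predictors(r_y_first, r_y_second, r_12)
    first = r_y_first ** 2  # the first predictor absorbs all shared variance
    return first, total - first


# Hypothetical correlation magnitudes, chosen only for illustration.
R_ACH_SES = 0.60       # achievement with school SES
R_ACH_PCT_BLACK = 0.50  # achievement with percentage of Black students
R_SES_PCT_BLACK = 0.80  # correlation between the two predictors

ses_first = credited_variance(R_ACH_SES, R_ACH_PCT_BLACK, R_SES_PCT_BLACK)
race_first = credited_variance(R_ACH_PCT_BLACK, R_ACH_SES, R_SES_PCT_BLACK)

print(f"SES entered first:  SES = {ses_first[0]:.3f}, race adds {ses_first[1]:.3f}")
print(f"Race entered first: race = {race_first[0]:.3f}, SES adds {race_first[1]:.3f}")
```

With these made-up values the total R-squared is about .36 in both orderings, but the racial composition variable is credited with either .25 or almost nothing depending solely on whether it enters before or after SES.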
Directions for Future Research
It is apparent from reading this review that much of the research examining the impact of
SES on educational outcomes is quite dated. Much of the relatively recent work is still using the
National Educational Longitudinal Study of 1988 dataset, which is rapidly approaching
obsolescence, or even older datasets such as the High School and Beyond data of the early
eighties.
Future research on the effects of SES on school achievement should be conducted with
up-to-date datasets using analysis methods that account for the multilevel nature of the data.
This would address the problems posed by aggregation bias. Using structural equation modeling
procedures would allow for examination of the structure of the predictors and would begin to
integrate the structure and process approaches that have historically been separate in SES
research. Care must be taken to examine the effects of socioeconomic status independent of
many of the factors that have confounded previous work, such as race.
Conclusion
This essay has reviewed a variety of literature related to gifted identification and school
achievement. We have seen overwhelming evidence that socioeconomic status has powerful
effects at both the individual and aggregate level, which must be considered in future research
within gifted education. We have seen that many scholars in gifted education are concerned with
the underrepresentation issue, but that the true effects of race are difficult to disentangle from
class.
References
Babaeva, J. D. (1999). A dynamic approach to giftedness: Theory and practice. High Ability
Studies, 10(1), 51-68.
Bennett, C. I. (1995). Comprehensive multicultural education: Theory and practice. (4th ed.).
Boston: Allyn & Bacon.
Bolig, E. E., & Day, J. D. (1993). Dynamic assessment and giftedness: The promise of assessing
training responsiveness. Roeper Review, 16(2), 110-113.
Brosnan, F. L. (1983). Overrepresentation of low-socioeconomic minority students in special
education programs in California. Learning Disability Quarterly, 6(4), 517-525.
Brown, C. N. (1997). Gifted identification as a constitutional issue. Roeper Review, 19(3), 157-
167.
Chasteen, A., Bhattacharyya, S., Horhota, M., Tam, R., & Hasher, L. (2005). How feelings of
stereotype threat influence older adults’ memory performance. Experimental Aging
Research, 31(3), 235-260.
Cicerelli, V. G. (1966). Religious affiliation, socio-economic status, and creativity. Journal of
Experimental Education, 35, 90-93.
Cordeiro, P. (1991). An ethnography of high achieving at-risk Hispanic youths at two urban high
schools: Implications of administrators. Connecticut. (ERIC Document Reproduction
Service No. ED 330 088).
Duncan, G. J., & Brooks-Gunn, J. (2000). Family poverty, welfare reform, and child
development. Child Development, 71, 188-196.
Entwisle, D. R., & Alexander, K. L. (1992). Summer setback: Race, poverty, school
composition, and mathematics achievement in the first two years of school. American
Sociological Review, 57(1), 72-84.
Everson, H. T., & Millsap, R. E. (2004). Beyond individual differences: Exploring school effects
on SAT scores. Educational Psychologist, 39(3), 157-172.
Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7, 117-140.
Ford, D. Y. (1998). The underrepresentation of minority students in gifted education: Problems
and promises in recruitment and retention. Journal of Special Education, 32(1), 4-14.
Ford, D. Y., & Harris, J. J., III. (1999). Multicultural gifted education. New York: Teachers
College Press.
Ford, D. Y., Harris, J. J., III, Tyson, C. A., & Trotman, M. F. (2002). Beyond deficit thinking:
Providing access for gifted African American students. Roeper Review, 24(2), 52-58.
Fordham, S. (1988). Racelessness as a factor in black students' school successes: Pragmatic
strategy or Pyrrhic victory? Harvard Educational Review, 58(1), 54-84.
Fordham, S., & Ogbu, J. (1986). Black students' school success: The burden of "acting White."
The Urban Review, 18, 176-206.
Forman, S. (1979). Effects of socioeconomic status on creativity in elementary school children.
Creative Child and Adult Quarterly, 4(2), 87-92.
Frasier, M. M. (1997). Gifted minority students: Reframing approaches to their identification and
education. In N. Colangelo & G. Davis (Eds.), The handbook of gifted education (2nd
ed., pp. 498-515). Needham Heights, MA: Allyn & Bacon.
Frasier, M. M., & Passow, A. H. (1994). Toward a new paradigm for identifying talent potential.
(No. 94112): The National Research Center on the Gifted and Talented.
Gagne, F. (1994). Are teachers really poor talent detectors? Comments on Pegnato and
Birch's (1959) study of the effectiveness and efficiency of various identification
techniques. Gifted Child Quarterly, 38(3), 124-126.
Gardner, H. (1993). Frames of mind: The theory of multiple intelligences. NY: Basic Books.
Garibaldi, A. M. (1997). Four decades of progress and decline: An assessment of African
American educational attainment. Journal of Negro Education, 66(2), 105-120.
Grantham, T. C., & Ford, D. Y. (2003). Beyond self-concept and self-esteem: Racial identity and
gifted African American students. High School Journal, 87(1), 18-29.
Griffith, J. (1996). Relation of parental involvement, empowerment, and school traits to student
academic performance. Journal of Educational Research, 90(1), 33-41.
Gross, M. U. M. (1989). The pursuit of excellence or the search for intimacy? The forced-choice
dilemma of gifted youth. Roeper Review, 11(4), 189-194.
Haley, G. (1984). Creative response styles: The effects of socioeconomic status and problem-
solving training. Journal of Creative Behavior, 18(1), 25-40.
Harmon, D. (2002). They won't teach me: The voices of gifted African American inner-city
students. Roeper Review, 24(2), 68-75.
Hébert, T., & Reis, S. (1999). Culturally diverse high-achieving students in an urban high school.
Urban Education, 34(4), 428-457.
Hertzman, C., McLean, S., Kohen, D., Dunn, J., & Evans, T. (2002). Early development in
Vancouver: Report of the community asset mapping project (CAMP). Ottawa, Ontario:
Canadian Institute for Health Information.
Hess, R., & McDevitt, T. (1984). Some cognitive consequences of maternal intervention
techniques: A longitudinal study. Child Development, 55, 1902-1912.
Hollingshead, A. B., & Redlich, F. C. (1958). Social class and mental illness. New York: Wiley.
Kaltsounis, B. (1974). Race, socioeconomic status, and creativity. Psychological Reports, 35,
164-166.
Kennedy, E. (1992). A multilevel study of elementary male Black students and White students.
Journal of Educational Research, 86(2), 105-110.
Kirschenbaum, R. (1998). Dynamic assessment and its use with underserved gifted and talented
populations. Gifted Child Quarterly, 42(3), 140-147.
Koenig, A., & Eagly, A. (2005). Stereotype threat in men on a test of social sensitivity. Sex Roles,
52(7-8), 489-496.
Kulik, J. A., & Kulik, C.-l. C. (1992). Meta-analytic findings on grouping programs. Gifted Child
Quarterly, 36(2), 73-77.
Laosa, L. M. (1983). School, occupation, culture, and family: The impact of parental schooling
on the parent - child relationship. In I. E. Sigel & L. M. Laosa (Eds.), Changing families
(pp. 79-135). New York: Plenum.
Lidz, C., & Macrine, S. (2001). An alternative approach to the identification of gifted culturally
and linguistically diverse learners: The contribution of dynamic assessment. School
Psychology International, 22(1), 74-96.
Lohman, D. F. (2005). Review of Naglieri and Ford (2003): Does the Naglieri Nonverbal Ability
Test identify equal proportions of high-scoring White, Black, and Hispanic students?
Gifted Child Quarterly, 49, 19-28.
Luther, M., & Wyatt, F. (1989). A comparison of Feuerstein's method of LPAD assessment with
conventional IQ testing on disadvantaged New York high school students. International
Journal of Dynamic Assessment and Instruction, 1(1), 49-64.
Maggi, S., Hertzman, C., Kohen, D., & D'Angiulli, A. (2004). Effects of neighborhood
socioeconomic characteristics and class composition on highly competent children.
Journal of Educational Research, 98(2), 109-114.
Maker, C. J. (1996). Identification of gifted minority students: A national problem, needed
changes and a promising solution. Gifted Child Quarterly, 40(1), 41-50.
Maker, C. J., Nielson, A. B., & Rogers, J. A. (1994). Giftedness, diversity, and problem-solving.
Teaching Exceptional Children, 27, 4-19.
Marsh, H. W. (1984). Self-concept, social comparison, and ability grouping: A reply to Kulik
and Kulik. American Educational Research Journal, 21(4), 799-806.
Marsh, H. W., & Parker, J. W. (1984). Determinants of student self-concept: Is it better to be a
relatively large fish in a small pond even if you don't learn to swim as well? Journal of
Personality & Social Psychology, 47(1), 213-231.
Marsh, H. W., Relich, J. D., & Smith, I. D. (1983). Self-concept: The construct validity of
interpretations based on the SDQ. Journal of Personality & Social Psychology, 45, 173-187.
Marx, D., & Goff, P. (2005). The effect of experimenter race on target’s test performance and
subjective experience. British Journal of Social Psychology, 44(4), 645-657.
McBee, M. (In press). Minority representation in gifted programs: A school level analysis of race
and socioeconomics. Roeper Review.
McLoyd, V. C. (1998). Economic advantage and child development. American Psychologist, 53,
185-204.
Miller, J. W., Ellsworth, R., & Howell, J. (1986). Public elementary schools which deviate from
the traditional SES-achievement relationship. Educational Research Quarterly, 10(3), 31-
50.
Mills, B. C. (1983). The effects of socioeconomic status on young children's readiness for
school. Early Child Development & Care, 11(3), 267-273.
Mills, C. J., & Tissot, S. L. (1995). Identifying academic potential in students from
under-represented populations: Is using the Ravens Progressive Matrices a good idea? Gifted
Child Quarterly, 39(4), 209-217.
Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children
using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47(2), 155-
160.
Naglieri, J., & Jensen, A. R. (1987). Comparison of Black - White differences on the WISC-R
and the K-ABC: Spearman's hypothesis. Intelligence, 11, 21-43.
Opdenakker, M.-C., & Van Damme, J. (2001). Relationship between school composition and
characteristics of school process and their effect on mathematics achievement. British
Educational Research Journal, 27(4), 407-432.
Overton, W. F., Wagner, J., & Dolinsky, H. (1971). Social class differences and task variables in
the development of multiplicative classification. Child Development, 42, 1951-1958.
Portes, A., & MacLeod, D. (1996). Educational progress of children of immigrants: The roles of
class, ethnicity, and school context. Sociology of Education, 69(4), 255-275.
Pyryt, M. (1996). IQ: Easy to bash, hard to replace. Roeper Review, 18, 255-258.
Quay, L. C. (1989). Interactions of stimulus materials, age, and SES in the assessment of
cognitive abilities. Journal of Applied Developmental Psychology, 10(3), 401-409.
Raudenbush, S., & Bryk, A. (1986). A hierarchical model for studying school effects. Sociology
of Education, 59, 1-17.
Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data analysis
methods. (2nd ed.). Thousand Oaks, CA: Sage.
Reid, C., Romanoff, B., Algozzine, B., & Udall, A. (2000). An evaluation of alternative
screening procedures. Journal for the Education of the Gifted, 23(4), 379-396.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American
Sociological Review, 15, 351-357.
Rogers, D. W. (1968). Visual expression: A creative advantage of the disadvantaged. The
Elementary School Journal, 68, 394-399.
Rumberger, R. W. (1995). Dropping out of middle school: A multilevel analysis of students and
schools. American Educational Research Journal, 32(3), 583-625.
Ryan, J. J., & French, J. R. (1976). Long-term grade predictions for intelligence and achievement
tests in schools of differing socio-economic levels. Educational & Psychological
Measurement, 36(2), 553-559.
Sarouphim, K. M. (1999). DISCOVER: A promising alternative assessment for the identification
of gifted minorities. Gifted Child Quarterly, 43(4), 244-251.
Sarouphim, K. (2001). Concurrent validity, gender differences, and identification of minority
students. Gifted Child Quarterly, 45, 130-138.
Sarouphim, K. (2002). DISCOVER in high school: Identifying gifted Hispanic and Native
American students. Journal of Secondary Gifted Education, 14(1), 30-38.
Schmader, T., Johns, M., & Barquissau, M. (2004). The costs of accepting gender differences:
The role of stereotype endorsement in women's experience in the math domain. Sex
Roles, 50(11), 835-850.
Scott, M. S., Perou, R., Urbano, R. C., Hogan, A., et al. (1992). The identification of
giftedness: A comparison of White, Hispanic and Black families. Gifted Child Quarterly,
36(3), 131-139.
Siegle, D. (2001, April 18-21). Teacher bias in identifying gifted and talented students. Paper
presented at the Annual meeting of the Council for Exceptional Children, Kansas City,
MO.
Shaunessy, E., Karnes, F. A., & Cobb, Y. (2004). Assessing potentially gifted students from
lower socioeconomic status with nonverbal measures of intelligence. Perceptual & Motor
Skills, 98(3), 1129-1138.
Shavelson, R. J., & Bolus, R. (1982). Self-concept: The interplay of theory and methods. Journal
of Educational Psychology, 74, 3-17.
Smith, J. L. (2004). Understanding the process of stereotype threat: A review of mediational
variables and new performance goal directions. Educational Psychology Review, 16(3),
177-206.
Soares, A. T., & Soares, L. M. (1969). Self-perceptions of culturally disadvantaged children.
American Educational Research Journal, 6, 31-45.
Spearman, C. (1904). General intelligence: Objectively determined and measured. American
Journal of Psychology, 15, 201-293.
Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and
performance. American Psychologist, 52(6), 613-629.
Steinberg, L., Blinde, P., & Chan, K. (1984). Dropping out among language minority youth.
Review of Educational Research, 54(1), 113-132.
Taylor, S. A., & Harris, K. C. (2003). School integration and the achievement test scores of
Black and White students in Savannah, Georgia. North American Journal of Psychology,
5(2), 301-309.
Thompson, B. (1995). Stepwise regression and stepwise discriminant analysis need not apply
here: A guidelines editorial. Educational & Psychological Measurement, 55(4), 525-534.
Torrance, E. P. (1977). Discovery and nurturance of giftedness in the culturally different.
Reston, VA: The Council for Exceptional Children.
Trowbridge, N. (1972). Self-concept and socio-economic status in elementary school children.
American Educational Research Journal, 9, 525-537.
Tyler-Wood, T., & Carri, L. (1993). Verbal measures of cognitive ability: The gifted low SES
student's albatross. Roeper Review, 16(2), 102-106.
Valenzuela, A. (1999). Subtractive schooling: U.S.-Mexican youth and the politics of
caring. Albany, NY: SUNY Press.
VanTassel-Baska, J., Johnson, D., & Avery, L. D. (2002). Using performance tasks in the
identification of economically disadvantaged and minority gifted learners: Findings from
project STAR. Gifted Child Quarterly, 46(2), 110-123.
West, C. A. (1985). Effects of school climate and school social structure on student academic
achievement in selected urban elementary schools. Journal of Negro Education, 54(3),
451-461.
Wilson, N. S. (1986). Counselor interventions with low-achieving and underachieving
elementary, middle, and high school students: A review of the literature. Journal of
Counseling & Development, 64(10), 628-634.
Wylie, R. C. (1979). The self-concept: Theory and research on selected topics. (Vol. 2). Lincoln:
University of Nebraska Press.
Yopyk, D., & Prentice, D. (2005). Am I an athlete or a student? Identity salience and stereotype
threat in student-athletes. Basic and Applied Social Psychology, 27(4), 329-336.
CHAPTER 2
A DESCRIPTIVE ANALYSIS OF REFERRAL SOURCES FOR GIFTED IDENTIFICATION
SCREENING BY RACE AND SOCIOECONOMIC STATUS1
1 McBee, M. Accepted by the Journal for Secondary Gifted Education, 2/16/2006.
Abstract
A dataset containing demographic, gifted nomination status, and gifted identification
status for all elementary school students in the state of Georgia (N = 705,074) was examined.
The results indicated that automatic and teacher referrals were much more valuable than other
referral sources. Asian and White students were much more likely to be nominated than Black
or Hispanic students. Students receiving free or reduced-price lunches were much less likely to
be nominated than students paying for their own lunches. The results suggest that inequalities in
nomination, rather than assessment, may be the primary source of the underrepresentation of
minority and low-SES students in gifted programs.
Despite the vital role of the referral as the “gatekeeper” process through which students
become eligible for official evaluation for entry into gifted programs, it remains poorly
understood. An examination of the gifted education literature reveals a paucity of research in
this area. This is especially troubling and indeed surprising given the field’s well-documented
struggle to identify and serve students from minority or low SES families (e.g., Ford, 1998;
Frasier, Garcia, & Passow, 1995). A relatively large amount of work has examined possible
methods of fairly assessing students who are traditionally underrepresented in programs for the
gifted, including assessment schemes based on dynamic assessment (Kirschenbaum, 2004), non-
verbal ability tests (Naglieri & Ford, 2003), Gardner’s (1983) theory of multiple intelligences
(Sarouphim, 1999), compensatory policies such as lowering IQ cutoff requirements for students
from underrepresented groups (Hunsaker, 1994), and performance-based assessments
(VanTassel-Baska, Johnson, & Avery, 2002). These procedures may hold great promise for
identifying and serving students from these groups. However, most school districts require that a
student be referred or nominated before being formally assessed for gifted program placement.
Students who do not receive a referral will be unable to enter the program no matter which formal
assessment procedure is used. The referral process is an obvious potential source of unfairness
in the entrance process. It is essential that reliable information be made available so that current
practices can be evaluated and perhaps modified.
For the remainder of this paper, the terms “referral” and “nomination” will be used
interchangeably to describe the process of designating a student as potentially gifted. Once a
student has received a nomination or referral, he or she is legally required to undergo official
testing for gifted program placement, assuming that the student’s parents consent. The testing
process will be referred to as “evaluation” or “screening” throughout the remainder of the paper.
Teacher Nominations
The classic study on teacher nominations was conducted by Pegnato and Birch in 1959.
In this study, a variety of screening methods were compared on the basis of “effectiveness,” the
percentage of gifted children nominated by the screening method, and “efficiency,” the
percentage of nominated students that would later be confirmed as gifted through individual
testing. “Giftedness” was operationalized as an IQ score of 136 or greater on the Stanford-Binet.
Therefore, effectiveness was sensitive to false negatives while efficiency was sensitive to false
positives. Pegnato and Birch concluded that teacher judgment was a poor method of screening
students for individual testing. Teacher judgment was just 45% effective, meaning that teachers
nominated only 45% of the students who actually had IQs of 136 or greater, and was only 27%
efficient. Their study, widely acclaimed in the gifted education community, formed the basis of
a widespread belief that teachers are poor judges of student potential. Their method of assessing
screening techniques via effectiveness and efficiency ratings was utilized in much of the later
research on teacher nominations (e.g., Gear, 1976; Waters & Clausen, 1983).
Gagné (1994) re-examined Pegnato and Birch’s (1959) study. He severely criticized the
use of effectiveness and efficiency measures in assessing the quality of a nomination scheme,
pointing out that the two are not independent because they both depend upon the number of students
nominated. In fact, the two indices are negatively correlated. A screening method that
nominates more students will, all things being equal, be more effective since it will necessarily
catch more gifted students while simultaneously becoming less efficient. Gagné argued for the
use of the phi coefficient in judging the effectiveness of a nomination scheme. The phi
coefficient is a correlation coefficient used with categorical data whose interpretation is
equivalent to that of Pearson’s r (Agresti, 1996). To use the phi coefficient, a 2x2 cross-
classification table is created with nomination status (yes/no) on one dimension and gifted status
(yes/no) on the other. Counts for each of the four conditions are placed in the table. Phi
summarizes how strongly the counts concentrate on the diagonal (the correctly classified cases)
relative to what the row and column totals alone would predict. Using this type of procedure to assess the quality of teacher nominations as a
screening strategy does not suffer from the drawbacks of Pegnato and Birch’s system. Gagné’s
analysis of the original data found that teacher judgment had a phi coefficient of .29, which
compared quite favorably with the other methods analyzed in the study. Thus, the belief that
teachers are generally poor at detecting academically gifted students is based partly on a classic
study with flawed methodology.
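
The three indices discussed above can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical (they are not Pegnato and Birch's data), and phi is computed with the standard formula for a 2x2 table of nomination status by gifted status.

```python
import math


def effectiveness(tp: int, fn: int) -> float:
    """Share of truly gifted students that the screen nominated."""
    return tp / (tp + fn)


def efficiency(tp: int, fp: int) -> float:
    """Share of nominated students later confirmed as gifted."""
    return tp / (tp + fp)


def phi_coefficient(tp: int, fp: int, fn: int, tn: int) -> float:
    """Phi for a 2x2 table of nomination status by gifted status."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return numerator / denominator


# Hypothetical counts for 1,000 students, 50 of whom are gifted.
# tp = nominated and gifted, fp = nominated but not gifted,
# fn = gifted but not nominated, tn = neither.
narrow_screen = {"tp": 30, "fp": 70, "fn": 20, "tn": 880}
broad_screen = {"tp": 45, "fp": 255, "fn": 5, "tn": 695}

for name, t in (("narrow", narrow_screen), ("broad", broad_screen)):
    print(
        name,
        round(effectiveness(t["tp"], t["fn"]), 3),
        round(efficiency(t["tp"], t["fp"]), 3),
        round(phi_coefficient(**t), 3),
    )
```

In these made-up counts the broader screen nominates three times as many students, so its effectiveness rises while its efficiency falls. That is the dependence on the number of nominations that Gagné objected to; phi combines all four cells into a single index.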
Another concern with respect to research on the efficacy of teacher nominations is the
criterion variable. Just what should the criterion be? Previous definitions of giftedness that
relied on an IQ score higher than a specific threshold were quite simple to test via the cross-
classification approach outlined above, as they simply asked the teacher to predict which
students would exceed the target IQ. Indeed, Renzulli and Delcourt (1986) criticized this
teacher-predicts-IQ approach, arguing that teachers’ imperfect ability to predict IQ suggests that
they are valuable sources of information on aspects of student ability that differ from those
measured by psychometric testing. Current multidimensional definitions of giftedness (see
Feldman, 2003) that define it as some combination of academic ability, creativity, motivation,
achievement, leadership, or artistic talent make the selection of an appropriate criterion variable
quite difficult. Renzulli and Delcourt suggested that the ultimate criterion for evaluating the
usefulness of teacher recommendations should be performance in the enriched academic
program or even later life accomplishment.
Ultimately, insufficient research has been conducted on teacher nominations to make
possible a sound judgment regarding their value. But even if teachers are effective at nominating
students from middle-class majority-culture backgrounds, as some more contemporary research
suggests, a significant question remains regarding their ability to detect students with high
academic potential who come from other backgrounds, especially those backgrounds that are
underrepresented in programs for gifted students. A reading of the research literature on this
topic reveals that it has been a frequent source of concern. Nonetheless, only a small number of
studies have empirically examined this issue.
Hunsaker, Finley, and Frank’s (1997) study is one of the few that has addressed Renzulli
and Delcourt’s (1986) criticism. Teachers were trained to recognize the characteristics of
giftedness as they manifest in students from traditionally underrepresented backgrounds. The
researchers examined canonical correlations between teacher ratings on the TABs Summary
Form (Frasier et al., 1995) and the Scales for Rating the Behavioral Characteristics of Superior
Students (Renzulli, Smith, White, Callahan, & Hartman, 1976) for students from low-income and
minority backgrounds and subsequent student performance in the gifted program, as assessed by
the Scale for Rating Students’ Participation in the Local Gifted Education Program (Renzulli &
Westberg, 1991). The results indicated that the teacher ratings of the students’ characteristics
were moderately correlated with specific aspects of the students’ subsequent performances in
gifted education classes. However, the correlations between the “overall success” scale of the
student performance scale and the two canonical variables representing teacher evaluations of
gifted characteristics were quite low (.178 and .220).
There is some evidence suggesting that teachers evaluate Hispanic students less favorably
than White students. Masten, Plata, Wenglar, and Thedford’s (1999) study found that fifth-
grade teachers rated Hispanic students less favorably on the Scales for Rating the Behavioral
Characteristics of Superior Students (SRBCSS) and that their ratings of students were associated
with the students’ level of acculturation and ethnic identification. A similar study, conducted by
Plata and Masten (1998), concluded that nominated Hispanic and Caucasian students had similar
scores on the SRBCSS, but non-nominated Hispanic students did have lower scores than their
Caucasian counterparts. However, since these studies did not control for socioeconomic status
or any other potentially lurking variables, they must be interpreted with caution.
Method
Data Sources
A population dataset was obtained from the Georgia Department of Education via special
request. This dataset included records from all public school students enrolled during the 2004
academic year. The relevant variables from this dataset that were used in this analysis included the
student’s race, whether or not the student received free or reduced price lunch, whether the
student had been nominated for participation in the gifted program, the source of the nomination,
and whether or not the student had been identified. The overall N for the dataset was 1,820,635.
Of these, all students in grades 1 through 5 were selected. This yielded an N of 705,074, the
population of Georgia elementary school students during the 2004 academic year.
The nomination sources reported in the data were as follows: automatic referrals, which
occur when a student scores in the 90th percentile or higher on a standardized test;
teacher referrals; parent referrals; self referrals; peer referrals; and other referral sources, which
are referrals communicated to the school by anyone other than the student’s teacher, parent, self,
or peer. Examples of other referrals would include referrals by a community member, minister,
or relative without custody of the child.
Georgia follows a multiple-criteria assessment procedure. Once students have been
nominated, they are evaluated or screened for gifted program placement. Data must be collected
in four areas: mental ability, achievement, motivation, and creativity. Mental ability is generally
determined via psychometric assessment, achievement is generally determined by standardized
test scores, creativity is generally determined by the Torrance Test of Creative Thinking –
Figural, and motivation is generally determined by grades. However, a variety of other forms of
evidence are admissible, including projects or performances that are evaluated by a panel of
judges. To be identified as gifted, students must either provide evidence of superior ability in
any three of these four domains or must provide evidence of superior ability and achievement.
Evidence for superiority in at least one of the four areas must be provided by a standardized test.
Research Questions
This study addresses the following research questions: 1) How do the referral sources
compare in terms of overall quality, as indicated by the phi coefficient, as well as by the number
of students referred, the proportion of referred students who are successfully identified, and the
proportion of identified students located via the referral method? 2) How do the referral sources
compare in terms of equity across racial and socioeconomic groups? 3) Does the
underrepresentation problem occur primarily at the nomination stage or the testing stage of the
gifted identification process?
Prior to conducting the analysis, the data were screened and prepared. The data were
originally collected from each school by individual teachers who were responsible for reporting
the referral sources. Some teachers apparently misreported students with automatic referrals as
not having been referred at all. This resulted in some students being reported as having been
identified as gifted without being referred, which should obviously be impossible. After
conversing with personnel at the Georgia Department of Education, gifted students coded as not
being referred were recoded to automatic referral. Furthermore, there were a small number of
students with missing data on whether or not they received free or reduced price lunch. Those
cases were excluded from the relevant analyses.
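
The two screening steps described above could look roughly like the following sketch. The record layout and field names (`identified`, `referral_source`, `frl`) are assumptions made for illustration; the structure of the actual Georgia data file is not described at this level of detail.

```python
# Hypothetical record layout: one dict per student.
# referral_source is None when no referral was reported;
# frl is None when lunch status is missing.

def recode_unreferred_gifted(records):
    """Recode identified students with no reported referral as automatic referrals."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left untouched
        if rec["identified"] and rec["referral_source"] is None:
            rec["referral_source"] = "automatic"
        cleaned.append(rec)
    return cleaned


def with_known_frl(records):
    """Keep only records with lunch status, for the SES analyses."""
    return [rec for rec in records if rec["frl"] is not None]


sample = [
    {"id": 1, "identified": True, "referral_source": None, "frl": False},
    {"id": 2, "identified": False, "referral_source": None, "frl": True},
    {"id": 3, "identified": True, "referral_source": "teacher", "frl": None},
]

screened = recode_unreferred_gifted(sample)
ses_subset = with_known_frl(screened)
print([r["referral_source"] for r in screened])
print([r["id"] for r in ses_subset])
```

Keeping the lunch-status filter as a separate step mirrors the text: cases with missing FRL data are dropped only from the analyses that use SES, not from the dataset as a whole.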
Results
The overall composition of Georgia elementary schools by race, SES, and gifted status is
described in Table 2.1. From this table, it is quite obvious that students from different racial
backgrounds are not equally represented in gifted programs. Furthermore, student SES as
indexed by whether or not the student received free or reduced-price lunch (FRL) is strongly
related to the proportion of students that participate in gifted programs. Hereafter, the term “low
SES” refers to students who are receiving lunch aid while “high SES” refers to students who are
not receiving lunch aid.
In the initial analysis, referral sources for the overall student population were examined.
The results of this analysis are presented in Table 2.2. Almost 10 percent of students had been
referred, and 80.3 percent of those students were subsequently identified. Automatic referrals
had the highest validity as indicated by the phi coefficient. Automatic referrals were also the
most common referral source and had the highest accuracy. Teacher referrals made up the
majority of the remaining referrals and also had an acceptable phi coefficient as well
as a high accuracy. Parent and other referrals had similar occurrence frequencies, accuracies,
and phi coefficients. Self and peer referrals were very rare, had the lowest phi coefficients, and
were the least accurate.
Table 2.1: Identified elementary school students by race and SES
______________________________________________________________________________
Race        SES        N Students    N Gifted    Percentage Gifted
______________________________________________________________________________
Overall     Overall       705,074      55,856           7.9
            Low           348,529      10,126           2.9
            High          354,364      45,560          12.9
Asian       Overall        17,587       3,215          18.3
            Low             5,611         530           9.4
            High           11,963       2,684          22.4
Black       Overall       275,821       8,695           3.2
            Low           191,193       4,146           2.2
            High           83,376       4,504           5.4
Hispanic    Overall        59,398       1,389           2.3
            Low            45,057         783           1.7
            High           14,309         606           4.2
Native      Overall           984         101          10.3
            Low               436          23           5.3
            High              546          78          14.3
White       Overall       333,569      41,005          12.3
            Low            97,527       4,267           4.4
            High          235,183      36,615          15.6
______________________________________________________________________________
It is important to note that, in general, the automatic referral process happens first. As
each student can only receive one nomination, this advantages automatic referrals over the other
referral sources. Many students receiving automatic referrals would no doubt have received
referrals from other sources. Furthermore, because one of the four assessment categories that
nominated students must satisfy to gain program entry is satisfied by superior achievement test
scores, students receiving automatic referrals have already fulfilled one of the three criteria for
gaining entrance into the program.
Table 2.2: Overall comparison of referral sources (N = 705,074)
______________________________________________________________________________
Source          Percentage    Success    Percentage    Phi
                referred      rate       identified
______________________________________________________________________________
All sources        9.9         80.3        100.0
Automatic          5.2         86.3         57.1       .682
Teacher            4.0         74.9         37.7       .505
Parent             0.4         59.2          3.0       .120
Self               0.01        44.2           .03      .010
Peer              <0.01        46.2           .01      .006
Other              0.3         77.4          2.2       .123
______________________________________________________________________________
For the next analysis, the data file was split by FRL status before being analyzed. Results
are presented in Table 2.3. The overall relationship between student SES and gifted program
nominations is very clear. Students who did not receive financial assistance were over three
times more likely to be referred than students receiving FRL. The overall accuracy of referrals
was also higher for the students who paid for their own lunches.
Paid lunch students received over four times as many automatic referrals as FRL students
and over three times as many teacher referrals. The accuracy of all referral sources except peer
referrals was higher for the paid lunch students. This is also reflected in the phi coefficients for
each source. Interestingly, teacher referrals had nearly identical phi coefficients for both groups.
The value of the phi coefficient is dependent upon both the accuracy of the referral source as
well as the proportion of identified students that were referred via that source. Though the
accuracy of teacher referrals is somewhat lower for low-SES students, more low-SES students
are identified via teacher nominations, resulting in the slightly higher phi coefficients for that
group.
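
The dependence of phi on both accuracy and the share of identified students can be made concrete with a short sketch. It assumes that phi was computed on a 2x2 table of "referred via this source" by "identified" over all students, which the text does not spell out; the function name and parameter names are hypothetical. The inputs are the overall figures from Tables 2.1 and 2.2.

```python
import math


def phi_from_rates(success_rate: float, share_identified: float, base_rate: float) -> float:
    """Phi for 'referred via this source' by 'identified', from three proportions.

    success_rate: proportion of the source's referrals that were identified
    share_identified: proportion of all identified students who came via the source
    base_rate: proportion of all students who were identified
    """
    a = share_identified * base_rate  # referred via source and identified
    b = a / success_rate - a          # referred via source, not identified
    c = base_rate - a                 # identified, but not via this source
    d = 1.0 - a - b - c               # neither
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Overall figures from Tables 2.1 and 2.2.
base_rate = 55_856 / 705_074
phi_automatic = phi_from_rates(0.863, 0.571, base_rate)
phi_teacher = phi_from_rates(0.749, 0.377, base_rate)
print(round(phi_automatic, 3), round(phi_teacher, 3))
```

Under this assumption the computed values should land close to the tabled phi coefficients for automatic and teacher referrals. Holding the success rate fixed and raising the share of identified students raises phi, which is the pattern the text describes for teacher referrals among low-SES students.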
Though parent and other referrals were rare in both groups, they were more frequent and
more accurate in the high SES group. Proportionally, high SES students received over four
times as many parent referrals and over twenty-four times as many other referrals.
Table 2.3: Comparison of referral sources by SES
______________________________________________________________________________
Source          Percentage    Success    Percentage    Phi
                referred      rate       identified
______________________________________________________________________________
Free or reduced lunch (n = 348,529)
All sources        4.15        70.06       100.00
Automatic          1.93        79.27        52.66      .638
Teacher            1.95        62.84        42.16      .503
Parent              .13        53.60         2.10      .094
Self               <.01        27.27          .03      .008
Peer               <.01        50.00         <.01      .007
Other               .13        67.39         3.06      .139
Paid lunch (n = 354,364)
All sources       15.49        83.00       100.00
Automatic          8.49        87.82        54.81      .682
Teacher            6.01        78.68        38.80      .497
Parent              .66        61.76         3.17      .119
Self               <.01        50.00          .04      .011
Peer               <.01        45.45          .01      .005
Other              3.21        81.46         2.03      .116
______________________________________________________________________________
For analysis three, the data file was split by student race. The results of this analysis can
be found in Table 2.4. Very pronounced differences in nomination frequency are evident across
races, with almost 25 percent of Asian students receiving a nomination while only about 3
percent of Hispanic students received a nomination. Furthermore, automatic referrals remain the
nomination method with the highest phi coefficients and the highest accuracies, except for
Native students where teacher nominations are the most accurate. Again, this is probably due to
automatic referrals coming first in the referral timeline.
Teacher nominations showed evidence of better performance for Asian, White, and
Native students than for Hispanic and Black students. Furthermore, the quality of teacher
nominations for Black students is especially poor in terms of the phi coefficient and accuracy.
Self and peer referrals continue to be rare and of poor quality. There were no peer referrals for
Asian, Hispanic, and Native students, so phi could not be calculated for these groups. The
rate of parent nominations varied across racial groups as well. Asian, Native, and
White students had much higher rates of parent nomination than Black and Hispanic students.
In the final analysis, the data file was split by race and SES. The results of this analysis
may be found in Table 2.5. A few patterns deserve mentioning. Automatic referrals performed
well in all groups. In general, automatic referrals performed better in high SES students than low
SES students in terms of both phi and accuracy except for Asian and Native students, where
automatic referrals had higher phi coefficients in low SES students.
Phi coefficients for teacher nominations were higher in low SES students than in high
SES students. However, the accuracy of teacher nominations was higher for high SES students.
The larger phi coefficient values in low SES students result from the fact that a larger proportion
of low SES students are identified via teacher nominations. Though parent nominations were
rare, they were much more frequent in high SES groups.
Table 2.4: Comparison of referral sources by race
______________________________________________________________________________
Race / Source   Percentage    Success    Percentage    Phi
                referred      rate       identified
______________________________________________________________________________
Asian (n = 17,587)
All sources       23.02        79.42       100.00
Automatic         12.00        82.51        45.75      .614
Teacher            9.69        77.65        41.16      .503
Parent              .88        55.19         2.64      .090
Self                .01         0.00         0.00     -.004
Peer               0.00        -NA-          0.00      -NA-
Other               .44        84.62         2.05      .115
Black (n = 275,821)
All sources        4.58        68.88       100.00
Automatic          2.30        82.25        60.05      .695
Teacher            1.96        56.47        35.10      .431
Parent              .17        40.66         2.25      .090
Self               <.01        66.67         <.01      .021
Peer               <.01        25.00         <.01      .005
Other               .14        58.24         2.52      .116
Hispanic (n = 59,398)
All sources        3.34        70.08       100.00
Automatic          1.81        76.14        59.04      .664
Teacher            1.36        63.99        37.22      .479
Parent              .08        50.00         1.73      .090
Self               <.01         0.00         0.00     -.001
Peer               0.00        -NA-          0.00      -NA-
Other               .08        58.33         2.02      .105
Native (n = 984)
All sources       12.30        83.47       100.00
Automatic          6.00        83.05        48.51      .606
Teacher            4.78        87.23        40.59      .568
Parent              .71        71.43         4.95      .171
Self                .10       100.00          .01      .094
Peer               0.00        -NA-          0.00      -NA-
Other               .71        71.43         4.95      .171
White (n = 333,569)
All sources       14.65        83.90       100.00
Automatic          7.89        88.06        56.53      .675
Teacher            5.83        80.32        38.10      .516
Parent              .61        64.47         3.19      .124
Self                .01        40.00         <.01      .008
Peer               <.01        55.56         <.01      .007
Other               .31        84.91         2.14      .123
______________________________________________________________________________
Table 2.5: Comparison of referral sources by race and SES
______________________________________________________________________________
Group / Source  Percentage    Success    Percentage    Phi
                referred      rate       identified
______________________________________________________________________________
Asian (low SES, n = 5,611)
All sources       12.53        75.39       100.00
Automatic          6.65        77.75        54.72      .623
Teacher            5.01        74.38        39.48      .510
Parent              .43        50.00         2.26      .091
Self               0.00        -NA-          0.00      -NA-
Peer               0.00        -NA-          0.00      -NA-
Other               .44        76.00         3.58      .152
Asian (high SES, n = 11,963)
All sources       27.95        80.26       100.00
Automatic         14.51        83.53        54.02      .603
Teacher           11.90        78.23        41.51      .445
Parent             1.09        56.15         2.72      .085
Self               <.01         0.00         0.00     -.005
Peer               0.00        -NA-          0.00      -NA-
Other               .44        88.68         1.75      .106
Black (low SES, n = 191,193)
All sources        3.44        63.03       100.00
Automatic          1.54        78.97        56.08      .659
Teacher            1.69        50.62        38.48      .436
Parent              .10        35.71         1.57      .071
Self               <.01        60.00          .07      .020
Peer               <.01        50.00          .02      .011
Other               .11        54.50         2.77      .119
Black (high SES, n = 83,376)
All sources        7.19        75.17       100.00
Automatic          4.02        85.12        63.39      .722
Teacher            2.60        65.11        31.33      .431
Parent              .36        43.67         2.91      .102
Self                .07        75.00          .07      .021
Peer               <.01         0.00         0.00     -.001
Other               .20        63.03         2.31      .114
Hispanic (low SES, n = 45,057)
All sources        2.61        66.53       100.00
Automatic          1.32        72.51        54.92      .625
Teacher            1.17        62.00        41.89      .503
Parent              .04        36.84          .89      .055
Self               <.01         0.00         0.00     -.001
Peer               0.00        -NA-          0.00      -NA-
Other               .07        51.43         2.30      .106
Hispanic (high SES, n = 14,309)
All sources        5.63        75.28       100.00
Automatic          3.38        80.58        64.36      .709
Teacher            1.95        67.74        31.19      .445
Parent              .20        58.62         2.81      .122
Self               0.00        -NA-          0.00      -NA-
Peer               0.00        -NA-          0.00      -NA-
Other               .09        76.92         1.65      .109
Native (low SES, n = 436)
All sources        6.19        85.19       100.00
Automatic          2.06        88.89        34.78      .543
Teacher            3.67        81.25        56.52      .663
Parent             0.00        -NA-          0.00      -NA-
Self               0.00        -NA-          0.00      -NA-
Peer               0.00        -NA-          0.00      -NA-
Other               .46       100.00         8.70      .165
Native (high SES, n = 546)
All sources       17.22        82.98       100.00
Automatic          9.16        82.00        52.56      .614
Teacher            5.68        90.32        35.90      .533
Parent             1.28        71.43         6.41      .186
Self                .18       100.00         1.28      .105
Peer               0.00        -NA-          0.00      -NA-
Other               .92        78.38         3.85      .126
White (low SES, n = 97,527)
All sources        5.55        78.89       100.00
Automatic          2.57        82.04        48.28      .617
Teacher            2.57        77.20        45.39      .579
Parent              .22        58.33         2.95      .124
Self               <.01         0.00         0.00     -.001
Peer               0.00        -NA-          0.00      -NA-
Other               .17        85.21         3.37      .165
White (high SES, n = 235,183)
All sources       18.42        84.50       100.00
Automatic         10.08        88.69        57.42      .675
Teacher            7.19        80.76        37.30      .501
Parent              .77        65.16         3.22      .120
Self                .01        46.15          .03      .009
Peer               <.01        55.56          .01      .007
Other               .37        84.86         2.00      .116
______________________________________________________________________________
Discussion
In consideration of the results presented in this paper, a few substantive and
methodological conclusions can be drawn. The first is that automatic referrals and teacher
referrals are far superior to the other referral sources. The other referral sources are used far less
often and are generally much less accurate. The peer- and self-referral options are so
infrequently used that they have almost no impact on gifted program enrollments.
Do these results provide evidence that the referral process is biased against economically
disadvantaged, Black, and Hispanic students? This is a complex issue. On the basis of numbers
alone, it is obvious that students from these traditionally underrepresented backgrounds are also
undernominated. The probability of nomination varies strongly across race and class
background. Furthermore, the accuracy of nomination sources also varies across backgrounds.
In general, nominations for low SES students are less accurate than nominations for high SES
students. Furthermore, nominations are less accurate for Black and Hispanic students than for
Asian, Native, and White students.
There are at least two plausible explanations for this pattern, depending on one’s beliefs
regarding the distribution of ability across race and class lines. If one adopts the position that
ability is evenly distributed across these lines, then these results can only indicate severe bias in
the nomination and testing procedure. Many readers will undoubtedly adopt this explanation for
the results presented in this paper. The low rate of automatic referrals could indicate bias in
standardized tests; the low rate of teacher nominations could indicate racism, classism, or
cultural ignorance on the part of teachers; and the low rate of parent nominations could indicate
that these students’ parents are alienated from and distrustful of school culture.
Interpreting these results in this light would lead to the conclusion that the nomination
process, rather than the screening process, is the primary cause of differential representation in
gifted programs. While it is certainly true that nominated students from “advantaged” groups
have a higher probability of successfully passing the screening process than “disadvantaged”
students, the effect of these differing “pass rates” is far smaller than the effect of the differing
nomination rates on the resulting gifted program enrollment. For example, 4.58 percent of Black
students had received a nomination, with 68.9 percent of these successfully passing the screening
process, whereas 14.65 percent of White students received a nomination, with 83.9 percent of
these successfully passing the screening. The pass rate for Black students is 82% of the pass
rate for White students, whereas the nomination rate for Black students is only 31% of the
nomination rate for White students. Equalizing the pass rates for Black and White students
would do little to restore proportional representation of these students in gifted programs if the
nomination rate remained unchanged.
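
The arithmetic behind this comparison can be laid out as a small worked example. It treats a group's identification rate as the product of its nomination rate and its pass rate, which assumes that every identified student was first nominated (true of these data after the recoding described in the Method section). The rates come from Table 2.4; the variable and function names are illustrative only.

```python
# Nomination and pass rates (as proportions) taken from Table 2.4.
black = {"nomination_rate": 0.0458, "pass_rate": 0.6888}
white = {"nomination_rate": 0.1465, "pass_rate": 0.8390}


def identification_rate(nomination_rate: float, pass_rate: float) -> float:
    """Proportion of a group identified, assuming identification requires a nomination."""
    return nomination_rate * pass_rate


pass_ratio = black["pass_rate"] / white["pass_rate"]
nomination_ratio = black["nomination_rate"] / white["nomination_rate"]

observed = identification_rate(**black)
# Counterfactual 1: Black students pass screening at the White pass rate.
equal_pass = identification_rate(black["nomination_rate"], white["pass_rate"])
# Counterfactual 2: Black students are nominated at the White nomination rate.
equal_nomination = identification_rate(white["nomination_rate"], black["pass_rate"])

print(f"pass-rate ratio:        {pass_ratio:.2f}")
print(f"nomination-rate ratio:  {nomination_ratio:.2f}")
print(f"observed identified:    {observed:.1%}")
print(f"with equal pass rates:  {equal_pass:.1%}")
print(f"with equal nominations: {equal_nomination:.1%}")
```

Under this simple product model, equalizing pass rates moves the identification rate for Black students only slightly, while equalizing nomination rates moves it most of the way toward the White rate.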
Alternatively, if one believes that ability is not evenly distributed, then one can interpret
these results in a different light. The low rate of automatic referrals for certain groups reflects
lesser ability. When students from these groups are nominated, they are able to pass through the
screening less frequently. Teachers nominate fewer students from these groups because there are
simply fewer students from these groups that evidence advanced potential. Furthermore, the low
accuracy of teacher nomination for these students could reflect effort on the part of teachers to
address the long-standing inequality of gifted program enrollments by nominating students that
show even questionable potential to pass the screening process.
The true nature of ability distribution is currently unknown and is, perhaps, unknowable.
To answer this question definitively would require that most or all of the stakeholders agree upon
the nature, dimensionality, and meaning of ability, as well as the creation of instruments that
would be accepted by all as trustworthy, valid, and unbiased. This does not appear likely in the
near term. Therefore, the correct interpretation of these results is currently unknowable. Though
the previous discussion presented two possibilities to explain the observed results, it is also quite
possible that both are true. Ability may not be precisely evenly distributed across backgrounds,
but our current methods for identifying gifted students may also be overlooking students
hailing from traditionally underrepresented backgrounds.
From a policy perspective, the results of this study indicate that more attention needs to
be devoted to the issue of student nominations for gifted programs. Georgia has very strong state
policies in support of gifted education, perhaps the strongest of any state in the nation. Georgia
is among the four states described by the Davidson Institute as having very strong policies and
funding for gifted education (Davidson Institute, 2006). Of these four states, Georgia has the
highest amount of gifted education funding per identified student (although funding levels were
not provided for Iowa and Florida). Georgia’s multiple criteria assessment procedure was
designed in part to help address the underrepresentation problem. The multitude of considered
referral sources speaks to the state’s commitment to casting a wide net in search of talented
students. In spite of this commitment, Georgia continues to struggle with the
underrepresentation of minority and low-SES students in its gifted programs. It is unclear how
Georgia’s already flexible nomination policies could be improved without massively increasing
costs. One obvious issue is that the self and peer referrals are so infrequently used. Students
should be reminded that they may nominate themselves or other students for gifted program
assessment. Mandatory assessment of all students for gifted program placement would be
optimal but very expensive to implement.
From a methodological point of view, this study has important limitations. The most
pressing problem is that automatic referrals happen earlier than other referrals. This advantages
automatic referrals and inflates the quality indices associated with that referral source.
Therefore, the quality of referral sources cannot be directly compared. Indeed, the only real way
to compare referral sources would be to allow an individual student to receive multiple
nominations, so that a student could be nominated automatically as well as by her teacher and
peer. Recording only one referral source per student creates a “winner take all” system that
obscures the true value of each referral technique.
As teacher nominations have received the brunt of the attention in the literature, they
deserve further commentary. Though the quality of teacher nominations did fluctuate across
different student backgrounds, the overall quality was quite high. The overall phi
coefficient of .505 is almost twice as high as the phi value computed by Gagné’s (1994)
reanalysis of Pegnato and Birch’s (1959) classic study.
Gagné’s (1994) argument for the use of the phi coefficient was sound, but phi should not
completely supplant the other quality indices, especially the accuracy index (referred to as
“efficiency” in Pegnato & Birch, 1959). Although “other” referral sources are comparatively
rare and thus receive a low phi coefficient, they exhibit good accuracy.
The biggest strength of this study was the extremely large N. Because all of these
students came from the same state and fell under uniform policies mandated by the state
regarding gifted education, fluctuation due to policy shifts was minimized. However, the
applicability of these results to other states with differing gifted education policy is unknown.
References
Agresti, A. (1996). An introduction to categorical data analysis. New York, NY: John Wiley and
Sons.
Davidson Institute. (2006, March 2). Genius denied: How to stop wasting our brightest young
minds – Gifted education policies. Retrieved March 2, 2006 from
http://www.geniusdenied.com/Policies/StatePolicy.aspx?NavID=6_0
Feldman, D. H. (2003). A developmental, evolutionary perspective on giftedness. In J. Borland
(Ed.), Rethinking gifted education (pp. 9-33). New York, NY: Teachers College Press.
Ford, D. Y. (1998). The underrepresentation of minority students in gifted education: Problems
and promises in recruitment and retention. Journal of Special Education, 32(1), 4-14.
Frasier, M. M., Garcia, J. H., & Passow, A. H. (1995). A review of assessment issues in gifted
education and their implications for identifying gifted minority students. (No. RM95204):
The National Research Center on the Gifted and Talented.
Frasier, M. M., Hunsaker, S. L., Lee, J., Finley, V. S., Garcia, J. H., Martin, D., et al. (1995). An
exploratory study of the effectiveness of the Staff Development Model and the Research-
based Assessment Plan in improving the identification of gifted economically
disadvantaged students. Storrs, CT: University of Connecticut, National Research Center
on the Gifted and Talented.
Gagne, F. (1994). Are teachers really poor talent detectors? Comments on Pegnato and Birch's
(1959) study of the effectiveness and efficiency of various identification techniques.
Gifted Child Quarterly, 38(3), 124-126.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York, NY: Basic
Books.
Gear, G. H. (1976). Accuracy of teacher judgment in identifying intellectually gifted children: A
review of the literature. Gifted Child Quarterly, 20, 478-489.
Hunsaker, S. L. (1994). Adjustments to traditional procedures for identifying underserved
students: Successes and failures. Exceptional Children, 61(1), 72-76.
Hunsaker, S. L., Finley, V. S., & Frank, E. L. (1997). An analysis of teacher nominations and
student performance in gifted programs. Gifted Child Quarterly, 41(2), 19-24.
Kirschenbaum, R. J. (2004). Dynamic assessment and its use with underserved gifted and
talented populations. In A. Y. Baldwin & S. M. Reis (Eds.), Culturally diverse and
underserved populations of gifted students (pp. 49-62). Thousand Oaks, CA: Corwin
Press, Inc.
Masten, W. G., Plata, M., Wenglar, K., & Thedford, J. (1999). Acculturation and teacher ratings
of Hispanic and Anglo-American students. Roeper Review, 22(1), 64-65.
Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children
using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47(2), 155-
160.
Pegnato, C. W., & Birch, J. W. (1959). Locating gifted children in junior high schools: A
comparison of methods. Exceptional Children, 48, 300-304.
Plata, M., & Masten, W. G. (1998). Teacher ratings of Hispanic and Anglo students on a
behavior rating scale. Roeper Review, 21(2), 139-144.
Renzulli, J. S., & Delcourt, M. (1986). The legacy and logic of research on the identification of
gifted persons. Gifted Child Quarterly, 30, 20-33.
Renzulli, J. S., & Westberg, K. (1991). Scale for rating students' participation in the local gifted
education program. Storrs, CT.: University of Connecticut.
Renzulli, J. S., Smith, L. H., White, A. J., Callahan, C. M., & Hartman, R. K. (1976). Scales for
rating the behavioral characteristics of superior students. Mansfield Center, CT:
Creative Learning Press.
Sarouphim, K. M. (1999). DISCOVER: A promising alternative assessment for the identification
of gifted minorities. Gifted Child Quarterly, 43(4), 244-251.
VanTassel-Baska, J., Johnson, D., & Avery, L. D. (2002). Using performance tasks in the
identification of economically disadvantaged and minority gifted learners: Findings from
Project STAR. Gifted Child Quarterly, 46(2), 110-123.
Waters, T. J., & Clausen, S. (1983). Effectiveness of parent versus teacher nomination of gifted
children. Southern Psychologist, 1(4), 189-191.
CHAPTER 3
EXAMINING THE PROBABILITY OF IDENTIFICATION OF STUDENTS FOR GIFTED PROGRAMS IN GEORGIA ELEMENTARY SCHOOLS: A MULTILEVEL STRUCTURAL
EQUATION MODELING STUDY 2
2 McBee, M. To be submitted to Journal of Educational Psychology.
Abstract
The study focused on the analysis of a large-scale (n = 273,311) dataset collected by the
Georgia Department of Education using multilevel structural equation modeling to model the
probability that a student would be identified for participation in a gifted program. The model
examined individual- and organization-level factors that influence the probability that an
individual would be identified for participation in an advanced educational program. The
probability of being identified as gifted depended strongly on student race and socioeconomic
status. The mean probability of identification varied across schools. The model succeeded in
explaining 23 percent of the school-level variance in the probability of gifted identification and
70 percent of the variance in the school academic environment. The negative impact of having a
low-SES background on the probability of identification also varied across schools; the
model explained 19 percent of this variance. The positive impact of being Asian likewise varied
across schools, and the model explained 90 percent of this variance. The impact of being Black or
Hispanic did not vary across schools.
The numerical underrepresentation of African-American, Hispanic, and Native American
students in gifted programs has been frequently cited in the gifted education literature (Ford,
1998; Reid, Romanoff, Algozzine, & Udall, 2000; Sarouphim, 1999; Scott, Perou, Urbano, &
Hogan, 1992). The topic of underrepresentation is of critical importance to the field of gifted
education. It forces us to consider the possibility that a great number of students who are in need
of advanced educational opportunities are being denied them on the basis of racism or
classism. Programs that discriminate against minority students, either in fact or in appearance,
are in danger of elimination, as gifted programs in many states subsist on thin margins of
political will and public support.
The biggest issue related to understanding the disparity in gifted program enrollment
across racial groups is: to what extent does this disparity represent actual differences of
developed or potential capability across groups, and to what extent does it indicate the presence
of serious flaws in the methods by which we screen, identify, and serve gifted students? Most
scholars who have examined the issue have agreed with Frasier’s (1997) belief that “There is no
logical reason to expect that the number of minority students in gifted programs would not be
proportional to their representation in the general population” (p. 498). In spite of the critical
importance of this issue, it remains poorly understood. Though the underrepresentation is widely
noted, decried, and lamented in the literature, and a number of methods to increase the
representation of these groups have been proposed, very few published studies have adequately
addressed the complexity of the issue.
Individual-Level Factors in Underrepresentation
Almost all of the gifted education literature addressing the underrepresentation of
students in gifted programs has focused on two individual characteristics: being a member of a
minority group or having a low socioeconomic status (SES) background.
Underrepresentation of Ethnic Minority Students
Nomination problems. Some literature has focused on the role of nominations in the
underrepresentation problem. Most nominations come from teachers (Hunsaker, Finley, &
Frank, 1997). It is now widely believed that though the underlying dimensions of giftedness are
universal (e.g., Frasier & Passow, 1994, listed ten universal talents, abilities, and behaviors of
gifted children), the expression of these qualities may be heavily influenced by a child’s cultural
and economic background as well as the child’s immediate context (Peterson, 1999). Since
teachers in most schools are of White middle-class backgrounds, they may not consistently
recognize the signs of giftedness expressed in students of diverse cultural backgrounds. Thus,
part of the underrepresentation issue may be caused by unfair nomination procedures. There is
some support for this hypothesis in the research literature (e.g., Masten, Plata, Wenglar, &
Thedford, 1999; Plata & Masten, 1998).
Identification problems. Most gifted programs rely, at least to some degree, on
standardized measures of ability or achievement during the assessment process. It has been
widely noted that minority youth significantly underperform on these tests relative to their peers
(e.g., Entwisle & Alexander, 1992; Maker, 1996; Naglieri & Jensen, 1987). The performance
gap between Black and White students’ test scores tends to be about one standard deviation.
Many critics have argued that minority students tend to do poorly on such tests because the tests
themselves are flawed by being biased in favor of students from the dominant culture (Ford,
Harris, Tyson, & Trotman, 2002). In other words, standardized tests might unfairly penalize
minority students by assigning them lower scores for the same level of underlying ability or
achievement. The exact nature of this bias is unknown. However, empirical studies that have
performed differential item functioning (DIF) analyses to search for biased items have generally
failed to detect them (Jensen, 1980). Even more recent DIF analyses using sophisticated three-
parameter item response theory (IRT) models have failed to detect item bias (Gordon, 1987).
Because these models are sensitive to differences in difficulty, discrimination, and guessability,
they should detect any imaginable bias. However, critics of standardized testing continue to
argue against the use of these tests with minority students.
Mills and Tissot (1995) compared the scores of students from a wide variety of racial
backgrounds on several tests of mental ability. Black students scored about one standard
deviation below White students on the traditional tests, but the gap fell to about half a standard
deviation for Raven's Advanced Progressive Matrices, a completely nonverbal test of mental
ability. They concluded that verbally loaded tests penalize minority test takers.
Gifted program persistence. Another challenge to equal representation is the issue of
students who choose not to participate in the gifted program due to cultural insensitivity
(Harmon, 2002) or peer pressure (Ford, 1998; Fordham & Ogbu, 1986). This evidence was
primarily gathered via case studies. Worrell, Szarko, and Gabelko (2001) conducted a larger
quantitative study of the issue and were unable to find evidence that race or socioeconomic status
played a role in student program dropout, though this study only examined dropout in a summer
enrichment program.
Low Socioeconomic Status
Students from low socioeconomic backgrounds are another group that has been widely
described as being underrepresented in gifted education programs. Literature in this area is less
voluminous than research on race. Descriptive data describing the degree to which low SES
students are underrepresented in gifted programs is hard to find. The issue is so widespread,
however, that it is taken as a truism.
There are, however, innumerable studies examining the impact of SES on school
achievement. Since school achievement is one dimension that is commonly assessed in gifted
program entrance, it is logical to assume that lower achievement may also lower the probability
of entrance into a gifted program. Low SES has consistently been found to powerfully reduce
student achievement. Ryan and French (1976) found that large differences existed in students
from low, middle, and high SES groups on achievement, as measured by the Iowa Test of Basic
Skills. Portes and MacLeod (1996) found that individual SES had a strong effect on Stanford
Achievement Test scores, even after controlling for a number of other individual variables.
Tyler-Wood and Carri (1993) found that the gap between low SES and average SES
students’ test scores was highest on the verbal subscales on many popular tests of mental ability.
These findings mirror those of Mills and Tissot (1995) with respect to the Raven's test. The
similarity of these findings may be caused by confounding between race and SES. Many
scholars in gifted education now advocate for using nonverbal instruments with minority, low
SES, or English as a Second Language (ESL) students (Naglieri & Ford, 2003).
One of the questions that has not been addressed in the gifted education literature is the
relative importance of race and SES in gifted program identification. McBee (in press) and
Portes and MacLeod (1996) found that race dropped out of their models when SES was
controlled. Many previous studies of race and gifted program admittance are seriously flawed
because race and SES are very highly related. Studies that have examined race without
controlling for SES either statistically or experimentally are confounded and thus impossible to
interpret.
Though there are not clear reasons why test scores should differ across racial lines, there
are perhaps good reasons why students from impoverished backgrounds would underperform.
Mills (1983) found a relatively small reduction in school readiness for kindergarteners from low
SES backgrounds. Entwisle and Alexander (1992) and West (1985) found similar deficits in
initial readiness that became larger differences in achievement with each passing year. Entwisle
and Alexander attributed this to a “summer setback” phenomenon caused by relatively
unstimulating home environments. Quay (1989) found that cognitive development, as indicated
by performance on Piagetian conservation tasks, was significantly delayed for low SES students.
Students from low SES backgrounds may receive less cognitive scaffolding from their mothers
(Hess & McDevitt, 1984). By definition, low SES families lack resources. They are more likely
to live in substandard housing, have poor medical care, lack healthy foods, experience more
stress, live in high crime areas, and experience increased inter-family conflict (Duncan &
Brooks-Gunn, 2000).
School Factors that May Affect the Probability of Identification
School Socioeconomic Status
A number of studies examining the impact of the SES composition of
schools on student achievement have consistently found that the socioeconomic composition of
the school exerts potent effects on educational outcomes over and above the influence of
individual SES (Everson & Millsap, 2004; Kennedy, 1992; Maggi, Hertzman, Kohen, &
D'Angiulli, 2004; Opdenakker & Van Damme, 2001; Taylor & Harris, 2003).
Everson and Millsap’s (2004) multilevel structural equation modeling study of SAT
performance found that school SES had very powerful direct and indirect effects on both SAT
math and SAT verbal scores. Kennedy (1992) analyzed the performances of Black and White
male third graders on a standardized achievement test. Results indicated that the school SES was
the strongest predictor of achievement at the school level for both the Black and White children.
However, the effect was approximately twice as strong for White students.
Maggi and colleagues (2004) studied the effect of organization-level SES on student
achievement. They found that neighborhood SES was strongly correlated with the proportion of
students within schools that were high achievers. The strength of this relationship increased
from fourth to seventh grade. Portes and MacLeod (1996) found a cross-level SES interaction by
using average school SES as a predictor of the slope coefficient relating individual SES to
mathematics achievement. This predictor was positive and significant such that the slope
relating individual SES to math achievement was steeper in high SES schools than the same
relationship in a low SES school. Students from high SES backgrounds had higher math
achievement when they were situated within high SES schools. Students from low SES
backgrounds were doubly penalized by attending high SES schools and performed better when
they attended low SES schools.
School racial composition. Taylor and Harris (2003) examined the effects of relative
integration and segregation on Black and White students’ Stanford 9 achievement test scores in
third, fifth, and eighth grades. The achievement of Black students in eighth grade was strongly
negatively correlated with the percentage of the enrollment that is Black and even more strongly
with the percentage of the enrollment receiving free or reduced-price lunch (FRL), while it was
positively correlated with the percentage of students that are White. White students’
achievement was not significantly affected by the Black enrollment or the overall percentage of
students receiving free or reduced-price lunch. It was, however, negatively correlated with the
percentage of White students receiving subsidized lunch.
Purpose of the Study
In spite of years of research on and attention to the underrepresentation of poor and
minority students in gifted programs, the problem remains poorly understood. The current study
has addressed several shortcomings of the previous literature examining the underrepresentation
of minority and low SES students in gifted programs. Previous work in gifted education has
focused almost exclusively on the individual-level effects of race and class on identification
outcome. These studies have typically confounded race with class and are thus uninterpretable.
Research Questions
This study addressed the following general research questions:
1. How are student race, socioeconomic status, days absent, and migrant status related to the
probability of being identified as gifted in Georgia elementary schools?
2. Does the general probability of gifted identification vary across schools? If so, do
school composition, academic quality, behavioral environment, and teacher
characteristics explain any of this variance in the probability of gifted identification?
3. Do the probabilities of identification vary across schools specifically for Black, Hispanic,
and Asian students, as well as students receiving free or reduced-price lunch? If so, do
school composition, academic quality, behavioral environment, and teacher
characteristics explain any of this variance in the probability of gifted identification for
these students?
Method
Sample and Data Sources
A large dataset collected by the Georgia Department of Education was analyzed in this
study. It contains student-level data on every public school student in Georgia during the 2004
academic year (N = 1,780,591), as well as school-level data on behavioral incidents, teacher
ethnicity, training, and experience, and academic composition.
Data details and preparation. The individual-level data was comprised of student
ethnicity, lunch assistance status, grade, retention status, migrant status, and whether or not the
student had been identified as gifted either in a previous year or during the 2004 year. All of
these variables were categorical in nature. This data was aggregated for each school ID and
combined into a two-level dataset.
A school-level variable representing the academic composition of the students was
created by performing a principal components analysis of 15 variables representing the
percentage of students scoring as “advanced” on the CRCT in the subjects of math, English, and
reading for grades one through five. Initially this factor was included as a measurement model in
the within-schools model in the SEM software. Once the within- and between-schools models
were combined into a multilevel SEM, the measurement model could no longer be estimated due
to computational limitations. Therefore, the factor was extracted from the 15 variables in the
SPSS software. A single factor was extracted on the basis of the scree plot and Velicer’s
minimum average partial test. The factor explained 78.4% of the variance in the 15 variables.
Its internal consistency reliability was 0.98. The factor loadings ranged from 0.812 to 0.923. The
“academic environment” variable consisted of the standardized factor scores from this principal
component.
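The extraction of a single principal component and its standardized scores can be sketched as follows. This is an illustration only: the study used SPSS, whereas the sketch uses NumPy on simulated data (not the Georgia data), all variable names are hypothetical, and neither the scree plot nor Velicer's minimum average partial test is implemented.

```python
import numpy as np

# Simulated stand-in for 15 "percent advanced" variables across 200 schools.
rng = np.random.default_rng(0)
n_schools, n_vars = 200, 15
common = rng.normal(size=(n_schools, 1))            # shared academic dimension
X = 0.9 * common + 0.45 * rng.normal(size=(n_schools, n_vars))

# Principal components of the correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)                # eigenvalues in ascending order
first_val = eigvals[-1]
first_vec = eigvecs[:, -1]
if first_vec.sum() < 0:                             # eigenvector sign is arbitrary
    first_vec = -first_vec

prop_var = first_val / n_vars                       # share of variance explained
loadings = first_vec * np.sqrt(first_val)           # variable-component correlations

# Standardized component scores (mean 0, SD 1), one per school.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ first_vec
scores = (scores - scores.mean()) / scores.std(ddof=1)

print(f"variance explained: {prop_var:.1%}")
print(f"loadings range: {loadings.min():.3f} to {loadings.max():.3f}")
```

In this sketch the standardized `scores` play the role of the "academic environment" variable described above.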
Four variables describing the teaching staff for each school were included. Two variables
described the percentage of each school’s teaching staff that were Black or Hispanic. Variables
indicating the percentage of the school’s teaching staff that possessed advanced degrees
(Master’s, EdS, or PhD) and the average number of years of teaching experience for teachers at
each school were also included.
The data contained a set of 14 variables describing the incidence of severe behavior
problems within each school. These variables included the number of incidents of aggravated
battery, aggravated child molestation, aggravated sexual battery, aggravated sodomy, armed
robbery, arson, kidnapping, murder, rape, voluntary manslaughter, nonfelony drug possession,
felony drug possession, felony weapons possession, and terroristic threats. These counts were
added together for each school into a total number of incidents, which was then divided by the
total number of students in the school to yield an incident-to-student ratio.
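The construction of the incident-to-student ratio amounts to a sum followed by a division, as the following minimal sketch shows. The function name, the dictionary keys, and the counts are hypothetical; only four of the 14 incident categories are shown for brevity.

```python
def incident_to_student_ratio(incident_counts, enrollment):
    """Sum all severe-incident counts for a school and divide by its enrollment."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return sum(incident_counts.values()) / enrollment

# Hypothetical school with 600 students and 6 recorded incidents.
school_incidents = {
    "aggravated_battery": 1,
    "arson": 0,
    "felony_weapons_possession": 2,
    "terroristic_threats": 3,
}
ratio = incident_to_student_ratio(school_incidents, enrollment=600)
print(ratio)
```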
From this overall dataset, all records of elementary school students were selected. This
resulted in a population (N = 686,375) of all Georgia elementary school students. From this
population, 50% of cases were randomly sampled in order to reduce the computational demands
of the estimation procedure. This resulted in a final sample of 341,634 students in 1,260 schools.
Cluster sizes ranged from 1 to 964, with only 16 clusters having sizes of 30 or smaller.
The data were reported to the Georgia DOE by individual personnel within each school.
The accuracy of the information reported by each school could not be verified. However,
schools are held accountable for the accuracy of the data they report to the state and federal
governments. The reporting system automatically generates error codes when out-of-range or
inconsistent data are reported. Variable descriptions may be found in Table 3.1. Descriptive
statistics may be found in Table 3.2, and variable intercorrelations may be found in Table 3.3.
Table 3.1: Variable descriptions
______________________________________________________________________________
Variable name          Description                                    Measurement scale
______________________________________________________________________________
Individual-Level Variables
Black                  Race variable (dummy)                          Dichotomous
Hispanic               Race variable (dummy)                          Dichotomous
Asian                  Race variable (dummy)                          Dichotomous
Lunch                  SES proxy (0=paid, 1=reduced, 2=free)          Trichotomous
Retained               Was student retained during current year?      Dichotomous
                       (0=no, 1=yes)
Migrant                Did student change schools during current      Dichotomous
                       year? (0=no, 1=yes)

School-level Variables
Lunch                  Mean of within-school lunch variable           Continuous
% Students Black       Percentage of enrollment that is Black         Continuous
% Students Hispanic    Percentage of enrollment that is Hispanic      Continuous
% Students Asian       Percentage of enrollment that is Asian         Continuous
% Teachers Adv.        Percentage of teachers with advanced           Continuous
                       degrees (beyond Bachelor’s)
Years Tch. Exp.        Average number of years of teaching            Continuous
                       experience
% Teachers Black       Percentage of teachers that are Black          Continuous
% Teachers Hispanic    Percentage of teachers that are Hispanic       Continuous
Incid Student Ratio    Ratio of severe behavioral incidents to
                       number of students
% GiftPrev             Percentage of students previous identified     Continuous
                       gifted
% Students Retain      Percentage of students retained                Continuous
N Students / 100       Number of students enrolled divided by         Continuous
                       100
Migrant                Percentage of students changing schools        Continuous
Academic Comp.         The first principal component extracted        Continuous
                       from 15 variables representing the
                       percentages of students classified as
                       “advanced” on the CRCT in math, language
                       arts, and reading in grades 1-5
Academic Env.          A latent variable defined by Academic          Continuous
                       Comp. adjusting for its unreliability
______________________________________________________________________________
Table 3.2: Descriptive statistics

Individual-level categorical variables (n = 273,311)
______________________________________________________________________________
Variable        Category    %        Category    %        Category    %
______________________________________________________________________________
Asian           Yes         2.9      No          97.1
Black           Yes         41.8     No          58.2
Hispanic        Yes         9.0      No          91.0
Retained        Yes         2.5      No          97.5
Migrant         Yes         .01      No          99.99
Lunch           Paid        50.1     Reduced     8.5      Free        41.1

Individual-level continuous variables (n = 980)
______________________________________________________________________________
Variable                     Mean      SD        Minimum    Maximum
______________________________________________________________________________
Grade                        2.01      1.42      1.00       5.00

School-level variables (n = 1262)
Lunch                        0.888     0.48      0.00       2.00
% Students Black             41.80     32.50     0.00       100.00
% Students Hispanic          9.24      13.10     0.00       94.68
% Students Asian             2.85      4.40      0.00       30.94
% Teachers Adv               46.71     11.68     3.13       90.31
Avg. Teach Exp               12.12     2.60      0.00       22.18
% Teachers Black             21.23     25.03     0.00       100.00
% Teachers Hispanic          0.71      1.57      0.00       14.61
Incident Student Ratio       0.04      0.18      0.00       4.03
% Students Gifted (prev)     4.05      4.26      0.00       78.68
% Students Retain            2.73      2.24      0.00       27.27
NStudents / 100              8.14      2.90      0.41       23.57
% Students Migrant           0.64      2.42      0.00       22.55
Academic Composite           0.02      1.01      -2.44      3.57
______________________________________________________________________________
Note: When possible, the reported values are those computed after the listwise deletion of cases with missing variables.
Table 3.3: Variable Intercorrelations
______________________________________________________________________________
                  Acad. Comp   Black       Hispanic    Asian       Grade
______________________________________________________________________________
Acad. Comp         1.018
Black             -0.378       0.243
Hispanic          -0.093      -0.265       0.081
Asian              0.114      -0.145      -0.054       0.028
Grade             -0.001       0.021      -0.032      -0.005       2.015
Migrant           -0.047      -0.067       0.248      -0.012      -0.009
Lunch (sch)       -0.782       0.417       0.105      -0.083      -0.003
% Asian            0.431      -0.171       0.111       0.266      -0.006
% Hispanic        -0.207      -0.133       0.448       0.067      -0.019
% Black           -0.572       0.659      -0.094      -0.069       0.009
% Retained        -0.327       0.163       0.000      -0.039      -0.006
% Migrant         -0.149      -0.086       0.162      -0.028      -0.008
% Gifted (prev)    0.672      -0.273      -0.036       0.121       0.005
Avg Tch Exp        0.089      -0.069      -0.082      -0.038       0.010
% Adv Tch          0.141      -0.147      -0.007      -0.016       0.003
Incid. Ratio      -0.098       0.085      -0.011      -0.005       0.006
NStud / 100        0.141      -0.073       0.073       0.082      -0.004
% Tch Hisp        -0.048       0.067       0.115       0.025      -0.004
% Tch Black       -0.426       0.559      -0.081      -0.057       0.006
% Retained        -0.058       0.046       0.025      -0.014      -0.044
Lunch             -0.400       0.329       0.173      -0.063       0.001
Gifted             0.115      -0.086      -0.035       0.043      -0.006
______________________________________________________________________________
                  Migrant      Lunch (sch) % Asian     % Hispanic  % Black
______________________________________________________________________________
Migrant            0.006
Lunch (sch)        0.065       0.266
% Asian           -0.032      -0.314       19.562
% Hispanic         0.112       0.237       0.251       170.232
% Black           -0.041       0.631      -0.261      -0.208       1055.591
% Retained         0.013       0.378      -0.149       0.002       0.244
% Migrant          0.311       0.208      -0.105       0.352      -0.130
% Gifted (prev)   -0.037      -0.582       0.457      -0.080      -0.413
Avg Tch Exp        0.027      -0.018      -0.147      -0.184      -0.104
% Adv Tch          0.011      -0.088      -0.062      -0.016      -0.223
Incid. Ratio      -0.007       0.112      -0.021      -0.027       0.129
NStud / 100       -0.029      -0.192       0.306       0.161      -0.111
% Tch Hisp         0.002       0.123       0.090       0.262       0.100
% Tch Black       -0.041       0.515      -0.215      -0.180       0.850
Retained           0.013       0.062      -0.024       0.000       0.046
Lunch              0.075       0.505      -0.163       0.115       0.323
Gifted            -0.012      -0.087       0.060      -0.013      -0.064
______________________________________________________________________________
                  % Retained   % Migrant   % Gifted    Avg Tch Exp % Adv Tch
______________________________________________________________________________
% Retained         5.046
% Migrant          0.046       5.922
% Gifted (prev)   -0.305      -0.121       18.191
Avg Tch Exp        0.088       0.090       0.023       6.730
% Adv Tch         -0.039       0.036       0.084       0.529       136.276
Incid. Ratio       0.022      -0.024      -0.107       0.009       0.076
NStud / 100       -0.033      -0.093       0.136      -0.301      -0.205
% Tch Hisp        -0.005      -0.003       0.014      -0.146      -0.058
% Tch Black        0.112      -0.132      -0.327      -0.138      -0.207
Retained           0.144       0.005      -0.049       0.010      -0.012
Lunch              0.192       0.104      -0.298      -0.006      -0.046
Gifted            -0.039      -0.017       0.084       0.003       0.011
______________________________________________________________________________
                  Incid. Ratio NStud / 100 % Tch Hisp  % Tch Black
______________________________________________________________________________
Incid. Ratio       0.032
NStud / 100       -0.050       8.404
% Tch Hisp         0.082       0.002       2.437
% Tch Black        0.086      -0.150       0.164       628.122
Retained           0.004      -0.005       0.000       0.024
Lunch              0.056      -0.100       0.059       0.262
Gifted            -0.020       0.013      -0.003      -0.049
______________________________________________________________________________
                  Retained     Lunch       Gifted
______________________________________________________________________________
Retained           0.025
Lunch              0.082       0.903
Gifted            -0.029      -0.106       0.031
______________________________________________________________________________
Note: Variances are displayed in italicized print on the diagonal.
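Because Table 3.3 reports variances on the diagonal and correlations elsewhere, a reader who wishes to reconstruct a covariance for reanalysis can multiply the correlation by the square root of the product of the two variances. The following minimal sketch, with a hypothetical function name, applies this to the Academic Comp. and Black entries of the table.

```python
import math

def covariance_from_correlation(r, var_x, var_y):
    """Convert a correlation to a covariance: cov = r * sd_x * sd_y."""
    return r * math.sqrt(var_x * var_y)

# Values read from Table 3.3: r(Acad. Comp, Black) = -0.378,
# var(Acad. Comp) = 1.018, var(Black) = 0.243.
cov_acad_black = covariance_from_correlation(-0.378, 1.018, 0.243)
print(round(cov_acad_black, 3))
```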
Analysis
Multilevel structural equation modeling (SEM; Kaplan, 2000) was used as the primary
means of data analysis in this study. A multilevel approach was necessary for this study because
of the obvious nesting of students within schools and also because the relationships between
variables measured at the individual and school levels of analysis comprise the major research
questions. Because the researcher hypothesized a causal structure among the independent
variables and also envisioned the school academic composition as a latent variable, multilevel
SEM was selected over hierarchical linear modeling (McCoach, 2003).
Because the race variables were dummy coded, a separate variable for White students was
omitted. This demarcated White students as the reference group, so the model intercepts could
be interpreted as means for the White group. Similarly, to avoid estimation problems due to high
correlations between variables, the school-level student composition variables did not include a
variable referring to White students. Four of the school-level composition variables, the
percentages of the school enrollments that are Black, Hispanic, and Asian, as well as the school
mean for the “lunch” variable, were grand-mean centered. Additionally, the “grade” variable
was transformed by subtracting one from the actual grade so that the intercept probability of
identification estimated in the models would correspond to first grade students rather than
kindergarteners. The intercepts for the between-schools model and the average values for the
random slopes may thus be interpreted as their expected values in a compositionally average
school for White students in first grade.
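The coding decisions described above can be sketched briefly. The sketch assumes NumPy and uses small made-up arrays; the variable and function names are hypothetical and do not correspond to the names in the actual dataset.

```python
import numpy as np

def dummy_code(race):
    """Dummy-code race with White as the omitted reference category."""
    return {
        "black": int(race == "Black"),
        "hispanic": int(race == "Hispanic"),
        "asian": int(race == "Asian"),
    }

# Made-up student records.
races = ["White", "Black", "Hispanic", "Asian", "White"]
grades = np.array([1, 3, 5, 2, 4])

dummies = [dummy_code(r) for r in races]   # White students are 0 on all three
grade_recoded = grades - 1                 # first grade becomes 0

# Made-up school-level composition variable, grand-mean centered so that 0
# represents a compositionally average school.
pct_black = np.array([10.0, 55.0, 85.0])
pct_black_centered = pct_black - pct_black.mean()

print(dummies[0], grade_recoded, pct_black_centered)
```

Under this coding, a student with zeros on every predictor is a White first grader in a school with average composition, which is what gives the intercept its interpretation.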
Steps in model building
The analysis proceeded in seven steps and was conducted in MPlus for Windows version
3.13. Data preparation and principal components analysis were conducted in SPSS for Windows
version 12.0. The analysis was highly exploratory in nature. I was not interested in formally
testing several causal models against one another. Rather, I sought a reasonable model that
would a) exhibit acceptably good fit, b) remain simple enough for easy interpretability, c)
include only endogenous variables of substantive interest, d) make theoretical sense, and e) be
possible to estimate on a powerful personal computer.
First, I attempted to create a viable single-level within-schools model, ignoring the
hierarchical structure of the data. Several models were considered during this phase. The final
model selected, described in Figure 3.1, met all the criteria. Because the endogenous variables
in the model were categorical, the WLSMV estimator was utilized. Model fit was poor
according to the exact-fit test, χ²(6) = 788.44, p < .001, but excellent according to the
approximate fit indices, CFI = .99, RMSEA = .02.
In the second step, I examined an identical model to that examined in step 1, except this
time the effect of clustering was examined using the TYPE = COMPLEX command in the
MPlus software. The model fit was substantially improved, χ²(6) = 228.23, p < .001, CFI = .96,
RMSEA = .01. The lower CFI value was caused by a much lower χ² value for the baseline
model as compared to the first step.
[Path diagram with nodes Race (Black), Race (Hispanic), Race (Asian), Lunch, Migrant, Grade,
Retained, and Prob. Gifted Identification; random slopes marked 1A and 1B.]
Figure 3.1: Within-schools path model
In step three, a viable single-level between-schools model was created. Again, several
models were considered in this phase of the analysis. The final model selected, which is
described in Figure 3.2, also met the previously mentioned criteria. Since all the endogenous
variables in this model were continuous but nonnormally distributed, maximum likelihood
estimation with robust standard errors was selected. The model fit was excellent, χ²(4) = 2.99, p
= .55, CFI = 1.00, RMSEA = .000. Though the measurement component of the “school
academic environment” latent variable could easily be estimated in the SEM software at this
stage of the analysis, attempting to do so in later stages was impossible due to computational
limits. Therefore, it was treated as a factor defined by the saved factor scores of the principal
component extracted from the 15 academic quality variables. Its error variance was fixed and
was calculated by multiplying the variance in the factor scores by its alpha reliability.
In step four, a multilevel structural equation model was created. The within portion of
the model was identical to that used in steps 1 and 2. The between portion of the model was
unconditional. This model can be thought of as an unconditional intercept model and is similar
to the random ANCOVA model frequently seen in hierarchical linear modeling (HLM) studies.
The purpose of this model was to examine the amount of level-2 variance in “the probability of
gifted identification,” the primary variable of interest. This is especially critical for multilevel
structural equation models with categorical endogenous variables because a within-level variance
is not estimated, which makes the calculation of the intra-class correlation coefficient (ICC)
impossible. This model was used to determine if “the probability of gifted identification” varied
randomly across schools – if it did not, a multilevel treatment is unjustified. The estimated
variance from this step was compared against the residual variance computed in the next step to
determine the amount of level-2 variance explained by the multilevel model. Full-information
maximum likelihood estimation was selected. Due to missing data, the sample size for this
model fell to n = 296,311 and the number of clusters fell to 1,057. Missing data was handled via
listwise deletion because the MPlus software cannot handle missing data in multilevel models.
Numerical integration was required since the endogenous variables were categorical. The
default standard trapezoidal integration was selected, with 15 integration points per dimension.
[Path diagram with nodes % Students Black, % Students Hispanic, % Students Asian, Lunch,
% Teachers Adv Degree, Years Avg Teacher Exp, % Teachers Black, % Teachers Hispanic,
N Students/100, Incident-Student Ratio, % Students Gifted (prev), % Students Retained,
% Students Migrant, Academic Environment (indicator: Academic Composite, loading fixed at 1.0),
and Prob. Gifted Identification.]
Figure 3.2: Between-schools structural model
In this case, the only available types of model fit information are the log-likelihood value,
Akaike’s information criterion (AIC), and the Bayesian information criterion (BIC). The usual
fit indices, such as chi-square, CFI, RMSEA, and SRMR, cannot be computed. The
log-likelihood value was -316,134.1 and the sample-size adjusted BIC was 632,447.2 with 19 free
parameters. The results indicated that the logit of the probability of gifted identification (mean =
-3.25, corresponding with a probability of .037) did vary significantly across schools with a
variance of 1.205 and a standard error of .071. The results confirmed the need for a multilevel
approach for modeling this data.
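To make the size of this between-school variation concrete, the reported estimates can be turned into a Wald ratio and a range of plausible school-level probabilities. The sketch below uses only the values reported above (mean logit, variance, standard error). Two things are assumptions of the illustration rather than results from the study: that the random intercepts are normally distributed, and that a Wald ratio is an adequate rough check for a variance (it is known to be conservative near the boundary of zero).

```python
import math


def inv_logit(z):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


mean_logit = -3.25   # mean logit of gifted identification (step 4)
var_u0 = 1.205       # between-schools variance of the logit
se_var = 0.071       # standard error of that variance

# Rough Wald ratio for the variance component.
wald_z = var_u0 / se_var

# Range covering about 95% of schools, assuming normal random intercepts.
sd_u0 = math.sqrt(var_u0)
low_logit = mean_logit - 1.96 * sd_u0
high_logit = mean_logit + 1.96 * sd_u0

p_mean = inv_logit(mean_logit)
p_low = inv_logit(low_logit)
p_high = inv_logit(high_logit)

print(f"Wald ratio: {wald_z:.2f}")
print(f"Probability at the mean logit: {p_mean:.3f}")
print(f"Approximate 95% range across schools: {p_low:.4f} to {p_high:.3f}")
```

Under these assumptions, the identification probability for otherwise comparable students runs from well under 1% in some schools to roughly one in four in others, which is why a single-level treatment would be inadequate.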
In step 5, the within- and between-schools models from steps 1 and 3 were combined into
a single multilevel structural equation model. The within-schools portion of this model may be
found in Figure 3.1 and the between-schools portion of this model may be found in Figure 3.2.
This model was similar to the intercept-as-outcome model commonly encountered in HLM
literature. The log-likelihood value was –292,096.5 and the sample-size adjusted BIC was
584,585.2 with 42 free parameters. Due to missing data, the sample size dropped to 273,311
students in 980 clusters. The cluster sizes now ranged from 22 to 964.
In step 6, a set of four unconditional random slope models was estimated. The purpose
of these models was to determine whether the path values from “Black,” “Hispanic,” “Asian,” or
“Lunch” to “GRefCur” (the probability of gifted identification) in the within-schools model
varied randomly across schools. The results of these models indicated that the slopes for
“Lunch” and “Asian” did vary significantly across schools while the slopes for “Black” and
“Hispanic” did not. The “Lunch” slope had a variance of 0.058 and a standard error of 0.013.
The “Asian” slope had a variance of 0.358 and a standard error of 0.128.
Since the variances of the unconditional slopes for “Black” and “Hispanic” were non-
significant, they were fixed in all further models. The results of step 6 indicated that the slopes
from “Lunch” and “Asian” to “the probability of gifted identification” in the within-schools
model (marked 1A and 1B, respectively, in Figure 3.1) should be treated as random.
In step 7, two separate random slope models were estimated, one for “Lunch” and one for
“Asian.” The structure of these models was identical except for the outcome variable of the
slope model. The within-schools portion of the model is described in Figure 3.1, the between-
schools portion in Figure 3.2, and the random slope models may be found in Figure 3.3. Note
that the components of this model that are redundant with the between-schools model, such as
the structure of the academics factor, have been omitted for clarity. Table 3.4 describes model fit
for the analysis steps.
Table 3.4: Model fit information for each analysis step
______________________________________________________________________________
Step  Model                         χ² (df)      RMSEA   CFI    Free         Log-          N-adjusted
                                                                parameters   likelihood    BIC
______________________________________________________________________________
1     Within                        788.44 (6)   .02     .99
2     Within (complex)              228.23 (6)   .01     .96
3     Between                       2.99 (4)     .00     1.00
4     Unconditional intercept                                   19           -316134.1     632447.2
5     ML Intercept                                              42           -292096.5     584585.2
6     ML Search for Random Slopes                               -NA-         -NA-          -NA-
7a    ML Lunch Slope                                            57           -292061.1     584654.5
7b    ML Asian Slope                                            57           -292082.6     584697.6
______________________________________________________________________________
[Path diagram with nodes % Students Black, % Students Hispanic, % Students Asian, Lunch,
% Teachers Adv Degree, Years Avg Teacher Exp, % Teachers Black, % Teachers Hispanic,
N Students, Incident-Student Ratio, % Students Gifted (prev), % Students Retained,
% Students Migrant, Academic Environment, and Random Slope.]
Figure 3.3: Random slope model for both “lunch” and “race (Asian)” to “probability of being
identified as gifted”
The optimal approach to examining multiple random slopes in multilevel analysis is to
consider all of them simultaneously in the same model. However, due to computational
limitations, this could not be done. I tried to reduce the computational demands of the procedure
by requesting MPlus’s Monte Carlo integration option with 875 integration points, the maximum
number that could be supported. Monte Carlo integration randomly selects a user-specified
number of integration points. I attempted to analyze the model with two random slopes and
Monte Carlo integration five times, each time specifying a different seed for the random number
generator,
in order to determine if the results would be stable from run to run. Unfortunately, the results
from this approach were not stable, so it was abandoned. Therefore, only one random slope
could be estimated at a time. It is possible that the values of some parameters as well as their
standard errors would have changed if both slopes could have been estimated together.
The total sample size for both models was n = 273,311 with 980 clusters which ranged in
size from 22 to 964. For the “Lunch” slope model, the log-likelihood value was –292,061.05 while
the sample-size adjusted BIC was 584,654.48 with 57 free parameters. For the “Asian” slope
model, the log-likelihood value was –292,082.6 and the sample-size adjusted BIC was 584,697.56
with 57 free parameters.
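The reported information criteria can be checked against the log-likelihoods, parameter counts, and sample sizes given in the text. The sketch below assumes the usual definition of the sample-size adjusted BIC, -2 log L + k ln((n + 2) / 24), which is not spelled out in this chapter; the function name is chosen for illustration only.

```python
import math


def sample_size_adjusted_bic(loglik, n_params, n):
    """Sample-size adjusted BIC, assuming the form -2*logL + k*ln((n + 2) / 24)."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24.0)


# (log-likelihood, free parameters, sample size, BIC reported in the text)
models = {
    "4  Unconditional intercept": (-316134.1, 19, 296311, 632447.2),
    "5  ML Intercept": (-292096.5, 42, 273311, 584585.2),
    "7a ML Lunch Slope": (-292061.05, 57, 273311, 584654.48),
    "7b ML Asian Slope": (-292082.6, 57, 273311, 584697.56),
}

computed = {}
for label, (loglik, k, n, reported) in models.items():
    computed[label] = sample_size_adjusted_bic(loglik, k, n)
    print(f"{label}: computed {computed[label]:.1f}, reported {reported:.1f}")
```

Under that assumed formula, each computed value falls within a few tenths of the figure reported in the text, the remaining gap being attributable to rounding of the published log-likelihoods.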
Interpreting model parameters
The majority of the endogenous or outcome variables in this model were dichotomous.
For this reason, the parameters could not be interpreted in the usual way, that is, the expected
change in the outcome variable given a unit change in the explanatory variable, controlling for
the other variables in the model. When modeling dichotomous outcomes, the quantity of interest
is the probability that the individual will score “1” on the outcome variable, an event called a
“success.” Because probabilities are necessarily bounded at 0 and 1, they must be transformed to
an unbounded scale before they can be conveniently modeled.
One solution to this problem is the logit, which is defined as the natural log of the odds:

$$ z = \ln\left(\frac{p}{1 - p}\right) $$

where z is the parameter expressed in logits and p is the probability of success. The parameters
are estimated as logits in many statistical models of dichotomous processes, including the models
presented in this paper. For example, the estimated intercept for the “probability of gifted
identification” in the final model in this study was -3.351. To transform this parameter back into
a probability, one must solve for p. The equation for transforming logits back into probabilities
is:

$$ p = \frac{1}{1 + e^{-z}} $$

where z is the parameter value expressed in logits, p is the probability, and e is the base of
natural logarithms. Following this equation, it is easy to see that the estimated mean logit of
–3.351 corresponds with a probability of .034. Therefore, this is the probability of identification
for a “reference student” in Georgia during the 2004 academic year. However, the interpretation of
this parameter is complicated by the multilevel aspect of the model. The value of –3.351 is the
intercept for the probability of gifted identification and is therefore the expected value when all
of the school-level variables have values of zero. This is obviously unrealistic, as few schools
have teachers with an average of zero years of experience, a student body size of zero, no
students receiving lunch aid, and so on. When allowing the school-level values to take their
mean values, the value of the logit for the parameter becomes –2.882, corresponding with an
estimated probability of gifted identification of .053 for a “reference student.”
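The two transformations described above are short enough to write out directly. The following sketch converts the two intercepts discussed in this section back into probabilities; the function names are chosen for illustration.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of the logit: the probability implied by a logit z."""
    return 1.0 / (1.0 + math.exp(-z))


# Intercept when every school-level variable equals zero.
p_zero_school = inv_logit(-3.351)

# Intercept when school-level variables take their mean values.
p_average_school = inv_logit(-2.882)

print(f"{p_zero_school:.3f}")
print(f"{p_average_school:.3f}")

# The two functions undo one another.
round_trip = logit(inv_logit(-2.882))
```

Because the transformation is nonlinear, a fixed change in the logit produces different changes in probability depending on where it starts, which is why effects are reported as logits and converted only at the end.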
Referring to the values reported for the within-schools model in Figure 3.4, we note that
the parameter value from “Black” to “probability of gifted identification” was -.934. To
calculate the effect that this variable has on the outcome probability, we first must know the
measurement scale of the explanatory variable. In this case, Black is a dummy variable coded 0
if the student is not Black and 1 if the student is Black. First the logits are summed, then the
resulting logit is converted back into a probability. The first step is identical to the process of
calculating predicted values from a regression equation.

$$ z = z_0 + z_{\text{Black}}\,x = -2.882 + (-0.934 \times 1) = -3.816 $$

$$ p = \frac{1}{1 + e^{-(-3.816)}} = 0.022 $$

Therefore, we see that being Black has an enormous negative impact on the probability of gifted
identification. Black students have less than half the probability of identification of a
comparable White student.
As a final example, the probability of gifted identification for a Hispanic student in third
grade who received free lunch and who was not retained or migrant will be calculated. Recall
that the lunch variable is scored from zero to two, with zero representing paid lunch, one
representing reduced-price lunch, and two representing free lunch.
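
$$ z = z_0 + z_{\text{lunch}}\,x_1 + z_{\text{Hispanic}}\,x_2 + z_{\text{grade}}\,x_3 = -2.882 + (-.645 \times 2) + (-1.076 \times 1) + (-.020 \times 3) = -5.308 $$

$$ p = \frac{1}{1 + e^{-(-5.308)}} = .005 $$
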
These results indicate that a third-grade Hispanic student receiving free lunch would have almost
no chance of being identified for participation in a gifted program.
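Both worked examples follow the same recipe: sum the logits, then convert the sum to a probability. The sketch below wraps that recipe in a small helper. It assumes the average-school intercept of –2.882, which is the value that yields the sums of –3.816 and –5.308, and it codes grade as 3 exactly as in the second worked example. The helper name and dictionary keys are illustrative.

```python
import math

# Average-school intercept and within-schools path values, in logits.
INTERCEPT_AVG_SCHOOL = -2.882
COEFS = {
    "black": -0.934,
    "hispanic": -1.076,
    "asian": 1.161,
    "lunch": -0.645,   # scored 0 = paid, 1 = reduced-price, 2 = free
    "grade": -0.020,
}


def predicted_probability(**covariates):
    """Return (logit, probability) for a student with the given covariate values."""
    z = INTERCEPT_AVG_SCHOOL + sum(
        COEFS[name] * value for name, value in covariates.items()
    )
    return z, 1.0 / (1.0 + math.exp(-z))


# First example: a Black student, all other covariates at zero.
z_black, p_black = predicted_probability(black=1)

# Second example: a Hispanic student receiving free lunch, grade entered as 3.
z_hisp, p_hisp = predicted_probability(lunch=2, hispanic=1, grade=3)

print(f"{z_black:.3f} -> {p_black:.3f}")
print(f"{z_hisp:.3f} -> {p_hisp:.3f}")
```

Any covariate left out of the call is treated as zero, which matches the definition of the reference student.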
Finally, note that the magnitude of the path coefficients cannot be used to directly
compare the impact of the various explanatory variables since many of these variables were
measured on different scales. Any variable whose name begins with “%” is a percentage ranging
from 0 to 100. The “lunch” variable is the average of the individual lunch variables for each
school and can range from zero to two. The academic environment factor had a mean of .456
and a standard deviation of .978. The incident-student ratio variable was a measure of the
number of severe behavioral incidents that occurred within each school divided by the total
number of students attending the school. Its mean value was .034 with a maximum of 4.03.
Model results
The path values and standard errors for the within-schools portion of the model may be
found in Figure 3.4, for the between-schools portion of the model in Figure 3.5, and for “Lunch”
and “Asian” slopes, respectively, in Figures 3.6 and 3.7. Note that the estimated values of the
between-schools model were slightly different in the “Asian” and “Lunch” models. When these
values for significant paths diverged by more than 1%, both sets of values were reported in the
figure.
Within-schools model. The results of the within-school model are described in Figure
3.4. Race had strong direct effects, such that Black or Hispanic students had a reduced
probability of gifted identification, even after controlling for socioeconomic status, while Asian
students had an increased probability of identification. Furthermore, race had an
extraordinarily strong effect on the probability of receiving free or reduced-price lunch, which in
turn had a strong effect on the probability of retention. Grade had weak negative effects on both
the probability of retention and on the probability of gifted identification. Table 3.5 summarizes
the direct, indirect, and total effects of the within-schools model. Indirect effects for categorical
mediator variables were calculated based on a formula given by Winship and Mare (1983, p. 85).
The products of the path coefficients were weighted by the odds ratio of the intercept of the
intervening variable. Unfortunately, because no level-1 variance is estimated in multilevel
models with categorical outcomes, it is not possible to calculate the amount of variance in the
probability of gifted identification explained.
[Path diagram; the path values and standard errors shown in the figure are listed in Table 3.5.]
Figure 3.4: Path values and standard errors for within-schools portion of random slope model
Between-schools model. The between-schools results are summarized in Figure 3.5. The model
explained 70 percent of the variance in the school academic environment through the school
composition. Student body SES had a very powerful impact on the academic environment, as
did the percentage of students within the school that had been previously identified as gifted.
Table 3.5: Model summary for within-schools component of random slope model
______________________________________________________________________________
Intercepts
______________________________________________________________________________
Variable                 Value     Standard Error   T-Value     Probability
______________________________________________________________________________
Lunch (Reduced)          -0.936    .001             785.46*     .281
Lunch (Free)             -1.361    .001             1011.94*    .204
Retained                 -3.931    .020             201.28*     .019
Gifted identification    -3.351    .304             11.04*      .034

Direct Effects
______________________________________________________________________________
From        To          Parameter Value   Standard Error   T-Value
______________________________________________________________________________
Black       Lunch        1.712            .002             854.95*
Hispanic    Lunch        2.064            .006             355.30*
Asian       Lunch        0.179            .009             19.40*
Lunch       Retained     0.563            .010             55.25*
Grade       Retained    -0.020            .005             -32.39*
Lunch       Gifted      -0.645            .177             -3.65*
Black       Gifted      -0.934            .037             -25.51*
Hispanic    Gifted      -1.076            .068             -15.79*
Asian       Gifted       1.161            .512             2.27*
Migrant     Gifted       0.073            .303             -0.24
Grade       Gifted      -0.020            .005             -9.67*
Retained    Gifted      -3.810            .618             -6.16*

Indirect Effects
______________________________________________________________________________
From        To          Through            Value
______________________________________________________________________________
Black       Gifted      Lunch              -0.429
Hispanic    Gifted      Lunch              -0.518
Asian       Gifted      Lunch              -0.045
Lunch       Gifted      Retained           -0.044
Grade       Gifted      Retained            0.001
Black       Gifted      Lunch, Retained    -0.028
Hispanic    Gifted      Lunch, Retained    -0.033
Asian       Gifted      Lunch, Retained    -0.003
Black       Retained    Lunch               0.375
Hispanic    Retained    Lunch               0.452
Asian       Retained    Lunch               0.039

Total Effects
______________________________________________________________________________
From        To          Value
______________________________________________________________________________
Black       Gifted      -1.391
Hispanic    Gifted      -1.627
Asian       Gifted       1.113
Lunch       Gifted      -0.689
Grade       Gifted      -0.019
______________________________________________________________________________
* Significant at or beyond p = .05
The racial composition of the student body also impacted the academic environment, with weak
negative effects for the percentages of the student body that were Black or Hispanic and a
stronger positive effect for the percentage of the student body that was Asian. It is quite
interesting that the variables measuring teacher education and experience did not have significant
effects on the school academic environment, nor did the ratio of severe behavioral incidents per
student or the percentage of students that had been retained.
The model explained 23 percent of the between-schools variance in the probability of
gifted identification. Only five paths had significant direct effects. The school SES composition
had a strong direct effect, such that students in schools with more low-SES students had a higher
probability of gifted identification. However, school SES also exerted a strong negative indirect
effect through school academic environment, overpowering the positive direct effect. The total
effect of school SES on the mean probability of identification was negative. Furthermore, the
percentages of the student body that were Black and Hispanic also exerted small positive effects.
The school academic environment had a large positive effect. Finally, the incident-to-student
ratio variable had a strong negative effect on the probability of gifted identification. Table 3.6
summarizes the direct, indirect, and total effects in the between-schools model. Values reported
here are from the “Asian” random slope model. In cases where significant path values or
standard errors were different in the “Lunch” slope model, the value is italicized and reported
beneath the value from the Asian slope model.
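The way the negative indirect effect of school SES overpowers its positive direct effect can be reproduced from the path values in Table 3.6. Because the academic environment factor is continuous, the sketch below assumes the indirect effect is simply the product of the two paths that form it, and the total effect is the direct effect plus that product. The dictionary keys are illustrative names for the Table 3.6 rows.

```python
# Direct paths to the probability of gifted identification (Table 3.6, logits).
direct_to_gifted = {"lunch": 0.310, "pct_black": 0.009, "pct_hispanic": 0.012}

# Paths from the same predictors to the academic environment factor.
to_academic_env = {"lunch": -0.830, "pct_black": -0.005, "pct_hispanic": -0.013}

# Path from academic environment to the probability of gifted identification.
ACADEMIC_ENV_TO_GIFTED = 0.562

effects = {}
for name, direct in direct_to_gifted.items():
    indirect = to_academic_env[name] * ACADEMIC_ENV_TO_GIFTED
    effects[name] = {
        "direct": direct,
        "indirect": indirect,
        "total": direct + indirect,
    }

for name, parts in effects.items():
    print(
        f"{name}: direct {parts['direct']:.3f}, "
        f"indirect {parts['indirect']:.3f}, total {parts['total']:.3f}"
    )
```

For school SES the product is about –.466, which outweighs the direct effect of .310 and leaves a total effect of about –.156. The values for the racial composition variables match the table to within rounding of the published coefficients.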
Random slope model for “Lunch.” The results of the random slope model for the
“Lunch” slope (1A) are reported in Figure 3.6. The purpose of this model was to attempt to
explain the variance across schools in the probability that a student receiving free or reduced-
price lunch would be identified as gifted. The mean of this outcome was –.645 with a standard
deviation of .241. This parameter should be interpreted as a logit. The odds of a student
receiving reduced-price lunch being identified as gifted would be, on average, only 52% of the
odds of a comparable student who did not receive aid. The odds of identification for a student
receiving free lunch would be only 27.5% of the odds for a comparable student not receiving aid.
Only two explanatory variables in this model had significant effects. Nonetheless, the model
explained 19 percent of the variance in the parameter. The school academic environment had a
weak negative effect on the probability that a student receiving free or reduced-price lunch
would be identified. The percentage of students previously identified as gifted had a positive
effect on the probability of gifted identification. Table 3.7 summarizes the direct, indirect, and
total effects in this slope model.
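The odds ratios quoted above follow from exponentiating the mean slope. Since the lunch variable is scored 0, 1, or 2, a student receiving free lunch carries the slope twice. A minimal sketch:

```python
import math

LUNCH_SLOPE = -0.645  # mean of random slope 1A, in logits

# Odds relative to a comparable student paying full price (lunch = 0).
odds_ratio_reduced = math.exp(LUNCH_SLOPE * 1)  # reduced-price lunch, lunch = 1
odds_ratio_free = math.exp(LUNCH_SLOPE * 2)     # free lunch, lunch = 2

print(f"Reduced-price lunch: {odds_ratio_reduced:.3f}")
print(f"Free lunch: {odds_ratio_free:.3f}")
```

Note that these are ratios of odds, not of probabilities. The two are close only when the probabilities involved are small, as they are here.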
[Path diagram; the path values and standard errors shown in the figure are listed in Table 3.6.]
Figure 3.5: Path values for between-schools (intercept) component of random slope model
Table 3.6: Model summary for between-schools component of random slope model
______________________________________________________________________________
Intercepts
______________________________________________________________________________
Variable                 Value     Standard Error   T-Value    Probability
______________________________________________________________________________
Academics                 0.456    .114             3.99*
Gifted identification    -3.351    .304             11.04*     .034

Direct Effects
______________________________________________________________________________
From              To              Parameter Value   Standard Error   T-Value
______________________________________________________________________________
Lunch             Gifted            .310            .146             2.13*
% Black           Gifted            .009            .003             3.53*
% Hispanic        Gifted            .012            .004             2.69*
% Asian           Gifted            .012            .017             0.71
% Tch Adv         Gifted           -.005            .004             -1.47
Academic Env      Gifted            .562            .069             0.07
Avg Tch Exp       Gifted           -.007            .017             -0.42
% Tch Black       Gifted           -.002            .003             -0.58
% Tch Hisp.       Gifted            .005            .026             0.18
Incident Ratio    Gifted          -1.407            .399             -3.53*
% Gifted          Gifted            .006            .012             0.55
% Retained        Gifted           -.013            .019             -0.71
NStudents/100     Gifted            .010            .017             0.57
% Migrant         Gifted           -.002            .016             -0.15
Lunch             Academic Env     -.830            .054             -15.51*
% Black           Academic Env     -.005            .001             -6.92*
% Hispanic        Academic Env     -.013            .002             -5.98*
% Asian           Academic Env      .045            .006             7.33*
% Tch Adv         Academic Env      .003            .002             1.95
Avg Tch Exp       Academic Env      .013            .008             1.59
Incident Ratio    Academic Env     -.003            .127             -0.02
% Gifted          Academic Env      .056            .005             12.36*
% Retained        Academic Env     -.014            .008             -1.74

Indirect Effects
______________________________________________________________________________
From              To        Through         Value
______________________________________________________________________________
Black             Gifted    Academic Env    -.002
Hispanic          Gifted    Academic Env    -.007
Asian             Gifted    Academic Env     .025
Lunch             Gifted    Academic Env    -.466
% Gifted (prev)   Gifted    Academic Env     .031

Total Effects
______________________________________________________________________________
From              To        Value
______________________________________________________________________________
Black             Gifted     .007
Hispanic          Gifted     .005
Lunch             Gifted    -.156
______________________________________________________________________________
* Significant at or beyond p = .05
Random slope model for “Asian.” The results of the random slope model for the “Asian”
slope (1B) are reported in Figure 3.7. Recall that it was necessary to model this slope separately
due to computational limitations. The mean of this outcome was 1.161 and its standard deviation
was .598. Therefore, Asian students have, on average, odds of gifted identification that are 3.19
times (319% of) the odds for comparable White students. The model explained 90 percent of the
between-schools
variance in the probability of an Asian student being identified as gifted. Three variables had
statistically significant effects on the outcome. The school socioeconomic composition had a
strong effect such that Asian students in schools with large numbers of students receiving
subsidized lunch had a higher probability of being identified. The percentage of teachers in the
school that were Black had a weak negative effect. Finally, the size of the school student body
had a negative effect on the probability of gifted identification for Asian students. Table 3.8
summarizes the direct effects in this model.
Discussion
The results of this study have demonstrated that, in spite of relatively recent additions to
Georgia law changing the identification procedure for gifted programs to make it easier to
identify traditionally underrepresented students, a serious issue continues to exist in Georgia.
Though a previous study conducted by the author (using school-level data only) suggested that
the disparities in gifted program participation across racial groups might result only from
socioeconomic differences (McBee, in press), the results of the current study demonstrate that
this hopeful scenario is simply untrue. Even when socioeconomic status is controlled, race has a
huge impact on the probability of identification. Hispanic students remain the group with the
lowest probability of identification, though their probability is only slightly lower than the
probability for Black students. Asian students have a higher probability of identification than
students from any other group. Not only does being Black or Hispanic exert a large negative
effect on the probability of identification directly, it also exerts large indirect effects through
increasing the probability that the student will also be economically disadvantaged and therefore
receive the penalties associated with socioeconomic deprivation as well.
[Path diagram; the path values and standard errors shown in the figure are listed in Table 3.7.]
Figure 3.6: Path values for slope portion of random slope model 1A (from “lunch” to
“probability of being identified gifted”)
Table 3.7: Model summary for “Lunch” slope portion of random slope model
______________________________________________________________________________
Intercepts
______________________________________________________________________________
Variable              Value     Standard Error   T-Value    Odds Ratio
______________________________________________________________________________
Lunch Slope (Red)     -0.645    .177             -3.65*     .52
Lunch Slope (Free)    -1.290    .177             -7.29*     .28

Direct Effects
______________________________________________________________________________
From              To          Parameter Value   Standard Error   T-Value
______________________________________________________________________________
Lunch             Slope 1A    -0.069            .093             -0.75
% Black           Slope 1A     0.000            .001             0.10
% Hispanic        Slope 1A     0.002            .002             0.75
% Asian           Slope 1A     0.008            .007             1.18
% Tch Adv         Slope 1A     0.000            .002             -0.15
Academic Env      Slope 1A    -0.099            .043             -2.28*
Avg Tch Exp       Slope 1A    -0.002            .011             -0.15
% Tch Black       Slope 1A     0.003            .002             1.49
% Tch Hisp.       Slope 1A     0.013            .014             0.93
Incident Ratio    Slope 1A    -0.352            .277             -1.27
% Gifted          Slope 1A     0.017            .008             2.25*
% Retained        Slope 1A    -0.001            .010             -0.14
NStudents/100     Slope 1A     0.000            .008             0.01
% Migrant         Slope 1A    -0.006            .010             -0.14

Indirect Effects
______________________________________________________________________________
From              To          Through         Value
______________________________________________________________________________
% Gifted (prev)   Slope 1A    Academic Env    -0.006

Total Effects
______________________________________________________________________________
From              To          Value
______________________________________________________________________________
% Gifted (prev)   Slope 1A    .011
______________________________________________________________________________
* Significant at or beyond p = .05
[Path diagram; the path values and standard errors shown in the figure are listed in Table 3.8.]
Figure 3.7: Path values for slope portion of random slope model 1B (from “Asian” to
“probability of being identified gifted”)
Table 3.8: Model summary for “Asian” slope portion of random slope model
______________________________________________________________________________
Intercepts
______________________________________________________________________________
Variable       Value    Standard Error   T-Value    Odds Ratio
______________________________________________________________________________
Asian Slope    1.161    .512             2.27       3.19

Direct Effects
______________________________________________________________________________
From              To          Parameter Value   Standard Error   T-Value
______________________________________________________________________________
Lunch             Slope 1B     0.947            .412             2.30*
% Black           Slope 1B    -0.005            .005             -0.90
% Hispanic        Slope 1B    -0.010            .007             -1.53
% Asian           Slope 1B     0.000            .012             0.01
% Tch Adv         Slope 1B    -0.009            .006             -1.43
Academic Env      Slope 1B     0.062            .133             0.47
Avg Tch Exp       Slope 1B    -0.016            .028             -0.56
% Tch Black       Slope 1B    -0.014            .006             -2.38*
% Tch Hisp.       Slope 1B    -0.035            .036             -0.98
Incident Ratio    Slope 1B     0.196            .976             0.20
% Gifted          Slope 1B    -0.002            .018             -0.10
% Retained        Slope 1B    -0.014            .044             -0.32
NStudents/100     Slope 1B    -0.043            .021             -2.01*
% Migrant         Slope 1B     0.012            .053             0.23
______________________________________________________________________________
* Significant at or beyond p = .05
Based on the mean values for the school composition, we can calculate the average probability
of identification for a White first grader who does not receive free or reduced-price lunch, was
not retained and not migrant (in other words, a “reference student” ) at 0.053. The same
probability for an equivalent Black student drops to 0.022. The same Black student who receives
free lunch would have an average probability of identification of 0.006, and 59.3 percent of the
Black students in Georgia receive free lunch. The situation for Hispanic students is even more
dire. A Hispanic first grader with no other special characteristics would have a probability of
identification of 0.019. A Hispanic first grader receiving free lunch would have an average
probability of identification of .005, with 66.7 percent of Georgia first graders receiving free
lunch. Table 3.9 describes model-implied probabilities for identification of students of various
backgrounds.
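The model-implied probabilities can be regenerated from the within-schools path values and the average-school intercept. The sketch below assumes a first grader (grade coded zero) who was neither retained nor migrant, so only the race and lunch terms enter the sum. The names are illustrative, and the published, rounded coefficients are used, so results should agree with Table 3.9 only to within rounding.

```python
import math

INTERCEPT_AVG_SCHOOL = -2.882  # logit for the reference student
RACE_EFFECT = {"White": 0.0, "Black": -0.934, "Hispanic": -1.076, "Asian": 1.161}
LUNCH_SLOPE = -0.645
LUNCH_CODE = {"Paid": 0, "Reduced": 1, "Free": 2}


def probability(race, lunch):
    """Model-implied probability of identification for a first grader."""
    z = INTERCEPT_AVG_SCHOOL + RACE_EFFECT[race] + LUNCH_SLOPE * LUNCH_CODE[lunch]
    return 1.0 / (1.0 + math.exp(-z))


# One entry per combination of race and lunch status.
table = {
    (race, lunch): probability(race, lunch)
    for race in RACE_EFFECT
    for lunch in LUNCH_CODE
}

for (race, lunch), p in table.items():
    print(f"{race:<10}{lunch:<9}{100 * p:5.1f}%")
```

Laying the twelve cells out side by side makes the compounding visible: the race and lunch penalties add on the logit scale, so their combined effect on the probability is far larger than either alone.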
At the between-schools level, the race and socioeconomic composition actually had small
positive direct effects on the probability of gifted identification. However, the positive direct
effect of the SES composition is overshadowed by its larger indirect effect transmitted through
the school academic environment.
One striking finding was that fully seventy percent of the variance in the school academic
environment factor was explained via the school composition. It was surprising that the teacher
variables did not exert a significant influence on the academic environment, nor did the
percentage. It is particularly interesting given the current emphasis on teacher and school
accountability. It is also somewhat surprising that the percentage of students that had been
retained did not exert a significant negative effect on the school academic composition.
Consider that the variables that made up the academic environment composite were the
percentages of students scoring as “advanced” on the Criterion Referenced Competency Test
(CRCT). The results suggest that a certain population of students is likely to do well on the
CRCT under any circumstance and is therefore less sensitive to factors such as teacher quality.
Table 3.9: Model-implied probabilities of identification
______________________________________________________________________________
Student      Student FRL      Probability of Gifted Identification
______________________________________________________________________________
White        Paid              5.3%
White        Reduced           2.9%
White        Free              1.5%
Black        Paid              2.2%
Black        Reduced           1.1%
Black        Free              0.6%
Hispanic     Paid              1.9%
Hispanic     Reduced           1.0%
Hispanic     Free              0.5%
Asian        Paid             15.1%
Asian        Reduced           8.6%
Asian        Free              4.7%
______________________________________________________________________________
Note: All probabilities assume that students are first graders in a compositionally average school for White students not receiving free or reduced-price lunch.
Perhaps this is related to the results of Reis, Westberg, Kulikowich, and Purcell’s (1998) study,
which found that fifty percent of the curriculum could be eliminated for gifted students without
negatively impacting their achievement test scores. Perhaps the impact of teacher training and
experience would have been more powerful if the academic composite had been defined by
students scoring as “proficient” or “advanced” rather than just “advanced” on the CRCT.
Another surprising finding was that the random slope coefficients for Black and Hispanic
in the within-schools model did not vary randomly across schools. When planning the study,
these slopes along with the Lunch slope were of primary interest because they represent the
students who are known to be underrepresented in gifted programs. It was hoped that this study
would help identify the types of schools that would be most effective at identifying Black and
Hispanic students. Unfortunately, it seems that all Georgia schools are equally ineffective at
identifying these two types of students.
Turning to the random slope model for Lunch, we noted that two paths had significant
effects. The path from academic environment was weakly negative, while the path from the
percentage of students previously identified as gifted was positive. Therefore, the positive direct
effect of the “previous gifted” variable was somewhat weakened by its indirect effect through the
school academic environment but remained positive. The positive direct effect was hypothesized
because it seemed likely that schools that were more effective at identifying gifted students in
general would also be better at identifying low-SES gifted students. The negative effect of the
academic environment, however, is contrary to what was expected. Perhaps schools with fewer
students excelling on standardized achievement tests create more of an opportunity
for traditionally underrepresented students to stand out.
The Asian slope model was initially considered relatively uninteresting compared to the
other potential random slopes in the model and was only examined at all for reasons of
symmetry. This is because Asian students are known to be substantially overrepresented in
gifted programs (Kitano & DiJiosia, 2002). The results of the within-schools component of the
model confirm this finding. In fact, the degree of overrepresentation of Asian students is
actually larger than the degree of underrepresentation for Black or Hispanic students, though not
larger than the degree of underrepresentation of students receiving free lunch.
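One way to check this comparison against the model-implied probabilities in Table 3.9 is to convert each probability to odds and compare each group with the reference student. The sketch below assumes that an odds ratio is an acceptable yardstick for the "degree" of over- or underrepresentation; the function and variable names are illustrative and do not come from the original analysis.

```python
# Model-implied probabilities copied from Table 3.9 (first graders in a
# compositionally average school).
P_REFERENCE = 0.053    # White, paid lunch (the "reference student")
P_BLACK = 0.022        # Black, paid lunch
P_HISPANIC = 0.019     # Hispanic, paid lunch
P_ASIAN = 0.151        # Asian, paid lunch
P_FREE_LUNCH = 0.015   # White, free lunch


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p, p_ref=P_REFERENCE):
    """Odds of identification relative to the reference student."""
    return odds(p) / odds(p_ref)


# Overrepresentation: how many times larger the odds are than the reference.
asian_over = odds_ratio(P_ASIAN)

# Underrepresentation: how many times smaller the odds are than the reference.
black_under = 1.0 / odds_ratio(P_BLACK)
hispanic_under = 1.0 / odds_ratio(P_HISPANIC)
free_lunch_under = 1.0 / odds_ratio(P_FREE_LUNCH)

for label, value in [
    ("Asian (over)", asian_over),
    ("Black (under)", black_under),
    ("Hispanic (under)", hispanic_under),
    ("Free lunch (under)", free_lunch_under),
]:
    print(f"{label:20s} {value:.2f}")
```

Under this yardstick the Asian odds ratio exceeds the inverted ratios for Black and Hispanic students but falls short of the inverted ratio for students receiving free lunch, which is the ordering described in the text.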
The results of this model are difficult to understand, especially in light of the small
amount of research that this issue has received. The SES composition of the school exerted a
strong effect on the probability of identification for Asian students, so that schools with more
students receiving aid identified more Asian students. School size exerted a weak negative
effect, as did the percentage of teachers in the school that were Black. This last effect was
included because it was hypothesized that diverse teachers would possess more multicultural
awareness and might therefore be more likely to nominate non-White students for evaluation for
gifted program entry. This did not appear to be the case.
It is possible that the effects in the Asian slope model actually represent composition
effects. There are relatively few Asian students in Georgia in general, with 12.6 percent of
schools having no Asian students and 50.5 percent of schools having less than one percent of
their student bodies composed of Asian students. Asian students would obviously have a small
chance of identification in schools that have zero or very few Asian students. This probably also
explains the relatively high standard error for the Asian slope parameter.
This study had several important limitations. First, on the methodological level,
computational limits became a major problem. The estimation of the final slope models presented in
this paper took between six and twelve hours apiece on a powerful personal computer. The
models required a total of only 225 integration points. To estimate a model with both slopes
considered simultaneously would have required a total of 3,375 integration points. The Mplus
software is not currently equipped to make use of some recent advancements in computer
hardware such as multiple processors, though the upcoming version of the software may have
this feature.
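The jump from 225 to 3,375 points follows from the way numerical integration scales with the number of random effects. The sketch below assumes 15 integration points per dimension, with each reported model having two dimensions of integration (a random intercept plus one random slope) and the combined model having three. These settings are an assumption chosen because they reproduce the reported totals; they are not taken from the original model output.

```python
POINTS_PER_DIMENSION = 15  # assumption: reproduces the totals reported above


def total_integration_points(n_random_effects, points_per_dim=POINTS_PER_DIMENSION):
    """Grid size when every random effect is integrated over its own axis."""
    return points_per_dim ** n_random_effects


for n_dims in (1, 2, 3, 4):
    print(f"{n_dims} dimension(s): {total_integration_points(n_dims):,} points")
```

Because the total grows exponentially, each additional random slope multiplies the computational burden by the number of points per dimension.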
Second, missing data was a major problem. Listwise deletion resulted in the removal of
68,323 cases. There is some evidence that these cases were not missing at random. Upon
examining the descriptive statistics in Table 3.2, one will note that there is a large discrepancy
between the individual- and school-level versions of the “migrant” variable. This is because the
individual version was computed after listwise deletion, whereas the school version was
aggregated from the individual version before listwise deletion took place. The discrepancy
between the two indicates that the majority of migrant students had missing data and were
therefore removed from the analysis. It is quite easy to explain why this might be the case, since
it might take some time for a student’s records to be transferred to the new school. Nevertheless,
missing data that is not missing at random can bias the parameter estimates in any model (Enders
& Bandalos, 2001).
Finally, the dearth of model fit information for the multilevel models in this study is a
serious shortcoming as well. The loglikelihood and BIC values reported are predominantly useful
for comparing the fits of competing models, not for making absolute judgments on whether or
not the model fits the data. The actual degree to which the models presented herein conform to
or diverge from the data remains unknown.
This study was the first to examine the issue of underrepresentation through categorical
data analysis tools, enabling the study to examine the actual probabilities of identification for
various student types. It is the first study of this type to use multilevel modeling techniques, and
it uses a dataset of unparalleled size. In spite of its limitations, it significantly extends our
knowledge of this issue.
References
Duncan, G. J., & Brooks-Gunn, J. (2000). Family poverty, welfare reform, and child
development. Child Development, 71, 188-196.
Enders, C., & Bandalos, D. (2001). The relative performance of full information maximum
likelihood estimation for missing data in structural equation models. Structural Equation
Modeling, 8(3), 430-457.
Entwisle, D. R., & Alexander, K. L. (1992). Summer setback: Race, poverty, school
composition, and mathematics achievement in the first two years of school. American
Sociological Review, 57(1), 72-84.
Everson, H. T., & Millsap, R. E. (2004). Beyond individual differences: Exploring school effects
on SAT scores. Educational Psychologist, 39(3), 157-172.
Ford, D. Y. (1998). The underrepresentation of minority students in gifted education: Problems
and promises in recruitment and retention. Journal of Special Education, 32(1), 4-14.
Ford, D. Y., Harris, J. J., III, Tyson, C. A., & Trotman, M. F. (2002). Beyond deficit thinking:
Providing access for gifted African American students. Roeper Review, 24(2), 52-58.
Fordham, S., & Ogbu, J. (1986). Black students' school success: The burden of "acting White."
The Urban Review, 18, 176-206.
Frasier, M. M. (1997). Gifted minority students: Reframing approaches to their identification and
education. In N. Colangelo & G. Davis (Eds.), The handbook of gifted education (2nd
ed., pp. 498-515). Needham Heights, MA: Allyn & Bacon.
Frasier, M. M., & Passow, A. H. (1994). Toward a new paradigm for identifying talent potential.
(No. 94112): The National Research Center on the Gifted and Talented.
Gordon, R. A. (1987). Jensen's contributions concerning test bias: A contextual view. In S.
Modgil & C. Modgil (Eds.), Arthur Jensen -- consensus and controversy (pp. 77-154).
Philadelphia, PA: Falmer.
Grantham, T. C., & Ford, D. Y. (2003). Beyond self-concept and self-esteem: Racial identity and
gifted African American students. High School Journal, 87(1), 18-29.
Harmon, D. (2002). They won't teach me: The voices of gifted African American inner-city
students. Roeper Review, 24(2), 68-75.
Hess, R., & McDevitt, T. (1984). Some cognitive consequences of maternal intervention
techniques: A longitudinal study. Child Development, 55, 1902-1912.
Hunsaker, S. L., Finley, V. S., & Frank, E. L. (1997). An analysis of teacher nominations and
student performance in gifted programs. Gifted Child Quarterly, 41(2), 19-24.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks,
CA: Sage.
Kennedy, E. (1992). A multilevel study of elementary male Black students and White students.
Journal of Educational Research, 86(2), 105-110.
Kitano, M. K., & DiJiosia, M. (2002). Are Asian and Pacific Americans overrepresented in
programs for the gifted? Roeper Review, 24(2), 76-81.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New
York: Guilford.
Maggi, S., Hertzman, C., Kohen, D., & D'Angiulli, A. (2004). Effects of neighborhood
socioeconomic characteristics and class composition on highly competent children.
Journal of Educational Research, 98(2), 109-114.
Maker, C. J. (1996). Identification of gifted minority students: A national problem, needed
changes and a promising solution. Gifted Child Quarterly, 40(1), 41-50.
Masten, W. G., Plata, M., Wenglar, K., & Thedford, J. (1999). Acculturation and teacher ratings
of Hispanic and Anglo-American students. Roeper Review, 22(1), 64-65.
McBee, M. (in press). Minority representation in gifted programs: A school level analysis of race
and socioeconomics. Roeper Review.
McCoach, D. B. (2003). SEM isn't just the schoolwide enrichment model anymore: Structural
equation modeling (SEM) in gifted education. Journal for the Education of the Gifted,
27(1), 36-61.
Mills, B. C. (1983). The effects of socioeconomic status on young children's readiness for
school. Early Child Development & Care, 11(3), 267-273.
Mills, C. J., & Tissot, S. L. (1995). Identifying academic potential in students from under-
represented populations: Is using the Ravens Progressive Matrices a good idea? Gifted
Child Quarterly, 39(4), 209-217.
Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children
using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47(2), 155-
160.
Naglieri, J., & Jensen, A. R. (1987). Comparison of Black - White differences on the WISC-R
and the K-ABC: Spearman's hypothesis. Intelligence, 11, 21-43.
Opdenakker, M.-C., & Van Damme, J. (2001). Relationship between school composition and
characteristics of school process and their effect on mathematics achievement. British
Educational Research Journal, 27(4), 407-432.
Peterson, J. S. (1999). Gifted--through whose cultural lens? An application of the postpositivistic
mode of inquiry. Journal for the Education of the Gifted, 22(4), 354-383.
Plata, M., & Masten, W. G. (1998). Teacher ratings of Hispanic and Anglo students on a
behavior rating scale. Roeper Review, 21(2), 139-144.
Portes, A., & MacLeod, D. (1996). Educational progress of children of immigrants: The roles of
class, ethnicity, and school context. Sociology of Education, 69(4), 255-275.
Quay, L. C. (1989). Interactions of stimulus materials, age, and SES in the assessment of
cognitive abilities. Journal of Applied Developmental Psychology, 10(3), 401-409.
Reid, C., Romanoff, B., Algozzine, B., & Udall, A. (2000). An evaluation of alternative
screening procedures. Journal for the Education of the Gifted, 23(4), 379-396.
Ryan, J. J., & French, J. R. (1976). Long-term grade predictions for intelligence and achievement
tests in schools of differing socio-economic levels. Educational & Psychological
Measurement, 36(2), 553-559.
Sarouphim, K. M. (1999). Discover: A promising alternative assessment for the identification of
gifted minorities. Gifted Child Quarterly, 43(4), 244-251.
Scott, M. S., Perou, R., Urbano, R. C., & Hogan, A. (1992). The identification of giftedness: A
comparison of White, Hispanic and Black families. Gifted Child Quarterly, 36(3), 131-
139.
Taylor, S. A., & Harris, K. C. (2003). School integration and the achievement test scores of black
and white students in Savannah, Georgia. North American Journal of Psychology, 5(2),
301-309.
Tyler-Wood, T., & Carri, L. (1993). Verbal measures of cognitive ability: The gifted low SES
student's albatross. Roeper Review, 16(2), 102-106.
West, C. A. (1985). Effects of school climate and school social structure on student academic
achievement in selected urban elementary schools. Journal of Negro Education, 54(3),
451-461.
Wilson, N. S. (1986). Counselor interventions with low-achieving and underachieving
elementary, middle, and high school students: A review of the literature. Journal of
Counseling & Development, 64(10), 628-634.
Winship, C., & Mare, R. D. (1983). Structural equations and path analysis for discrete data. The
American Journal of Sociology, 89(1), 54-110.
Worrell, F. C., Szarko, J. E., & Gabelko, N. H. (2001). Multi-year persistence of nontraditional
students in an academic talent development program. Journal of Secondary Gifted
Education, 12(2), 80-89.
CHAPTER 4
MULTILEVEL ANALYSIS IN GIFTED EDUCATION
Abstract
Multilevel data analysis techniques have become very important in educational research.
This paper introduces two varieties of multilevel analysis, hierarchical linear models and
multilevel structural equation models, to the gifted education community. Readers are assumed
to have knowledge of multiple regression techniques. The rationale and purpose of multilevel
analysis are explained. A sample dataset is analyzed according to a variety of multilevel analysis
strategies. Example code for conducting multilevel analyses via the Mplus software package is
included.
Multilevel analysis is a type of data analysis that is currently receiving a great deal of
interest and enthusiasm in many social science disciplines. Over the past several years,
courses in multilevel analysis have been offered with increasing frequency in graduate programs in
education and in intensive workshops. A number of fairly accessible resources in multilevel
analysis are now available, and both the number and the ease-of-use of multilevel analysis
software packages have increased. Though the number of publications using multilevel
modeling in education has been steadily increasing, the use of such techniques in gifted
education research is currently very limited. This article was written to introduce multilevel
analysis to researchers. It is intended to introduce two varieties of multilevel analysis,
hierarchical linear modeling (Schreiber & Griffin, 2004) and multilevel structural equation
modeling (Heck, 2001), in a straightforward manner. While some equations must be presented
for understanding multilevel analysis, I have attempted to minimize the mathematical treatment
in favor of conceptual clarity. Understanding this article will require the reader to be familiar
with statistical analysis methods such as analysis of variance and multiple linear regression.
Readers who possess an understanding of structural equation modeling will find the section on
multilevel SEM quite easy to understand. Readers who are unfamiliar with SEM are referred to
McCoach’s (2003) beginner-friendly introduction.
What is Multilevel Analysis?
It will be useful to examine two concepts common to multilevel analysis before delving
deeper into this discussion. These two concepts are data clustering and levels of analysis.
Multilevel analysis approaches are appropriate and useful when data is clustered or has a
hierarchical structure. Clustered data occurs when research participants are grouped in some
way (Hox, 2002). In educational research, students are often situated within classrooms.
Students who are in classrooms together are likely to be more similar than students in different
classrooms for a variety of reasons. If one student gets sick, he could cause several other
classmates to get sick. If one student asks an insightful question, all the other students in that
class benefit from hearing the answer. In practice, the vast majority of data collected in
education is clustered. Higher levels of clustering are possible as well. Students attending the
same school may be more similar than students attending different schools due to differences in
funding, school leadership, and location. Students attending schools in the same district may be
more similar than students across districts due to differences in policy. When ordinary statistical
methods are applied to clustered data, the results may be incorrect or misleading (Raudenbush &
Bryk, 2002).
In traditional data analysis, the researcher is interested in measuring variables, forming
hypotheses, testing models, and making inferences concerned with a single level of analysis.
This level of analysis is usually individual students. For example, we might be interested in the
impact of a student’s verbal ability on performance in the gifted education classroom. Less
commonly, we may choose schools as our level of analysis. In that case, our research questions
would concern how the properties of schools affect school-based outcomes. We might ask how a
school’s per-pupil funding affects the school’s achievement test scores. Frequently, but not
always, the information examined at the school level is actually comprised of aggregated
individual-level data. Schools themselves do not have achievement test scores, but the students
attending the schools do. The mean of these scores may be taken to represent the overall level of
academic achievement of students attending the school. Multilevel analysis differs from
traditional data analysis because it addresses research questions targeted at different levels of
analysis simultaneously (Hox, 2002). For example, multilevel analysis would allow us to extend
the previous example of the effect of verbal ability on gifted education class performance by also
examining the impact of the verbal abilities of the class on the student’s performance. In fact,
we can envision the variance in student performance as having two parts: one part due to the
student’s individual ability, and one part due to the effect of the ability of the other students by
which he is surrounded. Most gifted education professionals believe that grouping gifted
students together is beneficial. Multilevel analysis allows us to directly address research
questions involving the effects of grouping while simultaneously considering (or adjusting for) a
student’s individual ability. Data clustering generally poses statistical problems, while choosing
a level of analysis is primarily a conceptual issue.
Statistical Issues in Applying Unilevel Statistical Methods to Clustered Data
In practice, almost every dataset analyzed in educational research is comprised of
clustered data because our research designs usually involve the collection of data from a few
hundred students distributed across several classrooms. Our datasets are commonly analyzed
with t-tests, analysis of variance (ANOVA), or multiple linear regression techniques. We are
aware that the accuracy of the results obtained from these techniques is predicated on a number
of assumptions, which may or may not apply to our data. In order to facilitate further discussion,
the basic assumptions of multiple linear regression will be enumerated. Of the three basic
statistical analysis techniques described above, regression is the most flexible. The t-test can be
conceptualized as a special case of ANOVA, and ANOVA can be thought of as a special case of
regression (Pedhazur, 1997). The major assumptions of regression are as follows:
1) The underlying relationship between the predictor variables and the outcome is linear.
2) The residuals (i.e., the prediction errors, Ŷ − Y) are assumed to be normally distributed.
3) Residuals have a constant variance (homoscedasticity).
4) Observations are independent (Raudenbush & Bryk, 2002).
Speaking pragmatically, we often proceed with our analysis even when we have evidence that
one or more of these have been violated. In gifted education research, for example, the scores of
our participants on measures of intellectual ability, creativity, or academic achievement are often
extremely skewed. Indeed, most theoretical conceptions of giftedness require that the students
we label “gifted” fall at the top end of one or more assessments. When examining published
gifted education research that has employed linear regression, attempts to confirm whether or not
these assumptions have been violated appear to be quite rare.
Clustered data violates the assumption of independent observations (Curran, 2003). The
assumption of independence means that the participants in the study do not affect one another,
and that variables are only related to each other in the ways specified in the statistical model that
was used. When this assumption is violated, incorrect estimates and standard errors of the model
parameters may result. In regression research, a t-test is associated with each independent
variable in the model. The magnitude of the independent variable’s effect is divided by its
standard error to produce the value of the t-test, which is used to determine whether the
effect of that independent variable is statistically significant (i.e., unlikely to have occurred by
chance). Because units in clustered datasets are usually more alike than they would be if the
units were independent, standard errors of the model parameters may be depressed. Depressing
the standard errors causes the overall t-ratio to become larger than it should be, increasing the
probability that the researcher will reject the null hypothesis when it is correct. In other words,
using traditional statistical approaches with clustered data may increase the risk of Type I error.
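A small simulation can make this inflation concrete. The sketch below is illustrative only: the number of classrooms, the class size, and the intraclass correlation of 0.3 are hypothetical values, and the predictor and outcome are generated to be unrelated, so every rejection of the null hypothesis is a Type I error. The t-ratio is computed with the ordinary standard error that assumes independent observations.

```python
import numpy as np

N_CLASSROOMS = 20
STUDENTS_PER_CLASS = 25
N_REPLICATIONS = 2000
CRITICAL_T = 1.96  # approximate two-sided 5% cutoff for large samples


def clustered_variable(rng, icc):
    """One variable whose variance is split between classrooms and students."""
    classroom_part = rng.normal(size=(N_CLASSROOMS, 1)) * np.sqrt(icc)
    student_part = rng.normal(size=(N_CLASSROOMS, STUDENTS_PER_CLASS)) * np.sqrt(1 - icc)
    return (classroom_part + student_part).ravel()


def naive_t_ratio(x, y):
    """Regression slope divided by its independence-assuming standard error."""
    xc = x - x.mean()
    yc = y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    residuals = yc - slope * xc
    residual_variance = (residuals @ residuals) / (x.size - 2)
    standard_error = np.sqrt(residual_variance / (xc @ xc))
    return slope / standard_error


def type1_error_rate(rng, icc):
    """Share of replications in which an unrelated predictor looks significant."""
    t_ratios = np.array([
        naive_t_ratio(clustered_variable(rng, icc), clustered_variable(rng, icc))
        for _ in range(N_REPLICATIONS)
    ])
    return np.mean(np.abs(t_ratios) > CRITICAL_T)


rng = np.random.default_rng(0)
rate_independent = type1_error_rate(rng, icc=0.0)
rate_clustered = type1_error_rate(rng, icc=0.3)
print(f"Independent observations: {rate_independent:.3f}")
print(f"Clustered observations:   {rate_clustered:.3f}")
```

With independent observations the rejection rate should sit near the nominal 5 percent, whereas the clustered condition should reject far more often even though no true relationship exists.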
The situation may become even more severe when researchers attempt to mix variables
measured at different levels of analysis in a unilevel statistical model. This practice was
common before the widespread adoption of multilevel statistical methods. Imagine a research
study wherein a researcher entered both students’ individual abilities as well as the average
ability of the students’ classmates as independent variables into a standard regression model. In
this case, each student will have a unique ability score, but all students in a classroom will have
identical classroom ability scores. The standard errors of both parameters are likely to be
affected by the data clustering, but the standard errors for the classroom ability variable will be
extremely depressed. Scullen (1997) pointed out that when ordinary Pearson correlation
coefficients are computed between variables measured at different levels of analysis, the
magnitude of the correlation coefficient may be inflated by as much as 70%.
One solution to this issue that does not violate the assumptions of traditional statistics is
to avoid clustering altogether by limiting the level of analysis to
schools or classrooms instead of individuals. However, we are generally most interested in what
happens to students as individuals. Furthermore, to only consider classroom-level aggregate
indicators of student learning is to throw away a large amount of data, which reduces the
statistical power or sensitivity of the analysis. Finally, this approach is not without potential
hazards. There is some risk in attempting to make conclusions about individuals based on
aggregated data obtained from a higher level of analysis. The ecological fallacy occurs when
group-level relationships are erroneously presumed to apply to the individuals comprising those
groups (Robinson, 1950). Variables can exert very different effects through different
mechanisms depending on which level of analysis is used. Examining only one level of analysis
may lead the researcher to make incorrect conclusions under these circumstances. This is known
as aggregation bias (James, 1982; Walker & Catrambone, 1993). For instance, several studies of
the big-fish-little-pond effect (BFLPE; Marsh & Parker, 1984) have shown that though
individual student ability has a positive relationship with academic self-concept, the aggregate
ability of the classmates has a negative impact on self-concept (Marsh, 1987; Bachman &
O'Malley, 1986). This is understood through “frame of reference theory” which posits that
individuals make use of both objective data and social comparisons when making self-
judgments. Students that are high in ability will have received some objective evidence to this
end, which results in the high positive correlation between ability and academic self-concept.
Students that are situated in contexts where the average ability of their classmates is also high
will be comparing themselves against a tough standard and will therefore have a lower academic
self-concept than they would if surrounded by peers of lower ability. Studies
focusing exclusively on either the individual or aggregate levels of analysis would fail to detect
this interesting relationship. Studies focusing only on the school level would find the
individual-level relationship between ability and self-concept to be weaker than it actually is
because the social comparison aspect would be unaccounted for in the model to the extent that
bright students are grouped together. The same would be true if the study were conducted
exclusively at the individual level. The positive contribution of individual ability would be
masked by the negative effect of the context.
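The masking described above can be illustrated with simulated data. In the sketch below, the effect sizes (1.0 for individual ability and -0.8 for class-average ability), the number of classes, and the class size are hypothetical values chosen for illustration; they are not estimates from the BFLPE literature. Ordinary least squares is used only to compare point estimates across the three analyses, not as a substitute for a multilevel model.

```python
import numpy as np

rng = np.random.default_rng(1)

N_CLASSES = 200
CLASS_SIZE = 25
TRUE_INDIVIDUAL_EFFECT = 1.0   # hypothetical positive effect of own ability
TRUE_CONTEXT_EFFECT = -0.8     # hypothetical negative effect of class-average ability

# Ability-grouped classes: each class has its own typical ability level.
class_level = rng.normal(size=(N_CLASSES, 1))
ability = class_level + rng.normal(size=(N_CLASSES, CLASS_SIZE))
class_mean = np.broadcast_to(ability.mean(axis=1, keepdims=True), ability.shape)

self_concept = (
    TRUE_INDIVIDUAL_EFFECT * ability
    + TRUE_CONTEXT_EFFECT * class_mean
    + rng.normal(size=ability.shape)
)


def ols(y, *predictors):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(y.size)] + [p.ravel() for p in predictors])
    coefficients, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return coefficients


# Individual level only: own ability is the sole predictor.
individual_only = ols(self_concept, ability)[1]

# Aggregate level only: class means predicting class means.
aggregate_only = ols(self_concept.mean(axis=1), ability.mean(axis=1))[1]

# Both levels together: own ability plus class-average ability.
_, own_effect, context_effect = ols(self_concept, ability, class_mean)

print(f"Individual-level only: {individual_only:.2f}")
print(f"Aggregate-level only:  {aggregate_only:.2f}")
print(f"Both levels:           own = {own_effect:.2f}, context = {context_effect:.2f}")
```

Both single-level analyses should understate the positive effect of individual ability, while the analysis that includes both levels should recover the two opposing effects separately.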
Introduction to Multilevel Analysis
Multilevel analysis refers to a family of data analysis techniques that are appropriate to
use with clustered data. There are two major variants of multilevel analysis: hierarchical linear
modeling and multilevel structural equation modeling. These two techniques were developed
separately but have much in common. To make things more confusing, a variety of names are
used in the literature for each technique. Hierarchical linear modeling is also known as
multilevel regression or simply multilevel modeling (Raudenbush & Bryk, 2002). Structural
equation modeling is also known as covariance structure analysis or causal modeling (McCoach,
2003). Structural equation models that do not include latent variables are called path models.
Conceptually speaking, hierarchical linear modeling may be thought of as a special case of
multilevel structural equation modeling. For the remainder of the paper, hierarchical linear
models will be abbreviated as HLMs, and multilevel structural equation models will be
abbreviated as ML-SEMs.
There are a number of types of multilevel models to address different research questions.
Perhaps the simplest type of multilevel analysis is very similar to ordinary regression, only the
analysis is able to correctly handle clustered data. Other types of multilevel models explicitly
address research questions directed at multiple levels of analysis simultaneously. These models
will be addressed in more detail later in this paper. Furthermore, most types of multilevel
modeling require several steps. Finally, variants of multilevel models are able to correctly
handle many types of categorical or non-normal data.
A variety of software packages are available for conducting multilevel analyses. These
include SAS, LISREL, HLM, MLwiN, Mplus, and several others. For consistency, Mplus
version 3.2 will be used for all analyses presented herein. Mplus is one of the most flexible
statistical modeling software packages currently available.
The basic idea in HLMs is that the standard regression equation is extended to correctly
handle clustered data as well as research questions that address multiple levels of analysis (Hox,
2002). The basic regression equation is:
(1) $Y_i = B_0 + B_1 X_i + e_i$
The Y is the individual’s score on the outcome variable while X is the individual’s score on the
explanatory variable. The inclusion of the subscript i indicates that each individual in the dataset
receives a separate value of the parameter. The B0 and B1 terms are the intercept and slope
parameters, respectively, and are the same for all individuals in the dataset. The e term is the
error or residual term. The value of B0, the intercept, is interpreted as the mean or expected
value of Y for an individual with an X of zero. As an example, let us imagine that we have
constructed a regression equation explaining test performance in terms of IQ score. Our
regression equation is:
(2) $TEST_i = B_0 + B_1(IQ_i) + e_i$, where $B_0 = 20.2$ and $B_1 = .60$
Note the value of the intercept is 20.2, which is the expected test score for a student with an IQ
of zero. Since no students in our dataset have IQs of zero, the value of this intercept term is not
very meaningful. We can make the intercept meaningful by centering our X variable, or
rescaling it. If the average IQ score in our dataset is 100, we can subtract this mean from all the
IQ scores in our dataset to center the scores about the mean. Now a score of zero on the X
variable represents an IQ of 100. The value of the slope coefficient B1 is unaffected by the
centering, but the value of B0 is now the expected test score of an individual with an IQ of 100,
or in our example, 80.2. The B1 value of .60 represents the relationship between IQ score and
test performance. All regression programs automatically perform a significance test of the slope
coefficients to determine if the relationships they represent are significantly different from zero.
For the purposes of our example, imagine that our slope is highly significant.
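The effect of centering can be verified numerically. The sketch below uses simulated data with hypothetical coefficients that are unrelated to the running example; it fits the same regression to raw and to mean-centered IQ scores and compares the estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: IQ scores and test scores with a linear relationship.
iq = rng.normal(loc=100, scale=15, size=500)
test = 25.0 + 0.5 * iq + rng.normal(scale=5, size=iq.size)


def fit_line(x, y):
    """Return (intercept, slope) from an ordinary least-squares fit."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope


raw_intercept, raw_slope = fit_line(iq, test)
centered_intercept, centered_slope = fit_line(iq - iq.mean(), test)

print(f"Raw:      intercept = {raw_intercept:.2f}, slope = {raw_slope:.3f}")
print(f"Centered: intercept = {centered_intercept:.2f}, slope = {centered_slope:.3f}")
print(f"Raw prediction at the mean IQ: {raw_intercept + raw_slope * iq.mean():.2f}")
```

The slope should be identical in both fits, and the centered intercept should equal the raw model's prediction at the mean IQ, which is also the mean test score.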
Now imagine that our data were collected from two schools of different quality. In the
first school, the mean test score for a student with an IQ of 100 was 70.2. In the second school,
the mean test score for an IQ of 100 was 90.2. In other words, a student with an average IQ is
likely to score much higher on the test if he attends the second school. The standard regression
model presented estimates a single B0 which is assumed to apply to all the students in the dataset.
In multilevel modeling terms, the coefficients in a standard regression equation are fixed, that is,
a single coefficient is estimated for all units in the study. Clearly, this is inappropriate in our
dataset. Our regression equation is now likely to substantially overpredict the test scores for
students attending the first school while underpredicting test scores for students attending the
second school. This prediction error is captured by the residual, e. Residuals in the standard
regression model are assumed to be normally distributed, but our residuals will be bimodally
distributed. What we need is the ability to estimate a separate B0 for each school. This is
precisely what multilevel modeling allows. Instead of a single intercept coefficient being
estimated that is presumed to apply equally to all schools, separate coefficients are estimated for
each school. We can augment our original regression equation with an additional equation that
predicts the value of B0, the intercept term in the original equation. This scenario can be
formalized as follows:
(3) Level 1: $\mathrm{TEST}_{ij} = B_{0j} + B_{1}(\mathrm{IQ}_{ij}) + e_{ij}$
Level 2: $B_{0j} = \gamma_{00} + \mu_{0j}$
Note that several parameters now have two subscripts. The subscript i indicates that the
parameter will have a unique value for each individual in the dataset. The subscript j indicates
that the parameter will have a unique value for each school in the dataset. We can refer to the
individual members of our dataset by labeling them the ith student in the jth school. In general,
it is fair to say that subscripts become much more important as well as much more confusing in
multilevel models. Our level-2 equation expresses the intercept of the level-1
model as an overall mean plus a school-specific deviation.
Let us further explore the level-2 equation. The γ00 term represents the expected value of
the level-1 intercept B0j. It is the mean of all the school intercepts, or the mean of all the school
test score means, which is 80.2 in our example. The term µ0j is the level-2 error term. In our
contrived example with only two schools, µ01 (the level-2 residual for the first school) would be
-10.0 while µ02 would be 10.0. In more realistic models, many clusters or level-2 units (schools,
in this example) would be sampled. In this case, we would expect each school’s mean test score
to be normally distributed about γ00. The inclusion of the level-2 residual µ0j is what allows each
school to take on a unique intercept B0j. In multilevel modeling terminology, we say that the
intercept is random, or allowed to randomly vary about γ00. Without it, the intercept is fixed at
γ00. We are interested in the variance in µ0j for another reason. If there is very little variance in
µ0j, this means that all schools have very similar intercepts. In this case, the results of a
multilevel analysis will be very similar to the results of a traditional unilevel analysis. The added
complexity of a multilevel analysis may not be justified under such conditions. Finally, because
no explanatory variables were included in the level-2 model, we say that it is unconditional.
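The two-school example can be reproduced with a few lines of Python. In this sketch the school intercepts (70.2 and 90.2 at an IQ of 100) and the slope of .60 follow the contrived example in the text; the IQ values themselves are hypothetical and noise-free, and fit_line is an illustrative helper rather than part of any package.

```python
# Sketch: one pooled regression line fitted to two schools whose
# intercepts differ. IQ values are hypothetical; there is no random noise.

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

iq_centered = [-15, -10, -5, 0, 5, 10, 15]   # IQ minus 100, same in both schools
school_intercepts = {1: 70.2, 2: 90.2}       # expected score at IQ 100, per school
slope = 0.60

# Each row is (school, centered IQ, test score).
rows = [(school, x, b0 + slope * x)
        for school, b0 in school_intercepts.items()
        for x in iq_centered]

pooled_b0, pooled_b1 = fit_line([r[1] for r in rows], [r[2] for r in rows])

# Residual = observed - predicted, averaged within each school.
mean_residual = {}
for school in school_intercepts:
    res = [y - (pooled_b0 + pooled_b1 * x) for s, x, y in rows if s == school]
    mean_residual[school] = sum(res) / len(res)

print("pooled intercept and slope:", round(pooled_b0, 1), round(pooled_b1, 2))
print("mean residual by school:", {s: round(v, 1) for s, v in mean_residual.items()})
```

The pooled intercept plays the role of γ00, and the school-specific mean residuals play the role of µ01 and µ02: the single line overpredicts every student in the first school and underpredicts every student in the second.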
Another concept that is frequently encountered in multilevel analysis is the intra-class
correlation coefficient (ICC; Raudenbush & Bryk, 2002). The intraclass coefficient is calculated
as the amount of variance in a variable that is at level-2 (i.e., between clusters) divided by the
total variance in the variable at both level-1 and level-2. The ICC can range from zero to one
and indicates the proportion of variance in a variable that occurs across clusters. For this reason,
the ICC is often called the cluster effect. It is automatically calculated by most multilevel
modeling software packages. When the ICC is small, the results of a multilevel analysis will not
be very different from the results of a unilevel analysis. When the ICC is large, the parameter
values and standard errors estimated for a multilevel analysis are likely to be very different from
the values estimated by a unilevel analysis. The ICC is commonly used to determine whether the
added complexity and difficulty of a multilevel analysis is justified.
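As a small illustration of the definition above, the ICC can be computed directly from the two variance components. The function name and the numeric values in this Python sketch are hypothetical; they are not estimates from any dataset described here.

```python
# Sketch: the ICC as a ratio of variance components.
# tau00  = variance between clusters (level-2)
# sigma2 = variance within clusters (level-1)

def intraclass_correlation(tau00, sigma2):
    """Share of the total variance that lies between clusters."""
    if tau00 < 0 or sigma2 < 0 or tau00 + sigma2 == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return tau00 / (tau00 + sigma2)

# Hypothetical values: a weakly clustered and a strongly clustered outcome.
weak = intraclass_correlation(tau00=20.0, sigma2=80.0)
strong = intraclass_correlation(tau00=90.0, sigma2=10.0)
print(weak, strong)
```

In the first case only a fifth of the variance lies between clusters; in the second case nearly all of it does, and a unilevel analysis would be expected to give quite different results.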
We have now examined a basic HLM. This example showed how a simple single-level
regression model may be extended to correctly model clustered data.3 This is a useful
application of multilevel modeling. However, the real power of multilevel modeling lies in its
ability to address research questions that concern more than one level of analysis. The model we
examined earlier is able to address a research question focused only at the individual level. We
can easily extend the previous model to address true multilevel research questions in a
straightforward way.
Suppose we returned to our original dataset, which contained three variables: the school
ID code, the student’s IQ, and the student’s test score. From our data, we could easily calculate
another variable – the mean IQ score for each school. We determined that an individual
student’s IQ is positively related to that individual’s performance on the test. We also
determined that the average test score in the second school is much higher than the average test
score in the first school. We might naturally be curious about the difference between these
schools. Perhaps the average IQ of students in the second school is higher than the average IQ of
students in the first school. If this is true, it might explain some of the difference in the two
schools’ test performances. We can address this research question by entering our new
SCHOOL_IQ variable into our level-2 model as follows:
(4) Level 1: $\mathrm{TEST}_{ij} = B_{0j} + B_{1}(\mathrm{IQ}_{ij}) + e_{ij}$
Level 2: $B_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{SCHOOL\_IQ}_{j}) + \mu_{0j}$
Adding the explanatory variable to the level-2 model changed the interpretation of γ00. Now,
instead of representing the overall grand-mean test score, it represents the expected mean test
score for a school with a mean IQ of zero. Again, this is unrealistic. We may center the mean
school IQ scores to facilitate the interpretation of γ00. After centering, γ00 would be interpreted as
the expected mean school test score for schools with an average school mean IQ.
3 Strictly speaking, to produce completely unbiased estimates, the level-1 explanatory variables should be group-mean centered rather than grand-mean centered. See Raudenbush & Bryk (2002), pp. 135-139. Group-mean centering means that the mean score for the variable within each level-2 unit is subtracted from all the scores of the individuals within that unit. The centering of level-1 variables is a complex issue that is beyond the scope of this manuscript.
The value of γ01 is interpreted exactly like the value of a slope coefficient in a standard
regression. It may be statistically tested to determine if the relationship it represents is unlikely
to result from chance. If it is significant, then we have explained some of the differences in
mean test scores between schools in terms of the ability composition of the two schools’
students. By adding a variable to the level-2 model, we have made it conditional. The variance
in the residual µ0j will be decreased when a significant explanatory variable is added to the model
as compared to the variance in µ0j as calculated in the unconditional model. By estimating the
unconditional model and the conditional model in separate steps and recording the variance of µ0j
from each step, the percentage of level-2 variance explained may be easily calculated. Figure 4.1
provides a graphic demonstration of the meanings of the various coefficients in the model.
Figure 4.1: Coefficients in the random intercept model
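The calculation of the level-2 variance explained, described in the paragraph above, amounts to comparing the variance of µ0j from the unconditional and conditional models. The Python sketch below uses a hypothetical function name and made-up variance estimates purely to show the arithmetic.

```python
# Sketch: proportion of level-2 (between-school) variance explained by
# adding predictors to the level-2 model. The variance values are invented.

def proportion_variance_explained(var_unconditional, var_conditional):
    """(unconditional variance - conditional variance) / unconditional variance."""
    if var_unconditional <= 0:
        raise ValueError("the unconditional variance must be positive")
    return (var_unconditional - var_conditional) / var_unconditional

var_u0_unconditional = 50.0   # hypothetical variance of the level-2 residual, no predictors
var_u0_conditional = 20.0     # hypothetical variance after adding a school-level predictor

explained = proportion_variance_explained(var_u0_unconditional, var_u0_conditional)
print(round(explained, 3))
```

Under these assumed values, the school-level predictor would account for 60% of the between-school variance in the intercepts.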
One more extension to HLMs needs to be introduced. The previous model included one
level-1 equation and one level-2 equation. The level-2 equation allowed the intercept B0j to vary
across schools. In multilevel modeling terms, we call this equation a random intercept equation.
Careful readers might wonder why we assume that a single value of the IQ slope in the
level-1 model applies to all schools. After all, we have already seen that it is
inappropriate to assume that a single intercept coefficient applies to all schools within our
dataset. Why would we assume that a single slope coefficient is applicable to all schools? The
relationship between IQ and test scores might vary across schools. We can formalize this
thinking as follows:
(5) Level 1: $\mathrm{TEST}_{ij} = B_{0j} + B_{1j}(\mathrm{IQ}_{ij}) + e_{ij}$
Level 2 (intercept): $B_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{SCHOOL\_IQ}_{j}) + \mu_{0j}$
Level 2 (slope): $B_{1j} = \gamma_{10} + \mu_{1j}$
Note that we have added a subscript to the level-1 slope coefficient for IQ, denoting that we are
now estimating a separate slope coefficient for each school. We have also added an additional
equation describing how the level-1 slope coefficient varies across schools. This equation is
commonly called a random slope model. γ10 is the grand mean of the slope coefficients. Because
of the inclusion of the residual µ1j, the value of B1j is allowed to vary randomly. Without it, our
model would require that all schools have a slope equal to γ10. The level-2 slope model
presented above is unconditional because we have not entered any explanatory variables.
Practically speaking, we would use this model in order to examine the significance of the variance
of µ1j. If this variance is not statistically significant, then we could conclude that a single slope
coefficient does indeed apply to all the schools. If it is significant, we would proceed with our analysis by
adding explanatory variables to the level-2 slope model to try to explain why different schools
have different slopes. We could enter SCHOOL_IQ as a predictor in our random slope model
just as we entered it into our random intercept model. Entering an explanatory variable into the
random slope model would change it from an unconditional model to a conditional model. The
model could be formalized as:
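(6) Level 1: $\mathrm{TEST}_{ij} = B_{0j} + B_{1j}(\mathrm{IQ}_{ij}) + e_{ij}$
Level 2 (intercept): $B_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{SCHOOL\_IQ}_{j}) + \mu_{0j}$
Level 2 (slope): $B_{1j} = \gamma_{10} + \gamma_{11}(\mathrm{SCHOOL\_IQ}_{j}) + \mu_{1j}$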
The coefficient γ11 describes the influence of the school’s average IQ score on the relationship
between individual IQ and test scores within that school. In other words, γ11 can be interpreted
as a cross-level interaction. Imagine that the value of γ11 was estimated at .25, and that the value
is statistically significant. Also recall that the SCHOOL_IQ variable was grand-mean centered
to facilitate the interpretation of γ00 in the intercept model. If our overall school grand mean IQ
is 100, then γ10 is the expected value of B1j for a school with an average IQ of 100. If the second
school has a higher average IQ, then the strength of the relationship between individual IQ and
test scores within that school will be greater than the relationship between IQ and test scores in
the first school. Figure 4.2 illustrates the meaning of the coefficients in the random slope model.
The dotted lines represent γ10, the overall mean slope for schools with an average IQ of 100.
Note that the line for B12, the slope for the second school, is significantly steeper than the slope
B11 for the first school. The difference between the schools’ slopes B1j and the grand mean slope
γ10 is represented by γ11.
Figure 4.2: Coefficients in the random slope model
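The cross-level interaction can be made concrete by computing the expected slope for schools with different mean IQs from the level-2 slope equation in (6), ignoring the residual µ1j. In this Python sketch the value of .25 for γ11 and the grand mean IQ of 100 come from the example in the text; the value of .60 for γ10 and the school mean IQs of 98 and 102 are assumptions made only for illustration.

```python
# Sketch: expected level-1 slope implied by the level-2 slope equation,
# B1j = gamma10 + gamma11 * (SCHOOL_IQ_j - grand mean), with the residual omitted.

def expected_slope(gamma10, gamma11, school_iq, grand_mean_iq=100.0):
    """Expected IQ slope for a school, given its grand-mean-centered mean IQ."""
    return gamma10 + gamma11 * (school_iq - grand_mean_iq)

GAMMA10 = 0.60   # assumed mean slope for a school with an average IQ of 100
GAMMA11 = 0.25   # cross-level interaction from the example in the text

slope_average_school = expected_slope(GAMMA10, GAMMA11, school_iq=100.0)
slope_first_school = expected_slope(GAMMA10, GAMMA11, school_iq=98.0)    # hypothetical school mean
slope_second_school = expected_slope(GAMMA10, GAMMA11, school_iq=102.0)  # hypothetical school mean

print(round(slope_average_school, 2), round(slope_first_school, 2), round(slope_second_school, 2))
```

A school at the grand mean has the slope γ10, while each additional point of school mean IQ changes the expected slope by γ11.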
Multilevel Structural Equation Modeling
Multilevel structural equation modeling (ML-SEM) is quite easy to understand once
HLM is understood. Though there are some significant differences in the underlying equations
and estimation methods, the two techniques are conceptually quite similar. Researchers might
choose ML-SEM over HLM for the same reasons that researchers choose SEM over standard
regression models. These reasons might include:
1) SEM is frequently used to test a theoretical model or a set of theoretical models against a
dataset. Numerous fit indices are calculated by SEM programs that indicate how well the
proposed model fits the data.
2) SEM allows researchers to propose a structure among explanatory variables, such that the
explanatory variables are able to influence one another as well as the outcome variable.
This allows the effect of one variable on another to be decomposed into a direct effect
and an indirect effect. This decomposition is often of theoretical interest.
3) SEM allows researchers to incorporate latent variables directly into the model. Latent
variables that are “captured” by multiple indicator variables are not subject to
measurement error as observed variables are. A researcher studying hope, for instance,
could administer several different scales to measure aspects of hope and then combine
these scales into a single latent variable. Each individual scale would include a certain
amount of measurement error, but when multiple measures are incorporated into a single
latent variable, the latent variable is free from measurement error because the unique
variance in each scale, which includes the measurement error, is not included in the
latent variable (McCoach, 2003).
Structural equation models are frequently described through graphic representations. In
SEMs, observed variables are contained in boxes, latent variables are contained in circles, causal
paths (i.e., X causes Y) are described by straight, single-headed arrows, and non-causal
correlations are described by curved, double-headed arrows. A complicating factor in SEMs is
that, in order to be able to successfully estimate a model, the number of parameters to estimate
must not exceed the number of unique pieces of information present in the variance-covariance
matrix of the dataset. In order to test the fit of a SEM, the number of unique pieces of
information must exceed the number of parameters to be estimated. Models that require the
estimation of exactly the same number of parameters as there are unique elements in the
variance-covariance matrix are called saturated models. All of the measures of model fit for
saturated models will indicate perfect fit. This does not mean that saturated models are
theoretically correct; it simply means that their fit cannot be tested. Regression models are a
special case of saturated structural equation models. Figure 4.3 illustrates a sample SEM. In this
example, SES is a latent variable with three indicators: annual income, father’s education, and
mother’s education. Socioeconomic status has a direct causal influence on ability and
motivation. Ability and motivation, in turn, affect the student’s test score. Socioeconomic status
does not directly affect test score; instead, it exerts an indirect effect through ability and
motivation. Figure 4.4 shows how a regression model is specified in a SEM context.
[Figure 4.3 diagram: SES (latent variable with indicators Annual Income, Mother’s Education, and Father’s Education), Ability, Motivation, Test]
Figure 4.3: Example structural equation model
[Figure 4.4 diagram: Ability, Motivation, Annual Income, Mother’s Education, Father’s Education, Test]
Figure 4.4: Regression model specified in a SEM context
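The counting rule described above (parameters versus unique elements of the variance-covariance matrix) can be sketched in Python. The function names are hypothetical, the parameter tallies are our own, and the sketch considers the covariance structure only. The counting rule is a necessary condition for estimation, not a guarantee that a given model can be estimated.

```python
# Sketch: counting rule for SEM identification, covariance structure only.

def unique_covariance_elements(n_observed):
    """Number of distinct variances and covariances among n observed variables."""
    return n_observed * (n_observed + 1) // 2

def degrees_of_freedom(n_observed, n_free_parameters):
    return unique_covariance_elements(n_observed) - n_free_parameters

def classify(n_observed, n_free_parameters):
    df = degrees_of_freedom(n_observed, n_free_parameters)
    if df < 0:
        return "under-identified (cannot be estimated)"
    if df == 0:
        return "saturated (fit cannot be tested)"
    return "over-identified (testable)"

def regression_parameter_count(n_predictors):
    """Slopes + one residual variance + predictor variances and covariances."""
    k = n_predictors
    return k + 1 + k * (k + 1) // 2

# A regression with k predictors always uses every unique element, so it is saturated.
for k in range(1, 6):
    print(k, classify(k + 1, regression_parameter_count(k)))

# Hypothetical path model: 4 observed variables and 8 free parameters
# (for example 4 paths, 3 residual variances, 1 exogenous variance).
print(classify(4, 8), degrees_of_freedom(4, 8))
```

The loop illustrates why regression models are a special case of saturated structural equation models: the degrees of freedom are zero for any number of predictors.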
Multilevel structural equation models are extensions of ordinary SEMs in the same way
that HLMs are extensions of the ordinary regression model. Hierarchical linear models begin
with a level-1 equation and then add level-2 equations that describe how the coefficients in the
level-1 equation vary across clusters. Multilevel SEMs begin with a level-1 model with
parameters that might vary across clusters. Level-2 models are added that attempt to explain the
variation in the level-1 model parameters.4 Generally these level-2 models will be presented in
separate graphic figures. In ML-SEM terminology, level-1 models are usually labeled
within-cluster models while level-2 models are usually labeled between-cluster models. Parameters in
within-cluster ML-SEMs that will vary across clusters (and therefore serve as outcomes in
level-2 models) are usually indicated by a small circle, a diamond, or another symbol. Similar to
HLMs, level-2 SEMs that indicate how a variable mean varies across clusters are referred to as
intercept models while level-2 SEMs that indicate how a path coefficient varies across clusters
are referred to as slope models.
Data Analysis Examples
Introducing the Dataset
To facilitate the discussion of multilevel modeling, a sample dataset will be analyzed
using standard regression, SEM, HLM, and ML-SEM approaches. The dataset was originally
based on NELS data but was substantially modified by the author. It includes four student-level
variables and five school-level variables. Variable descriptions may be found in Table 4.1.
4 Ordinary SEMs account only for the variances and covariances between variables. Variable means are generally not considered. However, SEMs may be easily extended to examine variable means. For the remainder of the discussion in this paper, it is presumed that SEMs with a meanstructure are being estimated. Without a meanstructure, there would be no intercept term that could vary across clusters.
Table 4.1: Variable descriptions for sample dataset
______________________________________________________________________________
Variable Name     Description
______________________________________________________________________________
Individual-level variables
  StudentID       Identification code for each student
  SchoolID        Identification code for each school
  SES             Standardized measure of student’s SES background
  Motivation      Student motivation scale score
  Ability         Student IQ score
  Test            Student test score
School-level variables
  Sch_HWTime      Mean number of hours each week spent on homework
  Sch_Unsafe      Proportion of students within school that feel unsafe during school day
  Sch_SES         School mean SES*
  Sch_Mot         School mean motivation*
  Sch_ability     School mean IQ*
______________________________________________________________________________
* Variable created by calculating each school’s mean of an individual-level variable
Analysis Via Standard Regression
The individual-level variables from the dataset were entered into a standard regression
model with TEST as the outcome and SES, MOTIVATION, and ABILITY as explanatory
variables. The MPlus code for this analysis may be found in Appendix A. Each line of MPlus
code must end with a semicolon. The code listed under the DATA command tells MPlus the
location of the data file to be analyzed. The data file is in tab-delimited format without variable
labels. The code under the VARIABLE heading names the variables in the data file, indicates
which variables should be used, and performs any centering or transformations that might be
necessary. Note that we have grand-mean centered the explanatory variables in this analysis.
This allows the intercept to be interpreted as the expected test score for a student with an average
SES, ability, and motivation score. The code under the MODEL heading describes the
statistical model to be analyzed. The line TEST on ABILITY SES
MOTIVATION defines the regression model. The keyword “on” is used in MPlus to mean
“regressed on.” The line TYPE is MEANSTRUCTURE under the ANALYSIS heading causes
MPlus to calculate an intercept for the TEST variable. Finally, the STANDARDIZED option
under the OUTPUT heading is used to request standardized regression coefficients (betas) in
addition to the unstandardized coefficients. The model results may be found in the first column
of Table 4.2. All three variables are significantly related to the outcome. The three variables
explain about 53% of the variance in test scores.
Analysis Via Regression Accounting for Clustering
The next analysis examines the same regression model but correctly accounts for the
clustering of students within schools. MPlus code for this analysis may be found in Appendix B.
Note that group-mean centering, rather than grand-mean centering, was requested for the
variables. The CLUSTER keyword is used to designate the variable that labels the clusters.
The WITHIN command is used to label the variables that should exist in the level-1 model only.
In general, all level-1 variables except the outcome should be declared as WITHIN. Under the
MODEL heading, we now have labels for the within (level-1) and between (level-2) portions of
the model. The regression model declared under the %WITHIN% heading is identical to the
model analyzed in the first regression. Under the %BETWEEN% label, only the outcome
variable is listed. In MPlus language, a variable name mentioned alone is used to refer to that
variable’s variance. By including the variance of TEST in the between part of the model, we are
asking MPlus to allow the intercept of TEST to vary randomly across schools.
Table 4.2: Regression and HLM model results
______________________________________________________________________________
Model                Coefficient      Regression   Regression      HLM1      HLM2
                                                   w/ Clustering   (int)     (int+slope)
______________________________________________________________________________
Level-1              SES              7.884*       0.985*          0.985*    1.040*
                     SE(SES)          .408         .335            .334      .304
                     Ability          .394*        0.248*          0.248*    0.443*
                     SE(ability)      .024         .016            .016      .072
                     Motivation       .690*        0.443*          0.443*    0.248*
                     SE(motivation)   .137         .077            .076      .014
                     R²               .528*        .300*           .300*     .312*
______________________________________________________________________________
Level-2 (intercept)  Sch_HWTime                                    0.637     0.635
                     SE(HWTime)                                    .356      .362
                     Sch_Unsafe                                    8.486     8.481
                     SE(unsafe)                                    8.702     8.816
                     Sch_SES                                       0.945     0.947
                     SE(SES)                                       2.244     2.255
                     Sch_Mot                                       4.208*    4.206*
                     SE(mot)                                       1.131     1.128
                     Sch_Ability                                   0.949*    0.950*
                     SE(ability)                                   .193      .194
                     R²                                            .834*     .834*
______________________________________________________________________________
Level-2 (slope)      Sch_HWTime                                              0.008
                     SE(HWTime)                                              .010
                     Sch_Unsafe                                              0.211
                     SE(unsafe)                                              .159
                     Sch_SES                                                 0.050
                     SE(SES)                                                 .045
                     Sch_Mot                                                 0.014
                     SE(mot)                                                 .021
                     Sch_Ability                                             -0.003
                     SE(ability)                                             .004
                     R²                                                      .500
______________________________________________________________________________
* Parameter is significant, p < .05
The model we have specified is identical to the unconditional random intercept model previously
introduced in equation 3. The results of this model may be found in the second column of Table
4.2. Finally, the TWOLEVEL option was added under the ANALYSIS heading to request a
multilevel analysis. Notice the large discrepancies between the estimated parameter values and
standard errors for the two models. Additionally, the percentage of variance explained in TEST
has fallen to 30%. No variance at level-2 has been explained because no explanatory variables
were introduced into the level-2 model. The intra-class correlation coefficient (ICC) is .894,
indicating that nearly all of the variance in test scores is between clusters. This is the cause of
the large differences between the values estimated for the two models.
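The difference between the grand-mean centering used in the first regression and the group-mean centering requested for the clustered analysis can be sketched as follows. The function names and the six ability scores in this Python example are hypothetical and serve only to show what each kind of centering does to the data.

```python
# Sketch: grand-mean versus group-mean centering of a level-1 variable.
from collections import defaultdict

def grand_mean_center(values):
    """Subtract the overall mean from every score."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def group_mean_center(values, groups):
    """Subtract each cluster's own mean from the scores within that cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

school = [1, 1, 1, 2, 2, 2]                            # hypothetical school IDs
ability = [90.0, 100.0, 110.0, 100.0, 110.0, 120.0]    # hypothetical IQ scores

grand_centered = grand_mean_center(ability)            # overall mean is 105
group_centered = group_mean_center(ability, school)    # school means are 100 and 110

print(grand_centered)
print(group_centered)
```

After group-mean centering, each school's scores average zero, so the centered variable carries only within-school differences; the between-school differences in ability are removed from the level-1 predictor.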
Analysis Via Random Intercept Hierarchical Linear Model
In the next analysis, we attempt to explain the variance in TEST across schools by
introducing variables into the level-2 intercept model. The MPlus code for this analysis may be
found in Appendix C. The school-level variables have been added to the USEVARIABLES line.
A line requesting that these variables be grand-mean centered was added, as well as a line
declaring that these variables operate on the between level. Under the model command, the
TEST variable was regressed on the school-level variables. The results of this analysis may be
found in the third column of Table 4.2. Note that the parameters for level-1 are nearly
unchanged as compared with the previous analysis. However, we now have estimates for the
effects of the school-level variables on the intercept. Only two variables, SCH_MOT and
SCH_ABILITY, were significantly related to the intercept. These two variables, however,
explained 83% of the between-schools variance in test scores.
Analysis Via Random Intercept and Random Slope Hierarchical Linear Model
Finally, we examine a model to determine whether the strength of the relationship
between individual ability and test score varies across schools. Substantively speaking, this
analysis could indicate whether or not some schools are better than others at allowing students
with high ability to master more course material, and if so, to determine what characteristics of
the schools are related to this. Only a few additional lines of code are needed to run this
analysis. MPlus code for this analysis may be found in Appendix D. The first thing we must do
is to define the slope parameter in the level-1 model that we will allow to vary across schools. In
our case, this is the slope relating individual IQ to test performance. We do so by entering the
following line:
SLOPE | TEST on ABILITY;
The | symbol is used to label random slopes in MPlus. The slope that is allowed to randomly
vary is specified after the | symbol while a name for this slope is provided before the | symbol.
The name will be used in the level-2 model to identify this slope coefficient. We enter
explanatory variables for this slope in the level-2 model by regressing SLOPE on school-level
variables. Also, it is common to observe some level of correlation between random slopes and
random intercepts. To allow this correlation to be estimated, the command SLOPE WITH TEST
was entered into the level-2 model. In order to estimate the random slope model, the RANDOM
command had to be appended to the analysis type. MPlus cannot calculate standardized coefficients
in random slope models, so the request for standardized coefficients was removed from the code.
Before running the conditional random slope model, an unconditional random slope
model was estimated. This was done in the same way that the unconditional intercept model was
requested, by listing the SLOPE label in the level-2 model specification without entering any
explanatory variables for it. By listing SLOPE by itself, we are requesting that MPlus estimate
the variance in SLOPE across clusters. The results indicated that the variance in SLOPE was
0.004, which is very small but statistically significant. In reality, most analysts would probably
not continue by estimating a conditional random slope model at this point due to the extremely
small amount of variance in SLOPE. I have chosen to continue with the analysis for illustrative
purposes.
The results of this model are presented in the fourth column of Table 4.2. Note that
allowing the ABILITY slope to randomly vary across clusters caused the level-1 coefficients to
change. If the unconditional random slope model had provided convincing evidence that random
slopes were needed, we would treat these estimates as being more trustworthy than the estimates
produced in the random intercept model. This is not the case for this particular analysis. The
estimated coefficients for the intercept model are nearly unchanged. None of the variables
entered into the random slope model were significant. Together, they explained 50% of the
(tiny) variance in the slope coefficient, but this amount was not large enough to emerge as being
statistically significant.
This concludes the comparison of regression and HLM approaches to multilevel analysis.
In real life, researchers would want to examine the other two level-1 slopes (SES and
MOTIVATION) for evidence of random variance across schools. This could be done by
specifying multiple random slopes in the MPlus code as follows:
%WITHIN%
S_Ability | Test on Ability;
S_Motivation | Test on Motivation;
S_SES | Test on SES;
%BETWEEN%
S_Ability; S_Motivation; S_SES;
Analysis Via Structural Equation Modeling
If we wanted to test a particular theory about how the variables in the model are related
or if we were interested in estimating indirect as well as direct effects, structural equation
modeling would be a better way to approach the analysis of our dataset. In the following
discussion, the same dataset will be analyzed via ordinary SEM, SEM accounting for clustering,
ML-SEM via an intercept model, and ML-SEM via slope and intercept models. The SEM
results will include path values and standard errors that are conceptually identical to the B values
in ordinary regression. The SEM results will also include several measures of model fit. The
model fit information that will be presented here includes the exact-fit test (chi square),
standardized root mean square residual (SRMR), root mean square error of approximation (RMSEA),
and the comparative fit index (CFI).
The theoretical within-schools (level-1) model hypothesizes that student SES causes both
ABILITY and MOTIVATION, and that ABILITY and MOTIVATION cause the TEST score.
Note that SES is hypothesized to exert an indirect effect on TEST through ABILITY and
MOTIVATION, but not a direct effect. The path model with the estimated coefficients and
standard errors is described graphically in Figure 4.5. The coefficients, standard errors, and
model fit information are presented in the first column of Table 4.3.
Though a full discussion of the derivation and interpretation of the various model-fit
statistics in SEMs is beyond the scope of this article, suffice it to say that the model fit statistics
for this model indicate that the fit is quite poor. All of the path coefficients in the model are
statistically significant.
[Figure 4.5 path diagram: SES → Ability, 8.119 (.379); SES → Motivation, 1.552 (.066); Ability → Test, 0.532 (.023); Motivation → Test, 1.653 (.127). R² = .230 (Ability), .266 (Motivation), .519 (Test)]
Figure 4.5: Results for single-level SEM
To calculate the indirect effect of SES on TEST, the intervening coefficients are
multiplied together. In this model, there are two possible indirect paths from SES to TEST. One
of these is through ABILITY. The indirect effect through ABILITY is equal to 8.119 * 0.532 =
4.319. The indirect effect through MOTIVATION is 1.552 * 1.653 = 2.565. The total indirect
effect of SES on TEST is 4.319 + 2.565 = 6.884.
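The product-of-coefficients calculation can be written out in a few lines of Python. The path coefficients below are the single-level SEM estimates reported above; the dictionary layout and the indirect_effect helper are illustrative assumptions, and each indirect effect is rounded to three decimals before summing.

```python
# Sketch: indirect effects as products of the path coefficients along a route.

paths = {
    ("SES", "ABILITY"): 8.119,
    ("ABILITY", "TEST"): 0.532,
    ("SES", "MOTIVATION"): 1.552,
    ("MOTIVATION", "TEST"): 1.653,
}

def indirect_effect(paths, route):
    """Multiply the coefficients of consecutive paths along the route."""
    effect = 1.0
    for source, target in zip(route, route[1:]):
        effect *= paths[(source, target)]
    return effect

via_ability = round(indirect_effect(paths, ["SES", "ABILITY", "TEST"]), 3)
via_motivation = round(indirect_effect(paths, ["SES", "MOTIVATION", "TEST"]), 3)
total_indirect = round(via_ability + via_motivation, 3)

print(via_ability, via_motivation, total_indirect)   # 4.319 2.565 6.884
```

Because the model contains no direct path from SES to TEST, the total indirect effect is also the total effect of SES on TEST in this model.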
The MPlus code to run this analysis is available in Appendix E. The only differences in
the code between the single-level regression model and the single-level path model are the
commands that specify the model. In the regression model, a single command regressed TEST
on all the explanatory variables. In the SEM, three such statements are needed.
Table 4.3: SEM and ML-SEM model results
______________________________________________________________________________
Model            From       To         SEM             SEM             ML-SEM1         ML-SEM2
                                                       w/Clustering    (int)           (int+slope)
______________________________________________________________________________
Within           SES        Ability    8.119 (.379)*   8.119 (.368)*   8.119 (.368)*   8.119 (.368)*
                 SES        Mot.       1.552 (.066)*   1.552 (.065)*   1.552 (.065)*   1.552 (.065)*
                 Mot.       Test       1.653 (.127)*   0.534 (.053)*   0.512 (.053)*   0.514 (.057)*
                 Ability    Test       0.532 (.023)*   0.261 (.011)*   0.255 (.011)*   0.253 (.014)*
                 R² Ability            0.230*          0.230*          0.230*          -NA-
                 R² Motivation         0.266*          0.266*          0.266*          -NA-
                 R² Test               0.519*          0.353*          0.342*          -NA-
                 χ² (df)               514.353 (2)     511.647 (2)     279.567 (8)     -NA-
                 SRMRwithin            0.107           0.097           0.045           -NA-
                 SRMRbetween                           0.000           0.032           -NA-
                 RMSEA                 0.408           0.407           0.149           -NA-
                 CFI                   0.768           0.688           0.918           -NA-
______________________________________________________________________________
Level-2 (int)    S_SES      S_Abl                                      10.044 (.619)*  10.044 (.619)*
                 S_SES      S_Mot                                      1.028 (.156)*   1.028 (.153)*
                 S_Abl      S_Mot                                      0.068 (.013)*   0.068 (.013)*
                 S_Unsafe   S_Mot                                      0.524 (.499)    0.524 (.506)
                 S_Mot      S_HW                                       1.495 (.205)*   1.495 (.205)*
                 S_Abl      Test                                       0.676 (.164)*   0.685 (.182)*
                 S_HW       Test                                       0.659 (.299)*   0.595 (.332)
                 S_Mot      Test                                       3.987 (1.269)*  3.996 (.959)*
                 S_SES with S_Unsafe                                   -0.029 (.006)*  -0.029 (.006)*
                 R² Sch_Ability                                        0.733*          -NA-
                 R² Sch_Motivation                                     0.851*          -NA-
                 R² Sch_HWTime                                         0.506*          -NA-
                 R² Sch_Test                                           0.764*          -NA-
______________________________________________________________________________
Level-2 (slope)  S_Abl      Slope                                                      -0.003 (.004)
                 S_HW       Slope                                                      0.006 (.009)
                 S_Mot      Slope                                                      0.032 (.022)
                 R² Slope                                                              0.500
______________________________________________________________________________
* Parameter is significant, p < .05
Analysis via Structural Equation Modeling accounting for Clustering
In the next analysis, we analyze a SEM that accounts for the clustering in the TEST
variable. This is done through an unconditional random intercept model. The results of this
model are presented in the second column of Table 4.3. The path coefficients and standard
errors from SES to ABILITY and MOTIVATION are nearly unchanged. However, the values
and standard errors for the paths leading to TEST are quite different. Also, the percentage of
variance explained in TEST has fallen from 51.9% to 35.3%. The model-fit information is also
nearly unchanged compared to the single-level path model. The SRMR, RMSEA, and chi-
square statistics indicate a slight improvement in model fit, though the CFI indicates a slightly
worse model fit.
The MPlus syntax to analyze this model may be found in Appendix F. We have now
requested a TWOLEVEL analysis type, have divided the model specification into WITHIN and
BETWEEN components, indicated the cluster ID variable, and labeled the variables that exist
only on the WITHIN level. We have requested that SES, ABILITY, and MOTIVATION be
grand-mean centered. Group-mean centering would be more appropriate to produce completely
unbiased level-1 coefficients and standard errors. Unfortunately, this model would not run when
group-mean centering was requested.
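The difference between the two centering options can be illustrated with a minimal Python sketch. The function names and the data values below are hypothetical and serve only to show the arithmetic; they are not drawn from the sample dataset, and the sketch does not reproduce the MPlus implementation.

```python
from collections import defaultdict


def grand_mean_center(values):
    """Subtract the overall mean from every observation."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]


def group_mean_center(values, cluster_ids):
    """Subtract each cluster's own mean from its members' observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, cluster_ids):
        totals[c] += v
        counts[c] += 1
    return [v - totals[c] / counts[c] for v, c in zip(values, cluster_ids)]


# Hypothetical SES scores for five students in two schools (illustrative only).
ses = [2.0, 4.0, 6.0, 10.0, 14.0]
school = ["A", "A", "A", "B", "B"]

grand_centered = grand_mean_center(ses)
group_centered = group_mean_center(ses, school)
```

Under grand-mean centering, a student's centered score still carries information about the school mean. Under group-mean centering, only the deviation from the student's own school mean remains.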
Note that the ABILITY and MOTIVATION variables are serving simultaneously as
explanatory variables and outcome variables, and as such, their intercepts may also vary across
schools. The model could easily be extended to allow the intercepts of these variables to vary
across clusters as well.
Analysis via Multilevel Structural Equation Modeling with a Random Intercept Model
In the next model, we will extend the unconditional intercept model from the previous
step by producing another SEM to attempt to explain the relationships between the school-level
variables. The new model is graphically depicted in Figure 4.6.
The intercept model hypothesizes that SCH_SES and SCH_UNSAFE cause SCH_MOT.
SCH_SES and SCH_UNSAFE are related, but one does not cause the other. SCH_SES causes
SCH_ABILITY. SCH_MOT causes SCH_HWTIME. SCH_ABILITY, SCH_MOT, and
SCH_HWTIME cause TEST. The model parameters are described in the third column of Table
4.3. Adding the intercept model greatly improved the model fit. Interestingly, the fit of the
within-schools model was improved according to the SRMR as compared with the previous
model. This dataset has a large ICC of .725, meaning that most of the variance in TEST happens
between schools. Until now, our models had assumed that all of this variance could be explained
by individual-level variables. This is impossible, and the model fit statistics reflected this.
Indeed, the model fit statistics now indicate reasonably good model fit. All of the path
coefficients in the model are significant with the exception of the path from SCH_UNSAFE to
SCH_MOT, suggesting that school-level motivation is not affected by the proportion of students in
the school who feel unsafe.
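The ICC referred to above is the proportion of total outcome variance that lies between clusters. A minimal sketch of the calculation follows. The variance components are hypothetical values chosen only so that the ratio equals .725; they are not the estimates obtained from the sample dataset.

```python
def intraclass_correlation(between_variance, within_variance):
    """ICC = between-cluster variance divided by total variance."""
    total = between_variance + within_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_variance / total


# Hypothetical variance components (illustrative only, not model estimates).
icc = intraclass_correlation(between_variance=72.5, within_variance=27.5)
```

An ICC this large means that predictors measured only at the individual level cannot account for most of the variance in the outcome.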
The MPlus syntax for estimating this model may be found in Appendix G. The principal
change to the code is the addition of commands under the
MODEL heading that specify the intercept model. In addition, the variable lists in the USEVARIABLES
and BETWEEN statements were revised to include the school-level variables.
[Path diagram; panels: Within, Between (intercept)]
Figure 4.6: Results for ML-SEM random intercept model
Analysis via Multilevel Structural Equation Modeling with Random Slope and Intercept Models
For the final analysis, the model was extended to include a random slope in the level-1
model: the path coefficient from ABILITY to TEST was allowed to vary across clusters. This
model is described graphically in Figure 4.7.
The random slope is denoted by a diamond in the within-schools model. As with the
corresponding HLM with a random slope, an unconditional model was estimated first to
determine whether or not the slope coefficient actually varies across schools. As before, the
variance across schools was .004, which was statistically significant but too small to be of
substantive interest. A researcher examining this dataset would therefore probably not proceed
with a conditional random slope model.
The model coefficients and standard errors may be found in the fourth column of Table
4.3. The coefficients for the within-schools and the between-schools intercept model are largely
unchanged as compared with the previous analysis. In the intercept model, the path from
SCH_HWTIME to TEST has become nonsignificant due to the slight increase in the standard
error of that parameter. None of the variables included in the slope model were significant, nor
was the percentage of variance explained. Note that most types of model fit information are not
available when examining random slope models. The syntax for conducting this analysis may be
found in Appendix H. The TECH8 option included under the OUTPUT heading causes the
model optimization history to be printed to the screen during the model estimation. Random
slope models are frequently very computationally intensive and may take some time to estimate
even on powerful computers. Some models that the author has examined have taken more than
eight hours to estimate. The availability of real-time optimization history can assure researchers
that the computer program is running and not frozen.
[Path diagram; panels: Within, Between (intercept), Between (slope)]
Figure 4.7: Results for ML-SEM random slope and intercept models
Discussion
The purpose of this article was to introduce multilevel modeling to the gifted education
community. It is hoped that this article will serve as an approachable introduction for
researchers who would like to begin working with multilevel analysis. At the very least, the use
of rudimentary multilevel analysis techniques can ensure that results generated from clustered
data are correct. As multilevel analysis becomes more mainstream in our field and others, it is
hoped that more sophisticated theories will emerge that are explicitly multilevel in nature. The
result can only be more sophisticated research and theory in our field that can provide more
complete, nuanced, and realistic explanations of educational phenomena.
155
References
Bachman, J. G., & O'Malley, P. M. (1986). Self-concepts, self-esteem, and educational
experiences: The frog pond revisited (again). Journal of Personality & Social
Psychology, 50(1), 35-46.
Curran, P. J. (2003). Have multilevel models been structural equation models all along?
Multivariate Behavioral Research, 38(4), 529-569.
Heck, R. H. (2001). Multilevel modeling with SEM. In G. A. Marcoulides & R. E. Schumacker
(Eds.), New developments and techniques in structural equation modeling (pp. 89-127).
Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Hox, J. (2002). Multilevel analysis techniques and applications. Mahwah, NJ: Lawrence
Erlbaum Associates..
James, L. R. (1982). Aggregation bias in estimates of perceptual agreement. Journal of Applied
Psychology, 67(2), 219-229.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of
Educational Psychology, 79(3), 280-295.
Marsh, H. W., & Parker, J. W. (1984). Determinants of student self-concept: Is it better to be a
relatively large fish in a small pond even if you don't learn to swim as well? Journal of
Personality & Social Psychology, 47(1), 213-231.
McCoach, D. B. (2003). SEM isn't just the schoolwide enrichment model anymore: Structural
equation modeling (SEM) in gifted education. Journal for the Education of the Gifted,
27(1), 36-61.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction
(3rd ed.). New York: Harcourt Brace.
156
Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data analysis
methods. (2nd ed.). Thousand Oaks, CA: Sage.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American
Sociological Review, 15, 351-357.
Schreiber, J. B., & Griffin, B. W. (2004). Review of multilevel modeling and multilevel studies
in The Journal of Educational Research (1992-2002). Journal of Educational Research,
98(1), 24-33.
Scullen, S. E. (1997). When ratings from one source have been averaged, but ratings from
another source have not: Problems and solutions. Journal of Applied Psychology, 82(6),
880-888.
Stapleton, L. M., & Hancock, G. R. (2000). Using multilevel structural equation modeling with
faculty data.
Walker, N., & Catrambone, R. (1993). Aggregation bias and the use of regression in evaluating
models of human performance. Human Factors, 35(3), 397-411.
157
CHAPTER 5
SUMMARY AND FUTURE DIRECTIONS
This dissertation has focused on the analysis of a very large (N = 705,074) dataset
collected in 2004 by the Georgia Department of Education containing information on every
elementary school student enrolled in Georgia public schools. The purpose of this study was to
uncover new information about the underrepresentation of Black, Hispanic, and low-SES
students from all racial backgrounds in Georgia gifted and talented education programs. In spite
of years of research into this issue on the part of the gifted education community, a number of
questions remain regarding underrepresentation. It is my opinion that this lack of information
has severely impeded efforts to address the underrepresentation issue.
This chapter will take a somewhat different approach from the previous chapters.
Chapter one was written as a stand-alone literature review that attempted to describe the current
state of the field with respect to research on underrepresentation. Chapters two, three, and four
were written as stand-alone journal articles. Chapter two focused on the nomination stage of the
gifted identification process. It used a descriptive approach to examine the nomination rates for
various groups of students with respect to race and socioeconomic status. It also examined the
effectiveness of different nomination sources. Chapter three used multilevel structural equation
modeling to examine individual- and school-level variables that influence a student’s probability
of being identified for participation in the gifted education program. Chapter four was an
introduction to multilevel modeling methodologies written for an audience familiar with the
standard regression model. Whereas the previous chapters were written to stand alone, and often
with a particular journal’s audience in mind, this chapter will attempt to summarize the findings
from all the previous chapters and suggest directions for future research.
Summarizing the Nominations Study
The nominations study compared the performance of automatic (test-score based),
teacher, parent, peer, self, and other referrals and addressed two major research questions:
1. How do the various nomination sources compare with respect to overall quality as
indexed by overall accuracy, overall usefulness, and the phi correlation coefficient?
2. How do the nomination sources compare with respect to differential occurrence
across race and SES groups?
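For reference, the phi coefficient named in the first research question can be computed from a 2 x 2 cross-classification of referral by identification. The sketch below assumes that layout. The cell counts are hypothetical and do not come from the Georgia data.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 table laid out as:

                      identified   not identified
        referred          a              b
        not referred      c              d
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical cell counts (illustrative only).
phi = phi_coefficient(a=40, b=10, c=10, d=40)
```

Phi ranges from -1 to 1, with larger positive values indicating closer agreement between being referred and being identified.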
First, the results confirmed that underrepresentation remains a serious issue in Georgia.
The percentage of students identified exhibited extreme variation across racial groups, with
18.3% of Asian students being identified as gifted and only 2.3% of Hispanic students being
identified. White students, who were used as the baseline group in the study, were identified at
a rate of 7.9%, while 3.2% of Black students were identified.
With respect to the first research question, the study found that automatic referrals had
the best performance for all groups except for the high-SES Native American group, where
teacher referrals slightly outperformed automatic referrals. Overall, automatic referrals were
responsible for 57.1% of students identified for gifted program placement. Automatic referrals
were also the most accurate, with 86.3% of students receiving an automatic referral being
subsequently identified. Teacher referrals were the second highest quality referral source.
Teacher referrals were the referral source for 37.7% of gifted students overall. Teacher
referrals were also quite accurate, with 74.9% of referred students successfully passing the
testing stage to become identified as gifted. The remaining referral sources, which included
referrals from parents, peers, other adults, and the students themselves, were both rarely used and
comparatively inaccurate. Slightly less than 95% of gifted students were referred either
automatically or by the classroom teacher. However, it should be noted that an issue with the
dataset prohibits strong comparisons of quality between the referral sources. This is because in
Georgia schools, only a single nomination source may be listed for each student, and automatic
referrals always occur before other nomination sources are considered. Therefore, the nature of
the data collection process unfairly advantages automatic referrals relative to the other referral
sources.
With respect to the second research question, the study found that nomination rates
exhibited extreme variation across racial and SES groups. These disparities in nomination rates
mirrored the overall pattern of disparities in gifted program enrollments, as may be seen in Table
1. In fact, the study uncovered evidence that the referral stage carries much more responsibility
for the disparities in program enrollments than the subsequent testing stage. Although
differences in the testing stage pass rates are evident across racial and SES groups in Table 2.4,
these differences are much smaller than the differences in nomination rates across groups.
Equalizing the pass rates for the testing stage would have very little effect on the
underrepresentation of Black, Hispanic, and low-SES students, but equalizing the nomination
rates (while leaving the pass rates unchanged) would nearly eliminate the underrepresentation
problem.
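The logic of this comparison rests on the fact that, in a two-stage process, a group's identification rate is the product of its nomination rate and its testing-stage pass rate. The following sketch illustrates the arithmetic. All rates are hypothetical values chosen to mimic the pattern described above, so the sketch demonstrates the reasoning rather than reproducing the study's results.

```python
def identification_rate(nomination_rate, pass_rate):
    """Share of a group identified when both stages must be passed."""
    return nomination_rate * pass_rate


# Hypothetical rates (NOT the values reported in the study's tables).
reference = {"nomination": 0.10, "pass": 0.80}
comparison = {"nomination": 0.04, "pass": 0.70}

reference_rate = identification_rate(reference["nomination"], reference["pass"])

# Representation of the comparison group relative to the reference group.
baseline_ratio = (
    identification_rate(comparison["nomination"], comparison["pass"]) / reference_rate
)
# Scenario 1: equalize testing-stage pass rates only.
equal_pass_ratio = (
    identification_rate(comparison["nomination"], reference["pass"]) / reference_rate
)
# Scenario 2: equalize nomination rates only.
equal_nomination_ratio = (
    identification_rate(reference["nomination"], comparison["pass"]) / reference_rate
)
```

With these illustrative inputs, equalizing pass rates moves the ratio only slightly toward 1.0, whereas equalizing nomination rates moves it most of the way. The size of each shift depends entirely on the assumed rates.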
Summarizing the Identification Study
The identification study involved the creation of a multilevel structural equation model to
explain the probability of gifted identification in terms of a number of individual- and school-
level variables. Due to the complexity of the models proposed, the full results of the study will
not be replicated here, as this would require pages of figures and tables. However, some of the
more interesting aspects of the findings will be described.
One interesting aspect of the study is that it allowed the impact of race and SES on the
probability of gifted identification to be examined separately and together. In other words, the
model allowed the direct effect of race on the outcome, controlling for student SES, to be
examined. It also allowed the direct effect of SES on the outcome to be examined, independent
of race. In the proposed model, race could affect the probability of gifted identification in two
ways: directly, and indirectly through the SES variable. No previous studies of
underrepresentation published in the gifted education literature to date have used this approach.
Because the outcome in question is dichotomous, model parameters are expressed in logits. The
conversion of logits to odds ratios is quite straightforward and must be done in order to properly
interpret the model parameters. The odds ratios are expressed relative to White students, with an
odds ratio of 1.0 indicating that a group has the same odds of being identified as a White
“reference student.” The direct effect of being Black on the probability of gifted identification
corresponded with an odds ratio of 0.39. Therefore, being Black lowers the odds of
identification to just 39% of the odds of identification for a White student. Being Hispanic
corresponds to an odds ratio of 0.34, and being Asian corresponds to an odds ratio of
3.19, indicating that Asian students have over three times the odds of being identified as gifted in a
given year as a comparable White student.
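The conversion between logits and odds ratios can be sketched as follows, using the odds ratios reported above. The logits computed here are simply the natural logarithms of those odds ratios, and the baseline probability of .05 is a hypothetical value used only to show how an odds ratio translates into a probability.

```python
import math


def logit_to_odds_ratio(logit):
    """An odds ratio is the exponentiated logit coefficient."""
    return math.exp(logit)


def odds_ratio_to_logit(odds_ratio):
    """Inverse conversion: the natural log of the odds ratio."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * odds_ratio
    return odds / (1.0 + odds)


# Odds ratios reported in the text, relative to White students.
reported_odds_ratios = {"Black": 0.39, "Hispanic": 0.34, "Asian": 3.19}
implied_logits = {
    group: odds_ratio_to_logit(value) for group, value in reported_odds_ratios.items()
}

# Hypothetical baseline probability for the reference group (illustrative only).
baseline = 0.05
implied_probabilities = {
    group: apply_odds_ratio(baseline, value)
    for group, value in reported_odds_ratios.items()
}
```

An odds ratio below 1.0 corresponds to a negative logit, and an odds ratio above 1.0 corresponds to a positive logit. Note that an odds ratio multiplies odds, not probabilities, so the implied probabilities depend on the baseline chosen.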
The only socioeconomic status information available in this study was whether the
student received free lunch, reduced-price lunch, or did not receive lunch aid. This is obviously
only a rough indicator of socioeconomic status, which is believed to be far more complex than a
simple index of annual income (Yang & Gustafsson, 2004). Nevertheless, this SES variable had
a powerful effect on the probability of gifted identification. Relative to a student not receiving
aid, the odds ratio for gifted identification was only 0.52 for students receiving reduced-price lunch
and fell to 0.28 for students receiving free lunch. The SES variable was
also highly related to race. Black students had 5.54 times the odds of receiving lunch aid as
White students, while Hispanic students had 7.88 times the odds of receiving lunch aid.
The baseline probability of being identified as gifted was not the same across schools.
This was modeled via a separate structural equation model for the intercept. The study identified
some school characteristics which explained some of this variability. Only three variables had
significant direct effects on the probability of identification at the school level. The incident-to-
student ratio, a variable describing the number of severe behavioral problems in the school
divided by the number of students in the school, had a significant negative effect on the school’s
average probability of identification. Surprisingly, the school lunch variable had a significant
positive direct effect. It was expected that the number of students in the school receiving lunch
aid would have a direct negative effect on the school’s mean probability of identification. The
percentage of the student body that is Black exerted a very weak positive direct effect as well.
Several variables exerted indirect effects through the school academic environment, a
composite variable representing the school ITBS achievement test scores. The academic
environment itself exerted a strong positive direct effect on the mean probability of gifted
identification. The percentage of students within the school that had been previously identified
as gifted did not exert a direct effect but did exert a positive indirect effect through the school
academic environment. The number of students receiving lunch aid exerted a negative effect on
the school academic environment and therefore exerted a negative indirect effect on the mean
probability of identification. The magnitude of this indirect effect was larger than the positive
direct effect. The percentages of students that were Black and Hispanic also exerted negative
effects on the school academic environment and thus had weak negative indirect effects on the
mean probability of identification. Finally, the percentage of students attending the school who
were Asian exerted a positive effect on the school academic environment and a positive indirect
effect on the school mean probability of identification.
Students of all types had the highest probability of being identified as gifted in schools
that had few serious behavioral incidents and high achievement test scores. Schools that had
high achievement test scores tended to have a large number of previously identified gifted
students and did not have a high incidence of student poverty. These schools also tended to have
smaller Black and Hispanic populations and larger Asian populations, though the effects of
student body race were quite modest. Interestingly, neither the racial composition of the school’s
teachers, the percentage of teachers holding advanced degrees, the average number of years of
teacher experience, the school size, the percentage of students that had been retained, nor the
percentage of students that had been migrant had any effect on either the school academic
environment (as operationalized) or the mean probability of gifted identification.
The study investigated whether the amount of “disadvantage” of being Black, Hispanic,
or receiving lunch aid varied across schools, as well as whether the amount of “advantage” of
being Asian varied across schools. The impact of being Black or Hispanic did not vary across
schools. The impact of receiving free or reduced-price lunch (FRL) did vary across schools. These
research questions were addressed through structural equation models of the slope coefficients
from the individual-level model. Only two variables affected the impact of receiving FRL:
the percentage of students in the school that had been previously identified
as gifted and the school academic environment. The more students that had been previously
identified, the smaller the negative effect of receiving FRL on the probability of identification. Better
school academic environments were associated with a larger disadvantage for students receiving FRL.
The degree of “advantage” for being Asian depended on three variables. First, the more
students in the school were receiving lunch aid, the more advantage Asian students experienced.
Second, higher percentages of Black teachers resulted in slightly less advantage for Asian
students. Finally, Asian students were less advantaged in schools with large student bodies.
The Emerging Picture of Underrepresentation
Though many unanswered questions remain about the underrepresentation phenomenon,
the studies described above have yielded some important facts that may be useful both for
informing practical attempts at addressing the problem and for future research.
The gifted identification process in most schools is a two-stage process. Logically
speaking, students must pass through both stages to gain access to gifted education services.
Nearly all of the current research with the goal of improving minority student representation in
gifted programs is based on the assumption that minority students may not be able to score
highly on psychometric ability and achievement measures. Gifted education scholars have
expended considerable effort in making modifications to traditional assessment schemes to
identify more minority students. Examples include assessment schemes based on
dynamic assessment (Kirschenbaum, 2004), non-verbal ability tests (Naglieri & Ford, 2003),
performance-based assessments (VanTassel-Baska, Johnson, & Avery, 2002), and assessments
based on Gardner’s (1983) theory of multiple intelligences (Sarouphim, 1999). The nomination
stage has inexplicably been overlooked in the literature, even though students do not get a chance
to be assessed if they do not receive a nomination. The results of the nomination study provided
strong evidence that inequalities at the nomination stage are the primary cause of
underrepresentation. The results of this study were replicated in an agent-based
simulation study of the gifted identification process, currently in progress, which began with the
assumption that Black and White students have the same initial ability distributions (McBee, in
progress). The simulation found that only a slight reduction in nomination validity, in addition to as
few as 6 points of systematic downward bias at the nomination stage, could account for the
underrepresentation of Black students in gifted programs, assuming that the testing stage
performs equally well for both groups.
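The general idea of a two-stage simulation of this kind can be conveyed with a minimal sketch. It is not the simulation reported in McBee (in progress), whose design is not described here. The function name, the cutoffs, the noise levels, the sample size, and the random seed are all hypothetical assumptions; only the 6-point downward bias and the premise of identical ability distributions are taken from the text.

```python
import random


def simulate_identification_rate(n_students, nomination_bias, nomination_noise_sd,
                                 rng, nomination_cutoff=120.0, test_cutoff=125.0,
                                 test_noise_sd=5.0):
    """Share of simulated students who pass both nomination and testing."""
    identified = 0
    for _ in range(n_students):
        # Every group draws true ability from the same distribution.
        ability = rng.gauss(100.0, 15.0)
        # Stage 1: a noisy, possibly biased impression of ability drives nomination.
        impression = ability + rng.gauss(0.0, nomination_noise_sd) - nomination_bias
        if impression < nomination_cutoff:
            continue  # never nominated, so never tested
        # Stage 2: the test is assumed to be equally accurate for every group.
        test_score = ability + rng.gauss(0.0, test_noise_sd)
        if test_score >= test_cutoff:
            identified += 1
    return identified / n_students


rng = random.Random(2006)  # fixed seed so the sketch is reproducible

# Hypothetical group with unbiased, more valid nominations.
rate_unbiased = simulate_identification_rate(
    50_000, nomination_bias=0.0, nomination_noise_sd=8.0, rng=rng)
# Hypothetical group with 6 points of downward bias and noisier nominations.
rate_biased = simulate_identification_rate(
    50_000, nomination_bias=6.0, nomination_noise_sd=10.0, rng=rng)
```

Because both groups share the same ability distribution and the same testing stage, any gap between the two rates in this sketch arises entirely at the nomination stage.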
It appears that race and social class do make independent contributions to the
underrepresentation issue. Even after controlling for student socioeconomic status via the
student’s free or reduced-price lunch status, a three-level categorical variable, race had an
enormous impact on the probability of identification. Hispanic students experienced the largest
racial penalty followed closely by Black students. Asian students experienced a large racial
advantage. Race also had an enormous impact on the probability of receiving lunch aid, such
that most Black and Hispanic students face both racial and socioeconomic disadvantages with
respect to gifted program identification. The true independent effects of race and social class,
however, cannot be fully explored by this dataset. Socioeconomic status is generally
conceptualized as a composite of annual family income, parental education, and parental
occupation (Yang & Gustafsson, 2004). Socioeconomic status also has a historical component,
such that a family with highly educated adults who have a sizable income does not transition
from high to low SES if the income is suddenly reduced. The SES data available in this study
only considered one roughly categorized dimension of SES, family income. It is possible and
indeed likely that Black and White students or Hispanic and White students treated as
socioeconomically equal by this study (i.e., placed in the same category of the FRL variable)
are actually quite different in terms of family income as well as SES as a whole. This would
especially apply to the “paid” lunch category, which only means that the student’s reported
family income exceeded 1.85 times the federal poverty line for a given family size (US
Department of Agriculture, 2003). Obviously, great differences in income (and subsequent
opportunity) may take place within this category, differences that may be very meaningful in
terms of conferring additional educational support and opportunity. It is possible that some of
the variance that truly is caused by SES differences was attributed to race differences in the
study. Therefore, though this study has provided evidence that race and social class have
independent effects on the likelihood of gifted identification, the limitations of the data used do
not permit us to reach a final verdict.
Schools are not identical with respect to the success with which they identify gifted
students. A given student’s probability of being identified varies quite strongly across schools.
Seventeen percent of Georgia schools identified no gifted students during the year, while one
school identified 29% of its students as gifted. The mean percentage of students identified was
3.0 with a standard deviation of 3.3. Schools that identify large proportions of their students
each year have some characteristics in common. First, they are safe. Schools with lower
incident-to-student ratios identified more students. Second, the students attending the school
score highly on standardized achievement tests. Third, the schools serve relatively few economically
disadvantaged students. These results should not be surprising. Schools wherein students have
the best chance of being identified are healthy, successful schools. This, of course, largely
depends on the school composition. The study found that 70% of the variance in school
standardized achievement test scores could be explained by the school composition. The
interpretation of these results is muddied because there was no individual-level measure of
ability or achievement available in the dataset. Therefore, ability, potentially the most important
factor in whether or not a student was identified, was uncontrolled. Some variables in the model,
particularly the SES variables, at both the individual and school levels, may be carrying some
variance that should rightly be attributed to student ability due to the high correlation between
SES and ability found in previous studies.
One way to interpret these results would be to argue that schools that successfully
identify large numbers of students do so precisely because they are filled with identifiable
students (i.e., students whose true ability, motivation, achievement, and creativity exceed the
requirements for gifted participation), whereas schools that do not identify many students simply
do not have many students that are identifiable. This approach views the efficacy of the gifted
identification process as being essentially the same in a variety of school settings, and the
different probabilities of identification reflect the true abilities of the students attending the
schools.
Another approach argues that there are identifiable students within all schools (though
perhaps not in the same numbers), and that some schools, by virtue of their leadership, location,
funding, and priorities, have committed more time and effort to identifying students and have
developed more effective strategies for doing so. This increase in time and effort means that
students who might be overlooked in other schools would be more likely to be identified. This
study is unable to distinguish between these causes. If individual-level ability data had been
available and included in the individual-level model, ability would have been controlled in the
school-level models as well. Without such data, conclusively disentangling these competing
explanations is not possible. However, the finding that the impact of receiving lunch aid or of
being Asian varied across schools lends some support to the hypothesis that, regardless of
differences in true student ability characteristics, some schools are more effective than others at
identifying gifted students.
Directions for Future Research
The studies included in this dissertation have significantly extended our knowledge of
the gifted identification process. The primary weakness of these studies is related to weaknesses in the
dataset itself. Though the dataset was of unprecedented size and complexity, it was not
collected for the purposes of this study and therefore omitted some important variables that could
have addressed some of the unanswered questions described above. Future research could
resolve some of these questions by collecting a better dataset – a project that would surely
involve a large investment of time and financial resources.
It is likely that states other than Georgia collect large datasets of this type. This type of
work should be replicated in other states to determine the national character of the
underrepresentation issue. Because Georgia operates under uniform rules and policies with
respect to gifted education, with comparatively little difference across districts, this study was
not able to address concrete policy issues. If similar data were collected from enough states, the
impact of various policy decisions could be assessed. The results could then be used to
empirically determine the most effective policies for identifying traditionally underrepresented
students.
Unfortunately, the main question that underlies work in this area remains largely
unanswered. That question is: to what extent are differential gifted program enrollments across
groups caused by differences in underlying ability, and to what extent are they caused by
inequalities in the assessment process? I am beginning to investigate this problem via an agent-
based simulation study that is currently in progress. A natural next step for this line of research
would be the pursuit of grant funding. Funding would allow researchers to assess all students
rather than relying on a nomination stage to select out a small group of students. The results of
this study provide evidence that this step alone could greatly equalize gifted program
enrollments.
References
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic
Books.
Kirschenbaum, R. J. (2004). Dynamic assessment and its use with underserved gifted and
talented populations. In A. Y. Baldwin & S. M. Reis (Eds.), Culturally diverse and
underserved populations of gifted students (pp. 49-62). Thousand Oaks, CA: Corwin
Press, Inc.
McBee, M. (in progress). Insights from an agent-based simulation of the gifted identification
process.
Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children
using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47(2), 155-
160.
Sarouphim, K. M. (1999). DISCOVER: A promising alternative assessment for the identification
of gifted minorities. Gifted Child Quarterly, 43(4), 244-251.
United States Department of Agriculture Food and Nutrition Service. (2003, March 13). Child
nutrition programs: Income eligibility guidelines. In Federal Register, 68(9). Retrieved
from http://www.fns.usda.gov/cnd/Governance/notices/iegs/IEGs03-04.pdf
VanTassel-Baska, J., Johnson, D., & Avery, L. D. (2002). Using performance tasks in the
identification of economically disadvantaged and minority gifted learners: Findings from
Project STAR. Gifted Child Quarterly, 46(2), 110-123.
Yang, Y., & Gustafsson, J. E. (2004). Measuring socioeconomic status at individual and
collective levels. Educational Research & Evaluation, 10(3), 259-288.
Appendix A: MPlus code for regression analysis
Title: Example single-level regression
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test;
  Centering = grandmean(SES ability motivation);
Model:
  Test on SES Ability Motivation;
Analysis:
  Type is meanstructure;
Output:
  Standardized;
Appendix B: MPlus code for regression accounting for clustering
Title: Example regression accounting for clustering
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test;
  Centering = groupmean(SES Motivation Ability);
  Cluster is SchoolID;
  Within are ability SES Motivation;
Model:
  %within%
  Test on Ability SES Motivation;
  %between%
  test;
Analysis:
  Type is meanstructure twolevel;
Output:
  Standardized;
Appendix C: MPlus code for random intercept hierarchical linear model
Title: Example HLM intercept model
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Centering = groupmean(SES Motivation Ability);
  Centering = grandmean(Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot
    Sch_Ability);
  Cluster is SchoolID;
  Within are ability SES Motivation;
  Between are Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
Model:
  %within%
  Test on Ability SES Motivation;
  %between%
  Test on Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
Analysis:
  Type is meanstructure twolevel;
Output:
  Standardized;
Appendix D: MPlus code for random slope and intercept hierarchical linear model
Title: Example HLM slope and intercept model
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Centering = groupmean(SES Motivation Ability);
  Centering = grandmean(Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot
    Sch_Ability);
  Cluster is SchoolID;
  Within are ability SES Motivation;
  Between are Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
Model:
  %within%
  Test on SES Motivation;
  Slope | Test on Ability;
  %between%
  Test on Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Slope on Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Test with slope;
Analysis:
  Type is meanstructure twolevel random;
Appendix E: MPlus code for single-level structural equation model
Title: Example SEM
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test;
  Centering = grandmean(SES ability motivation);
Model:
  Motivation on SES;
  Ability on SES;
  test on motivation ability;
Analysis:
  Type is meanstructure;
Output:
  Standardized;
Appendix F: MPlus code for SEM accounting for clustering
Title: Example SEM with clustering
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test;
  Cluster is SchoolID;
  Centering = grandmean(SES ability motivation);
  Within are SES Motivation Ability;
Model:
  %within%
  Motivation on SES;
  Ability on SES;
  test on motivation ability;
  %between%
  test;
Analysis:
  Type is meanstructure twolevel;
Output:
  Standardized;
Appendix G: MPlus code for ML-SEM with random intercept model
Title: Example ML-SEM intercept
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Cluster is SchoolID;
  Within are SES Motivation Ability;
  Between are Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Centering = grandmean(Ability Motivation SES Sch_HWTime
    Sch_Unsafe Sch_SES Sch_Mot Sch_Ability);
Model:
  %within%
  Motivation on SES;
  Ability on SES;
  Test on motivation ability;
  %between%
  Sch_Ability on Sch_SES;
  Sch_Unsafe with Sch_SES;
  Sch_Mot on Sch_SES Sch_Ability Sch_Unsafe;
  Sch_HWTime on Sch_Mot;
  Test on Sch_Ability Sch_HWTime Sch_Mot;
Analysis:
  Type is twolevel meanstructure;
Output:
  Standardized;
Appendix H: MPlus code for ML-SEM with random slope and intercept models
Title: Example ML-SEM intercept + slope
Data:
  File is "c:\SEM data\sample dataset.dat";
Variable:
  Names are StudentID SchoolID SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Usevariables are SES Motivation Ability Test
    Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Cluster is SchoolID;
  Within are SES Motivation Ability;
  Between are Sch_HWTime Sch_Unsafe Sch_SES Sch_Mot Sch_Ability;
  Centering = grandmean(Ability Motivation SES Sch_HWTime
    Sch_Unsafe Sch_SES Sch_Mot Sch_Ability);
Model:
  %within%
  Motivation on SES;
  Ability on SES;
  Test on motivation;
  slope | Test on ability;
  %between%
  Sch_Ability on Sch_SES;
  Sch_Unsafe with Sch_SES;
  Sch_Mot on Sch_SES Sch_Ability Sch_Unsafe;
  Sch_HWTime on Sch_Mot;
  Test Slope on Sch_Ability Sch_HWTime Sch_Mot;
Analysis:
  Type is twolevel random meanstructure;
  Algorithm = integration;
Output:
  Tech8;
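Note on centering. The Mplus listings in these appendices switch between Centering = grandmean(...) and Centering = groupmean(...). The following is a minimal Python sketch of the difference between the two, not a reproduction of what Mplus does internally. The toy records, the school labels "A" and "B", and the function names grand_mean_center and group_mean_center are all hypothetical and are not taken from the sample dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (school_id, ses) pairs. Illustrative values only,
# not drawn from the dissertation's sample dataset.
records = [
    ("A", 1.0), ("A", 3.0),
    ("B", 5.0), ("B", 7.0),
]


def grand_mean_center(rows):
    """Subtract the overall (grand) mean from every value."""
    overall = mean(value for _, value in rows)
    return [(school, value - overall) for school, value in rows]


def group_mean_center(rows):
    """Subtract each school's own mean from its students' values."""
    by_school = defaultdict(list)
    for school, value in rows:
        by_school[school].append(value)
    school_means = {s: mean(values) for s, values in by_school.items()}
    return [(school, value - school_means[school]) for school, value in rows]


grand = grand_mean_center(records)
group = group_mean_center(records)
print("grand-mean centered:", grand)
print("group-mean centered:", group)
```

In this toy case the grand-mean centered scores still differ between the two schools, while the group-mean centered scores are identical across schools. Group-mean centering therefore removes between-school differences from a student-level predictor, so its within-level coefficient reflects only within-school variation.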