Format effects in two teacher rating scales of hyperactivity

Journal of Abnormal Child Psychology, Vol. 9, No. 2, 1981, pp. 203-218


Jonathan Sandoval, University of California, Davis

The object of this study was to investigate the effect of differences in format on the precision of teacher ratings and thus on the reliability and validity of two teacher rating scales of children's hyperactive behavior. Teachers (N = 242) rated a sample of children in their classrooms using rating scales assessing similar attributes with different formats. For a subsample the rating scales were readministered after 2 weeks. The results indicated that improvement can be made in the precision of teacher ratings that may be reflected in improved reliability and validity.

The most ubiquitous method used to identify and monitor the behavior of children considered to be hyperactive in school settings is the rating scale completed by teachers. Measures obtained from rating information are quick and relatively inexpensive to obtain, and teachers are often the professionals best able to observe children's reactions to tasks that require (a) sustained attention, (b) a systematic and deliberate approach to problem solving, (c) self-control of physical activity level, and (d) social interactional skills with peers and teachers. Teachers have a wide experience with children in similar situations and, the longer they teach, a standard of comparison that parents and many physicians do not have.

Teacher rating scales used in assessing hyperactivity are not without problems, however. In a previous paper I have criticized the existing rating scales on a number of grounds: (1) they contain leading questions worded in the negative and are thus subject to rater bias and response sets such as reverse halo effects or reverse generosity errors (Selltiz, Wrightsman, & Cook, 1976), (2) the items consist of traits or symptoms and thus have no concrete descriptive anchors, so that there are no set standards for the behavior to be rated against, and (3) the scales' factorial structures indicate that teachers do not draw critical distinctions in children's behavior. The result of these forces, mainly originating in the rating format, is that the weaknesses already inherent in teacher rating procedures are magnified (Sandoval, 1977).

Manuscript received in final form September 18, 1980. This research was supported by grants from the Spencer Foundation and the National Institute of Mental Health (MH 29495). A somewhat different version of this paper was presented at the annual convention of the American Educational Research Association, Toronto, Canada, March 1978. The author thanks Nadine Lambert and Leonard Marascuilo for their assistance in the preparation of this paper.

0091-0627/81/0600-0203$03.00/0 © 1981 Plenum Publishing Corporation

When grades or other teacher ratings are used as criterion measures for research purposes, the sought-after relationships are often hard to establish because teachers vary in their standards for a given rating, and differ in their propensity to avoid or to use particular rating categories. Many teachers report discomfort in rating only negative, socially undesirable behavior, such as the content of most hyperactivity rating scales. The extent of teacher bias is often difficult to assess because classrooms do differ in the level of performance of pupils and in the degree of heterogeneity with respect to the attribute being rated. As a result, teacher ratings may not have a great deal of precision. Both the mean and standard deviation of one teacher's ratings may be quite different from another's even though their classrooms may be made up of similar children. From the perspective of sources of error in a test score, the amount of error variance attributable to teachers could be considerable.

By changing the format of a teacher rating scale, the intent is to make the judgments of children's behavior by one teacher more comparable to those of another and to increase the distribution of scores across classrooms so that discriminations between pupils may be made more easily. Very serious decisions are made on the basis of teacher rating scales: the evaluation of a new medication in a research study by psychopharmacologists, the referral of a child to a physician by school personnel, the labeling of a child as hyperactive or the prescription of a drug by a physician, the estimation of the prevalence of hyperkinesis in a particular population. Finding procedures yielding the most precise measure is, or should be, an important quest for researchers in the area of childhood hyperactivity.

This study investigated the improvement in precision of teacher ratings that may come about from changes in format. If teachers become more precise in rating, concomitant changes may occur in the reliability and validity of the scale. The work reported represents an effort to develop an effective method for school personnel to contribute to the evaluation of children who may be considered hyperactive. The question under investigation was whether or not the addition of positively and negatively phrased items along with the use of more concrete behavior descriptors in a new teacher rating scale influences the reliability and validity of teacher rating procedures.

METHOD

Subjects

In two northern California urban-suburban counties 242 teachers rated five or six children in their classrooms as part of a longitudinal study of children considered hyperactive. One, and sometimes two, of the children in the classroom were study children and have been described by Lambert and Sandoval (1980). Approximately 22% of the children from the longitudinal study were identified as hyperactive by a physician and met the same standards as subjects in clinical medication studies; namely, they had teacher and parent ratings of nonmedicated hyperactive behavior in the top 15% of children, had physician confirmations of hyperactivity and the absence of a competing medical explanation for the behavior, had parental reports of long-standing hyperactive behavior, and were in the normal range of intellectual functioning. Thirteen percent were also identified as hyperactive, but for these subjects, physicians had established a competing diagnosis. Another 13% were identified as hyperactive by parents or teachers but not by physicians, 15% had 2 years earlier been considered extreme in their behavior and similar to hyperactive children by a teacher but were not considered hyperactive by their parents or teacher, and 37% had been selected at random. In addition to the project children, the teachers rated an additional four children selected at random. Their ratings provided a behavioral frame of reference for the teacher's rating, and data by which to compare project children to classroom peers. All teachers were blind as to the status of the project child, although they knew the child participated in a research project that included both hyperactive and control children. The final sample consisted of 1,120 children, grades 2 to 8, 12% of whom were considered hyperactive by physicians, parents, or teachers.

Measures

Four teacher rating measures were used, each of a different type. The first, the Behavior and Temperament Survey (BTS) (Sandoval, Lambert, & Sassone, 1980), had been designed for use in the project in the early 1970s before other rating scales had been in widespread use. It is similar in form to the Conners scale (Conners, 1969) and the Greenberg scale (Greenberg, Deem, & McMahon, 1972) in that it is a list of general attributes, usually deviant behaviors, that early research had indicated were characteristic of hyperactive children. Thirty-two descriptors of hyperactivity from articles by Stewart, Pitts, Craig, and Dieruf (1966), Clements and Peters (1962), and Laufer and Denhoff (1957) are used as scale items. The attributes to be assessed were motor restlessness, inattentiveness, impulsivity, and aggressiveness/emotional lability. The respondent checks a 4-point scale from "not at all characteristic" to "very much characteristic." By coincidence, over half the items were later compared to and found to be similar or identical to those on the Conners scale. The BTS is comparable to those used in most research on hyperactive children.

The second measure, the School Behavior Survey (SBS), was designed to respond to many of the problems inherent in the BTS. Rather than general attributes, it consists of descriptors of children in school settings. In this respect it is similar to the Davids scales (Davids, 1971) and the Bell scale (Bell, Waldrop, & Weller, 1972), although it contains a greater number of items. Instead of the list containing only negative or undesirable behaviors, half the 32 items are positively stated. In addition, the items were constructed with an intent to tap four characteristics: motor restlessness, impulsivity, attention/distractibility, and social aggressiveness, with the same number of items for each attribute. On this rating form the 4-point scale descriptors consisted of the frequency the child acted in a particular way: every day (or nearly always), once or twice a week, rarely, and never. Table I presents the items of both hyperactivity rating scales.
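Because half of the SBS items are positively stated, a total score is only interpretable if one set of items is reflected so that a high value always points toward hyperactive behavior (the footnote to Table III says the category values were reflected for this reason). The sketch below shows one way such scoring could be done. The 1-to-4 numeric coding, the item names, and the function names are assumptions for illustration, not details taken from the SBS itself.

```python
def reflect_positive_items(ratings, positive_items, low=1, high=4):
    """Return a copy of `ratings` with positively worded items reverse-scored.

    ratings: dict mapping item id -> raw category value (low..high)
    positive_items: set of item ids that are positively worded
    The low..high coding is an assumption; the source only says "4-point scale".
    """
    scored = {}
    for item, value in ratings.items():
        if not low <= value <= high:
            raise ValueError(f"rating {value} for {item} is outside {low}..{high}")
        # Reflection maps low -> high, high -> low, and so on.
        scored[item] = (low + high - value) if item in positive_items else value
    return scored


def total_score(ratings, positive_items, low=1, high=4):
    """Sum the item scores after reflecting the positively worded items."""
    return sum(reflect_positive_items(ratings, positive_items, low, high).values())


# Hypothetical four-item example (not real SBS items).
raw = {"item1": 4, "item2": 1, "item3": 2, "item4": 4}
positive = {"item2", "item4"}
reflected = reflect_positive_items(raw, positive)
total = total_score(raw, positive)
```

In this hypothetical example a child rated 1 on a positively worded item contributes 4 points after reflection, the same as a child rated 4 on a negatively worded item.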

The third measure is the Pupil Behavior Rating Scale (PBRS) (Lambert & Hartsough, 1973, 1979). This rating scale is made up with an entirely different format using anchored behavioral descriptors that have been empirically verified on each of 11 attributes:

1. Gets into fights and quarrels with other pupils more often than others.

2. Has difficulty in following directions in the classroom or in games on the playground.

3. Makes immature or inappropriate responses during normal school activities.

4. Is overly dependent on his teacher for choice of activity and becomes uneasy without continued supervision.

5. Has to be coaxed or forced to work or play with other pupils.

6. Is easily distracted.

7. Behaves in ways that are dangerous to self and others.

8. Has no enthusiasm for school and does not respond to or maintain interest in learning tasks.

9. Has difficulty in learning school subjects.

10. Becomes sick or upset or may stay home from school when faced with a difficult problem or situation.

11. Seems unhappy or depressed.

The anchors for each attribute have been assigned values ranging from .00 to 3.00. The 11 attributes have been reduced to three factors: (a) adaptation, (b) interpersonal functioning, and (c) affective or intrapersonal functioning. A behaviorally anchored scale has the virtue of encouraging raters to be more precise in differentiating children. For further descriptions of this procedure, see Smith and Kendall (1963) and Campbell, Dunnette, Arvey, and Hellervik (1973).

Another concurrent measure of hyperactive behavior, the Abbreviated Symptom Questionnaire (ASQ; Sprague & Sleator, 1973) was employed as the fourth measure. The 10-item scale, derived from the Conners 36-item teacher rating scale, is similar in format to the BTS and is widely used in research on the use of medication with hyperactive children. It too has items tapping the four attributes of motor restlessness, inattentiveness, impulsivity, and aggressiveness/excitability. Only 1 item deals with impulsivity, however.

Procedure

All teachers were asked to rate the sample children using the BTS and the PBRS. A randomly selected subsample of 80 was also asked to rate the children with the SBS (the balance completed another experimental questionnaire dealing with curriculum practices). Two weeks later, all teachers were asked to volunteer to again rate the children with two of the following three measures: the SBS, the ASQ, or the BTS. The ASQ was administered only on the second occasion and to the majority of teachers, but the assignment was done randomly. Some teachers declined to participate in the rerating or did not complete all items on the ratings, so that complete data were available on 902 pupils from the second ratings. This crude matrix sampling permitted estimates of test-retest reliability for the BTS and the SBS as well as estimates of the concurrent validity for all four measures.

RESULTS AND DISCUSSION

The major focus of this inquiry is on the two rating scales for hyperactivity, the BTS and the SBS. The other scales, the PBRS and the ASQ, were included for comparison purposes. If the differences between the BTS and the SBS had the result of making the teachers more sensitive to individual differences and more discriminating, a number of outcomes are likely. First, the total score distribution would be different. Next, the teachers would use more of the categories available to them during their rating on the SBS than on the BTS. In addition, the score obtained with the SBS would be more reliable and have greater validity in that it would correctly classify children considered hyperactive by other means.

To examine the effect of the different rating scales on the distribution of scores, the total score range on each measure was divided into 10 equal intervals. The number 10 is arbitrary but represents the distribution well and permits the analysis below. Next, the total score for each of the 418 children rated on both measures concurrently was assigned to the appropriate interval. Finally a cross-tabulated matrix (Table II) was prepared indicating where each pupil was rated on both measures. An inspection of the marginal figures illustrates that the BTS scores are highly skewed, whereas the SBS scores are more evenly spread across the range.
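The binning and cross-tabulation described above can be sketched as follows. The function names and the score ranges in the example are hypothetical, since the numeric scoring range of each scale is not stated here; the only elements taken from the text are the 10 equal intervals and the pairing of each child's two total scores.

```python
def interval_index(score, lo, hi, n_intervals=10):
    """Map a score in [lo, hi] to an interval number 1..n_intervals of equal width."""
    if not lo <= score <= hi:
        raise ValueError("score outside the scale range")
    idx = int((score - lo) * n_intervals / (hi - lo)) + 1
    # The maximum possible score belongs to the top interval, not to an extra one.
    return min(idx, n_intervals)


def cross_tabulate(pairs, range_a, range_b, n_intervals=10):
    """Count children in each (interval on scale A, interval on scale B) cell.

    pairs: iterable of (total score on A, total score on B), one pair per child
    range_a, range_b: (lo, hi) tuples giving each scale's score range (assumed)
    """
    table = [[0] * n_intervals for _ in range(n_intervals)]
    for a, b in pairs:
        i = interval_index(a, *range_a, n_intervals)
        j = interval_index(b, *range_b, n_intervals)
        table[i - 1][j - 1] += 1
    return table


# Hypothetical scores for three children on two scales, both ranging 0..100.
example_table = cross_tabulate([(5, 95), (5, 95), (55, 15)], (0, 100), (0, 100))
row_margins = [sum(row) for row in example_table]
```

The row and column sums of such a table are the marginal distributions that are inspected for skew.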

Since the scores are correlated, the comparability of the distributions may be evaluated by examining the symmetry of the off-diagonal probabilities in the matrix with Bowker's test (Marascuilo & McSweeney, 1977). This procedure indicated that the probabilities are not symmetrical (χ² = 295, p < .001) and that the distributions are different for the two measures. Teachers using the SBS make more discriminations in their ratings and distribute children evenly across a behavioral dimension.
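For readers unfamiliar with Bowker's test of symmetry, a minimal sketch of the statistic follows. It compares each off-diagonal cell with its mirror image across the diagonal. The function name and the small example table are hypothetical, and counting only non-empty cell pairs toward the degrees of freedom is a common convention that is assumed here rather than taken from the source.

```python
def bowker_symmetry(table):
    """Return (chi-square statistic, degrees of freedom) for Bowker's test.

    table: square list of lists of counts, where table[i][j] is the number of
    children in interval i+1 on one measure and interval j+1 on the other.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    statistic = 0.0
    df = 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total == 0:
                continue  # an empty pair carries no information about symmetry
            statistic += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return statistic, df


# Hypothetical 3 x 3 table, for illustration only.
stat, df = bowker_symmetry([[10, 8, 2], [2, 10, 6], [0, 2, 10]])
```

A p value can then be read from the chi-square distribution with `df` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)` if SciPy is available.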

It is also possible to examine directly differences in the use of the four categories on the BTS and the SBS. Thirty-seven teachers each rated five children in their classrooms on both rating scales concurrently. Table III lists the mean number of times the teachers used each rating category on each of the surveys in the rating of all their children. A separate tabulation was made for each of the four categories over the 32 items on each survey and for the 16-item subscale of the BTS consisting of the 16 items matched to the negatively worded item subtest of the SBS. These items are footnoted in Table I.

By comparing the frequency with which each rating category was used on the entire BTS and SBS, it is possible to judge the influence of the differences in the two measures. When the frequency of use of categories 1 through 4 on the SBS is compared with the parallel categories on the BTS by four matched-pairs tests, all are used significantly differently (category 2, p < .05; the rest, p < .02). Teachers appear to be more willing to use the extreme rating categories on the SBS, and, as the distribution of the total scores implies, the greater the dispersion of the ratings, the more discriminating the measurement.
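A matched-pairs comparison of category usage can be sketched as below. The text does not name the specific matched-pairs procedure used for this comparison, so the paired t statistic here is an assumption, and the function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired observations.

    x, y: equal-length sequences, e.g. how often each teacher used one rating
    category on scale A (x) and on scale B (y).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical counts for five teachers (not data from this study).
usage_scale_a = [5, 6, 4, 7, 3]
usage_scale_b = [2, 3, 2, 3, 1]
t_stat, t_df = paired_t(usage_scale_a, usage_scale_b)
```

Each pair comes from the same teacher, so teacher-to-teacher differences in overall leniency cancel out of the differences being tested.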


Table III. Mean Frequency Teachers Use Each of Four Rating Categories in Rating Pupils on the Behavior and Temperament Survey and the School Behavior Survey (N = 37)

| Scale | Statistic | Category 1 | Category 2 | Category 3 | Category 4 |
|---|---|---|---|---|---|
| Category labels (BTS) | | Not at all characteristic | A little characteristic | Quite a bit characteristic | Very much characteristic |
| Behavior and Temperament Survey (32 items) | Mean | 18.63 | 7.66 | 3.39 | 2.32 |
| | SD | (5.54) | (3.28) | (2.39) | (2.36) |
| Sixteen-item subscale of items matched to negatively worded SBS items | Mean | 8.16 | 5.39 | 2.22 | 1.46 |
| | SD | (2.84) | (2.16) | (1.47) | (1.44) |
| Category labels (SBS, negatively worded items) | | Never | Rarely | Once or twice per week | Every day |
| School Behavior Survey^a (32 items) | Mean | 12.11 | 8.90 | 6.81 | 4.18 |
| | SD | (4.86) | (3.10) | (2.39) | (2.62) |
| Sixteen-item subscale of negatively worded items | Mean | 3.39 | 5.39 | 3.98 | 3.26 |
| | SD | (2.99) | (2.16) | (1.68) | (1.83) |
| Category labels (SBS, positively worded items) | | Every day | Once or twice per week | Rarely | Never |
| Sixteen-item subscale of positively worded items | Mean | 8.73 | 3.52 | 2.84 | .91 |
| | SD | (2.43) | (1.81) | (1.09) | (1.02) |

^a Category values reflected for items worded positively so that high scores related to hyperactivity.

It is instructive to examine the differences in the usage of the rating categories 1 through 4 on the two negatively worded subscales consisting of items common to both the BTS and the SBS. Again, using matched-pairs tests, the usage was shown to be different on each measure (all categories p < .01). Teachers were more willing to use the rating categories indicating moderate and extreme deviance on the SBS if the negative items were interspersed with positive ones and had more concrete descriptors.

In a study of attitudes toward old people, Kogan (1967) found that subscales made up of positively worded items and versions made up of comparable negatively worded items correlated differentially with other measures of attitude. Because there were two such subscales on the SBS, the positive and negative subscales could be examined for category usage. The correlated t-test comparisons of the category usage on these subscales also indicated that each category was used significantly differently (p < .01). The categories indicating deviance were selected more often for negatively worded items than for positively worded items. This finding comes as no surprise but suggests that the addition of positive items does cause more consideration to be given to all of the items rated.

Reliability

Changes in the precision of the ratings should lead to improved reliability, particularly retest reliability. Two estimates of reliability have been calculated for the BTS and the SBS: an internal consistency measure and a stability coefficient. Table IV presents these results. The alpha coefficients of the two scales are virtually identical, but the retest reliability of the SBS is higher, although the difference is not quite significant. The reliabilities are sufficiently high to use the results of these ratings to make decisions about pupils.
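The reliability estimates named above can be illustrated with a short sketch. It assumes that the stability coefficient is a Pearson correlation between first and second ratings and that the standardized item alpha is computed from the mean inter-item correlation; neither computational detail is spelled out in the text, and the function names are hypothetical. The Spearman-Brown step is included because the footnote to Table IV reports an alpha of .92 for the 10-item Conners scale "corrected to length 30 items" as .97, which corresponds to a lengthening factor of 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation, e.g. between first and second total scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by `length_factor`."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def standardized_alpha(items):
    """Standardized item alpha from the mean inter-item correlation.

    items: list of columns, one per item, each a list of scores over children.
    """
    k = len(items)
    rs = [pearson_r(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    r_bar = sum(rs) / len(rs)
    # Same algebraic form as Spearman-Brown applied to a single "average" item.
    return spearman_brown(r_bar, k)


# A 10-item scale with alpha .92, projected to 30 items (factor of 3).
projected = spearman_brown(0.92, 3)
```

Under these assumptions the projected value rounds to .97, in line with the figure in the Table IV footnote.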

Concurrent Validity

Table V presents the concurrent validity coefficients for the three rating scales. Examining the correlations between ratings made at the same time, it is clear that all of the ratings correlate to a fairly high degree with each other. The BTS correlates significantly higher with the ASQ than does the SBS (the α = .05 confidence intervals for the correlations do not overlap). Since the ASQ and the BTS have the same format and very similar content, the result may reflect reliability of the teacher's "rating set" rather than reliability of the behaviors being rated.

Table IV. Reliability Coefficients for Hyperactivity Teacher Rating Scale

| Scale | Mean | SD | Test-retest stability coefficient | Standardized item alpha internal consistency |
|---|---|---|---|---|
| Behavior and Temperament Survey Total | 15.01 (N = 1,120) | 18.79 (N = 1,120) | .89 (N = 152) | .96 (N = 1,120) |
| School Behavior Survey Total | 67.15 (N = 448) | 19.43 (N = 448) | .95 (N = 33) | .95 (N = 448) |
| Conners | 16.99 (N = 716) | 6.79 (N = 716) | - | .92^a (N = 716) |

^a .97 corrected to length 30 items.

Table V. Concurrent Validity Estimates for Hyperactivity Rating Scales

| Measure | Behavior and Temperament Survey | School Behavior Survey |
|---|---|---|
| Behavior and Temperament Survey | - | .82 (N = 418) |
| School Behavior Survey | .82 (N = 418) | - |
| Conners Abbreviated Symptoms Questionnaire | .89 (N = 672) | .76 (N = 95) |
| Pupil Behavior Rating Scale Total | .75 (N = 1,120) | .81 (N = 421) |

The table also lists the 11 PBRS attributes: 1. Fights and quarrels; 2. Difficulty following directions; 3. Immature; 4. Dependent on teacher; 5. Coaxed or forced to work; 6. Easily distracted; 7. Dangerous to self and others; 8. No enthusiasm for school; 9. Difficulty learning; 10. Sick or upset; 11. Unhappy or depressed. Only nine pairs of attribute-level coefficients are legible, and their row alignment is not recoverable: .69, .71; .56, .66; .28, .17; .63, .76; .69, .71; .57, .69; .40, .57; .41, .50; .53, .51.
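The statement that the confidence intervals for the BTS-ASQ and SBS-ASQ correlations do not overlap can be checked with a short sketch. The text does not say how the intervals were computed, so the Fisher z transformation used here is an assumption, as is the function name; the correlations and sample sizes are the values given in Table V (.89 with N = 672, and .76 with N = 95).

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    z_crit = 1.96 gives a two-sided 95% interval (alpha = .05).
    """
    z = math.atanh(r)             # transform r to the z scale
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


bts_asq = fisher_ci(0.89, 672)
sbs_asq = fisher_ci(0.76, 95)
intervals_overlap = sbs_asq[1] >= bts_asq[0]
```

Under this assumption the upper limit for the SBS-ASQ correlation lies below the lower limit for the BTS-ASQ correlation, which agrees with the non-overlap reported in the text. Non-overlap is a conservative check that treats the two coefficients as if they were independent.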

Table V also demonstrates interesting convergent and divergent validity information in the correlations between the individual attributes of the PBRS and the two scales. Each of the attributes dealing with characteristics associated with hyperactive children correlates .60 or better with the two scales. The anchors on the "distractibility" attribute are related to attention span, those on "difficulty following directions" are related to impulsivity and distractibility, those on "immaturity" to poor social skills and self-control, those on the "fights and quarrels" attribute to impulsivity and poor social skills and aggression, those on the attribute "dangerous to self and others" to impulsiveness and high activity level, and those on "no enthusiasm for school" to lack of sustained activity on school-related work. On the other hand, the attribute "needs to be coaxed to work or play," which contains anchors descriptive of shy, withdrawn children, and the attribute "sick or upset" have not been associated with hyperactive children and have lower correlations with the two scales.

Classificatory Efficiency

If the mean of physician-identified nonmedicated hyperactive children is used as a cutoff for the designation of hyperactivity (and this method is used in research on the prevalence of hyperactivity), we may assess how likely each scale is to identify a child as hyperactive. In our project, the means for the physician-diagnosed hyperactive children, rated off medication, were 68.6 for the BTS, 95.2 for the SBS, and 24.6 for the Conners. Returning to Table II, we see that the BTS and SBS means fall into intervals 5 and 8, respectively. In spite of the skewed distribution, 24% of the children fall at or above the mean interval of the hyperactive children on the BTS (to the right of the dotted line), whereas only 13% fall at or above the mean interval of the hyperactive children on the SBS (below the dotted line). Examination of the total score of the project children who were consensually agreed to be hyperactive and of the control children shows that 92.3% of those falling more than 1 SD above the mean on the SBS (an arbitrary cutoff) were hyperactive and 7.7% were controls. On the BTS, 83.6% of those scoring above the cutoff were hyperactive and 16.4% were controls. These proportions are significantly different (χ² = 51.61, p < .001). Use of the BTS would lead a practitioner to identify more children as potentially hyperactive than would use of the SBS; using the SBS, there would be fewer false-positive identifications.
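The cutoff-based comparison above amounts to asking what share of the children flagged by a scale are in fact hyperactive. A minimal sketch of that calculation follows, using the "1 SD above the mean" cutoff mentioned in the text. The function name, the use of the sample standard deviation, and the scores are all hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def flagged_composition(scores, is_hyperactive, sd_units=1.0):
    """Return (cutoff, share of flagged children who are hyperactive).

    scores: total scores on one rating scale, one per child
    is_hyperactive: parallel list of booleans from an independent criterion
    Children scoring above mean + sd_units * SD are treated as flagged.
    """
    cutoff = mean(scores) + sd_units * stdev(scores)
    flagged = [h for s, h in zip(scores, is_hyperactive) if s > cutoff]
    if not flagged:
        return cutoff, None  # nobody exceeds the cutoff
    return cutoff, sum(flagged) / len(flagged)


# Hypothetical scores for eight children; the last two exceed the cutoff.
scores = [10, 12, 14, 16, 18, 20, 40, 45]
labels = [False, False, False, False, False, False, True, False]
cutoff, share_hyperactive = flagged_composition(scores, labels)
```

One minus the returned share is the proportion of flagged children who are controls, which is the false-positive figure compared across the two scales.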

This study is only suggestive in judging the effect of using both positively and negatively worded items because there are content differences and category label differences confounded with the use of positively worded items on the SBS. Nevertheless, the changes made in constructing the SBS did have a desirable effect. Format can make raters more sensitive to individual differences. Because the validity of teacher ratings of hyperactivity can be improved, a new generation of rating procedures is required. One model for future rating scales is the SBS. A great deal of research on hyperactive children has been done with rating scales that are not as precise as they should be. Researchers need to attend to the types of rating scales they are using to collect information from teachers.

REFERENCES

Bell, R. Q., Waldrop, M. F., & Weller, G. M. A rating system for the assessment of hyperactive and withdrawn children in preschool samples. American Journal of Orthopsychiatry, 1972, 42, 23-24.

Campbell, J. P., Dunnette, M. D., Arvey, R. D., & Hellervik, L. W. The development and evaluation of behaviorally based rating scales. Journal of Applied Psychology, 1973, 57, 15-22.

Clements, S. D., & Peters, J. E. Minimal brain dysfunctions in the school-age child. Archives of General Psychiatry, 1962, 6, 185-197.


Conners, C. K. A teacher rating scale for use in drug studies with children. American Journal of Psychiatry, 1969, 126, 884-888.

Davids, A. An objective instrument for assessing hyperkinesis in children. Journal of Learning Disabilities, 1971, 4, 499-501.

Greenberg, L. M., Deem, M. A., & McMahon, S. Effects of dextroamphetamine, chlorpromazine and hydroxyzine on behavior and performance in hyperactive children. American Journal of Psychiatry, 1972, 129, 532-539.

Kogan, N. Attitudes toward old people. In D. N. Jackson & S. Messick (Eds.), Problems in human assessment. New York: McGraw-Hill, 1967.

Lambert, N. M., & Hartsough, C. S. Scaling behavioral attributes of children using multiple teacher judgments of pupil characteristics. Educational and Psychological Measurement, 1973, 33, 859-874.

Lambert, N. M., & Hartsough, C. S. Pupil behavior rating scale. In N. M. Lambert, E. M. Bower, & C. S. Hartsough (Eds.), A process for the assessment of effective student functioning. Monterey, California: Publisher's Test Service, CTB/McGraw-Hill, 1979.

Lambert, N. M., & Sandoval, J. The prevalence of learning disabilities in a sample of children considered hyperactive. Journal of Abnormal Child Psychology, 1980, 8, 33-50.

Laufer, M. W., & Denhoff, E. Hyperkinetic behavior syndrome in children. Journal of Pediatrics, 1957, 50, 463-474.

Marascuilo, L. A., & McSweeney, M. Nonparametric and distribution free methods for the social sciences. Monterey, California: Brooks/Cole, 1977.

Sandoval, J. The measurement of the hyperactive syndrome in children. Review of Educational Research, 1977, 47, 292-318.

Sandoval, J., Lambert, N. M., & Sassone, D. M. The identification and labeling of hyperactivity in children: An interactive model. In C. K. Whalen & B. Henker (Eds.), Hyperactive children: The social ecology of identification and treatment. New York: Academic Press, 1980.

Selltiz, C., Wrightsman, L. S., & Cook, S. W. Research methods in social relations (3rd ed.). New York: Holt, Rinehart & Winston, 1976.

Smith, P. C., & Kendall, L. M. Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 1963, 47, 149-155.

Sprague, R. L., & Sleator, E. K. Effects of psychopharmacologic agents on learning disorders. Pediatric Clinics of North America, 1973, 20, 719-735.

Stewart, M. A., Pitts, F. N., Craig, A. G., & Dieruf, W. The hyperactive child syndrome. American Journal of Orthopsychiatry, 1966, 36, 861-867.
