

TWO INDICES OF SUBTEST SCATTER ON WECHSLER’S INTELLIGENCE SCALES: ESTIMATED VS. EMPIRICAL VALUES

A. B. SILVERSTEIN

Mental Retardation Research Center Lanterman Developmental Center Research Group

School of Medicine, University of California, Los Angeles

Previous research on the scaled-score range and the number of deviant subtest scores on the WISC-R and the WPPSI has produced surprising results: Normal children exhibited much more scatter than most clinicians probably realized. Yet, in principle, these results were predictable from a knowledge of the statistical properties of scaled scores. Procedures for evaluating the abnormality of the two indices of scatter are presented and their accuracy checked against the actual distributions of the indices for the two standardization samples. The findings suggest that the procedures are sufficiently accurate for most clinical applications, which supports their use when the actual distributions of the indices are unknown.

Subtest scatter on Wechsler’s scales has long been of interest to clinical practitioners and researchers, but it was only 10 years ago that normative data first became available for two objective indices of scatter: The scaled-score range (the difference between a subject’s highest and lowest subtest scores) and the number of subtest scores that deviate “significantly” from the subject’s own mean. Kaufman (1976) analyzed the standardization data for the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974), and both he and many others, myself included, were greatly surprised by the amount of scatter exhibited by normal children. In particular, the average child obtained a scaled-score range of approximately 7 points on the 10 regular subtests, with one or two subtest scores that deviated by 3 or more points from his or her 10-subtest mean.

This study was supported in part by National Institute of Child Health and Human Development Research Grant No. HD-04612.

Requests for reprints should be sent to A. B. Silverstein, Mental Retardation Research Center-Lanterman Developmental Center Research Group, P.O. Box 100-R, Pomona, California 91769.


410 Journal of Clinical Psychology, July 1987, Vol. 43, No. 4

Moreover, these values were virtually constant, regardless of the child’s age, sex, race, parental occupation, or level of intelligence. Subsequently, Reynolds and Gutkin (1981) analyzed the standardization data for the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; Wechsler, 1967) in much the same way, with very similar results.

In this instance, as in most others, hindsight is vastly superior to foresight, and the purpose of the present paper is to demonstrate that, in principle, the surprising results of previous research were actually predictable from a knowledge of the statistical properties of scaled scores. I also have estimated scaled-score ranges for the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) because the standardization data for that scale have not been analyzed in the same way as they have for the other two scales. The procedures employed are completely generalizable to any scale with subtests that have normally distributed scores with a common standard deviation and require only that the value of that standard deviation, and the value of the average intercorrelation among the subtests, be known.

SCALED-SCORE RANGE

The distribution of the range in samples from a normal population first was studied more than 60 years ago, and there are tables of critical values at various probability levels, and also of the moments (e.g., the mean and variance). If $x_1$ and $x_n$ are the lowest and highest values, respectively, in a sample of n independent observations, the range of the sample in standardized form is $W = (x_n - x_1)/\sigma$, where $\sigma$ is the standard deviation of X. The case in which the observations are correlated can be reduced to one of independence by a trick that leaves the numerator of W unchanged, but multiplies the denominator by $\sqrt{1 - \rho}$ to correct for the correlation (Miller, 1981, p. 46).

For the subtests on Wechsler’s scales, $\sigma = 3$ and $\rho$ may be taken as the average intercorrelation among the subtests that make up a particular scale. Thus, the scaled-score range obtained by the extreme 100P% of the standardization sample can be estimated by multiplying the critical value at the 1 - P/2 probability level (Owen, 1962, Table 6.1) by $3\sqrt{1 - \rho}$. The mean and standard deviation of the scaled-score range can be estimated by multiplying $E(W)$ and $\sqrt{\sigma^2(W)}$ (Owen, 1962, Table 6.2), respectively, by the same value.
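As an illustrative sketch only (the original procedure relied on Owen's printed tables, not on code), the tabled quantities for the standardized range W can be approximated by numerical integration and then rescaled by $\sigma\sqrt{1 - \rho}$. The function names are hypothetical, and the value of rho used in the usage note is a placeholder, not a figure taken from any Wechsler manual.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(w, n, steps=400):
    """P(W <= w) for the range W of n independent standard normal values."""
    if w <= 0.0:
        return 0.0
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):  # trapezoidal rule over x
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi(x) * (Phi(x + w) - Phi(x)) ** (n - 1)
    return n * h * total


def range_quantile(prob, n):
    """Critical value w with P(W <= w) = prob, found by bisection."""
    lo, hi = 0.0, 12.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if range_cdf(mid, n) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def range_moments(n, steps=600, upper=12.0):
    """Mean and standard deviation of W from the survival function S(w)."""
    h = upper / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        surv = 1.0 - range_cdf(w, n)
        m1 += weight * surv            # E(W)   = integral of S(w)
        m2 += weight * 2.0 * w * surv  # E(W^2) = integral of 2 w S(w)
    m1 *= h
    m2 *= h
    return m1, math.sqrt(m2 - m1 * m1)


def estimate_scatter(n, rho, P, sigma=3.0):
    """Unrounded range exceeded by the extreme 100P%, plus mean and SD.

    Uses the 1 - P/2 probability level, as stated in the text above.
    """
    scale = sigma * math.sqrt(1.0 - rho)
    mean_w, sd_w = range_moments(n)
    critical = range_quantile(1.0 - P / 2.0, n)
    return scale * critical, scale * mean_w, scale * sd_w
```

A call such as `estimate_scatter(n=5, rho=0.5, P=0.10)` returns the unrounded critical range, mean, and standard deviation for a five-subtest scale under the placeholder correlation; substituting the average intercorrelation from a test manual is left to the reader.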

I have applied this quite simple procedure to the standardization data for the WISC-R and the WPPSI, as summarized in the respective manuals (Wechsler, 1974, Table 15; Wechsler, 1967, Table 17). Note that access to the subtest scores of the individual children who made up the two standardization samples was not required. The resulting estimates for the WISC-R are given in Table 1, together with the empirical values taken from Kaufman’s (1976) analysis. For example, on the five regular subtests of the Verbal Scale, I estimated that less than 10, 5, 2, and 1% of the children in the standardization sample obtained scaled-score ranges equal to or greater than 8, 9, 10, and 10 points, respectively; Kaufman’s empirical values are 8, 9, 10, and 11. For the same five subtests, I estimated a mean scaled-score range of 4.5 and a standard deviation of 1.7; the empirical values are 4.5 and 1.9. Overall, 11 of the 24 estimates agree perfectly with the empirical values, and none of the others is off (underestimated in every case) by more than 1 point. The means and standard deviations are also estimated quite accurately.

The estimates for the WPPSI are given in Table 2, together with the empirical values taken from Reynolds and Gutkin’s (1981) analysis. This table is read in the same way as Table 1. Here, half of the 20 estimates agree perfectly with the empirical values, and again none of the others is off (overestimated in every case) by more than 1 point. Once more, the means and standard deviations are also estimated quite accurately.


Table 1
Scaled-score Ranges Obtained by Various Percentages of the WISC-R Standardization Sample: Estimated vs. Empirical Values

                    Verbal      Verbal Performance Performance        Full        Full
Extreme %       5 subtests  6 subtests  5 subtests  6 subtests 10 subtests 12 subtests
of sample       Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.
< 10               8     8     9     9    10    10    10    10    11    11    11    12
< 5                9     9    10    10    10    11    11    11    11    12    12    13
< 2               10    10    11    11    11    12    12    13    12    13    13    14
< 1               10    11    11    12    12    13    12    13    13    14    14    14
M                4.5   4.5   5.4   5.3   5.5   5.5   6.0   6.1   7.0   7.0   7.6   7.7
SD               1.7   1.9   1.8   2.0   2.0   2.3   2.0   2.3   1.8   2.1   1.8   2.1

Note. Empirical values from Kaufman (1976).

Table 2
Scaled-score Ranges Obtained by Various Percentages of the WPPSI Standardization Sample: Estimated vs. Empirical Values

                    Verbal      Verbal Performance        Full        Full
Extreme %       5 subtests  6 subtests  5 subtests 10 subtests 11 subtests
of sample       Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.
< 10               8     7     9     8     9     8    11    10    11    10
< 5                9     8     9     9    10     9    11    11    12    11
< 2               10    10    10    10    11    11    12    12    12    12
< 1               10    10    11    10    12    11    13    13    13    13
M                4.7   4.7   5.2   5.2   5.3   5.2   7.0   6.9   7.2   7.0
SD               1.8   2.0   1.7   1.9   2.0   2.1   1.8   2.1   1.8   2.1

Note. Empirical values from Reynolds and Gutkin (1981).

Table 3
Scaled-score Ranges Obtained by Various Percentages of the WAIS-R Standardization Sample: Estimated Values

Extreme %            Verbal Scale  Performance Scale         Full Scale
of sample              6 subtests         5 subtests        11 subtests
< 10                            8                  9                 10
< 5                             9                 10                 11
< 2                             9                 11                 12
< 1                            10                 11                 12
M                             4.8                5.1                6.7
SD                            1.6                1.9                1.7

Note. Tabled values refer to “age scaled” scores (Wechsler, 1981, Table 21), not to the scaled scores based on the reference group, ages 20-34, that are used for determining a subject’s IQ.


The accuracy of the estimates for the WISC-R and the WPPSI justifies the application of the same procedure to the standardization data for the WAIS-R, as summarized in the manual (Wechsler, 1981, Table 16). The resulting estimates, which are generally comparable to those for the other two scales, are given in Table 3. On the basis of the preceding results, it appears that none of the estimates is likely to be off (either underestimated or overestimated) by more than 1 point.

NUMBER OF DEVIANT SUBTEST SCORES

Determining the appropriate distribution for the number of deviant subtest scores presented more of a challenge, although it proved to be one that has been known for 150 years. Its identity was suggested by a curious feature of the results reported by Kaufman (1976) and Reynolds and Gutkin (1981): In every instance, for both the WISC-R and the WPPSI, the standard deviation is very nearly equal to the square root of the corresponding mean, a property of the Poisson distribution (for which the mean and variance are equal). Consequently, the problem reduced to determining a value for the mean, without relying on the previously reported empirical values.
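The square-root relation noted above can be checked directly against the empirical means and standard deviations printed in Tables 4 and 5. The short sketch below is an illustration rather than part of the original analysis; the list names and the helper `poisson_gaps` are hypothetical.

```python
import math

# (mean, SD) pairs copied from the empirical columns of Tables 4 and 5.
table4_emp = [(0.5, 0.7), (0.8, 0.9), (0.8, 0.9), (1.1, 1.0), (1.7, 1.3), (2.3, 1.5)]
table5_emp = [(0.9, 0.9), (1.2, 1.1), (1.1, 1.0), (2.5, 1.5), (2.7, 1.6)]


def poisson_gaps(pairs):
    """Absolute difference between each reported SD and sqrt(mean)."""
    return [abs(sd - math.sqrt(mean)) for mean, sd in pairs]


for label, pairs in (("WISC-R", table4_emp), ("WPPSI", table5_emp)):
    worst = max(poisson_gaps(pairs))
    print(f"{label}: largest |SD - sqrt(M)| = {worst:.3f}")
```

Given that the tabled values are rounded to one decimal place, gaps of this size are consistent with a Poisson-like distribution, although they do not by themselves prove it.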

The mean of a Poisson distribution is m = np, and here n is the number of subtests that make up a particular scale. Thus, the problem reduced further to determining a value for p, the probability of obtaining a deviant score on one of the subtests, but that is the proportion of subjects in the standardization sample who obtained a deviant score on any subtest. Starting from formulae that I have presented previously (Silverstein, 1984), it can be shown that if $\bar{X}$ is a mean subtest score, and $X_j$ is one of the n subtest scores that entered into that mean, the required value of p is the two-tail probability associated with the standard normal deviate, $Z = (X_j - \bar{X}) / [\sigma \sqrt{(n - 1)(1 - \rho)/n}]$. Here, as before, $\sigma$ is the standard deviation of X (3 for the subtests on Wechsler's scales), and $\rho$ is the average intercorrelation among the subtests.
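The computation of p and m described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, `d` stands for the deviation (in scaled-score points) that defines a deviant score, and the value of rho in the usage note is a placeholder rather than a manual value.

```python
import math


def deviant_probability(d, n, rho, sigma=3.0):
    """Two-tail probability that a subtest score differs from the subject's
    own n-subtest mean by d points or more."""
    # Standard deviation of (X_j - mean) for equally intercorrelated subtests.
    sd_diff = sigma * math.sqrt((n - 1) * (1.0 - rho) / n)
    z = d / sd_diff
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-tail normal probability.
    return math.erfc(z / math.sqrt(2.0))


def poisson_mean(d, n, rho, sigma=3.0):
    """Estimated mean number of deviant subtest scores, m = n * p."""
    return n * deviant_probability(d, n, rho, sigma)
```

For instance, `poisson_mean(3.0, 10, 0.5)` gives the estimated mean number of scores deviating by 3 or more points on a hypothetical 10-subtest scale whose average intercorrelation is .5.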

There are extensive tables of the Poisson distribution, but I chose to perform the necessary computations on a programmable desk calculator.¹ Then it was a simple matter to obtain the estimates for the WISC-R and the WPPSI that are given in Tables 4 and 5, respectively, together with the empirical values taken from the analyses by Kaufman (1976) and Reynolds and Gutkin (1981). These tables are read in the same way as the previous tables. Kaufman defined a deviant subtest score as one that differed by 3 or more points from the subject's own mean, whereas Reynolds and Gutkin required a difference of only 2.5 or more points. No matter; 21 of the 24 estimates for the WISC-R, and 17 of the 20 estimates for the WPPSI, agree perfectly with the empirical values, and none of the others is off by more than 1 point. The means and standard deviations also are estimated very accurately.
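The desk-calculator computation can be imitated with the term-by-term recursion given in the footnote: start from 1/e^m and multiply each term by m/a. The sketch below is illustrative; the function names are hypothetical, and reading "the extreme 100P%" as the smallest count k with P(N >= k) < P is an assumption based on how the tables are described in the text.

```python
import math


def poisson_upper_tails(m, max_count):
    """Return [P(N >= 0), P(N >= 1), ..., P(N >= max_count)] for mean m."""
    tails = []
    term = math.exp(-m)  # first term of the distribution, 1 / e^m
    cumulative = 0.0
    for a in range(max_count + 1):
        tails.append(1.0 - cumulative)
        cumulative += term      # cumulate the terms as they are computed
        term *= m / (a + 1)     # next term = preceding term * m / a
    return tails


def extreme_count(m, P, max_count=50):
    """Smallest k such that fewer than 100P% obtain k or more deviant scores."""
    for k, tail in enumerate(poisson_upper_tails(m, max_count)):
        if tail < P:
            return k
    return None


for P in (0.10, 0.05, 0.02, 0.01):
    print(P, extreme_count(1.6, P))
```

With m = 1.6 the loop prints 4, 5, 6, and 6 for the four percentage levels. Because the tabled means are rounded, feeding a rounded mean into this sketch need not reproduce every tabled estimate.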

I have not applied this procedure to the standardization data for the WAIS-R because of uncertainty as to how to define a deviant subtest score. The (different) definitions employed by Kaufman (1976) and Reynolds and Gutkin (1981) were based on the premise that, on the average, differences of about 3 points on the WISC-R and 2.5 points on the WPPSI are significant at the .05 level, but these values do not take into account the fact that multiple comparisons are involved (Silverstein, 1982). If deviance is to be defined in terms of significance, more suitable values would be 4 points for the WISC-R and 3.5 points for both the WPPSI and the WAIS-R, but I see no compelling reason for defining deviance in those terms.

¹That was not as difficult as it may seem. The first term in a Poisson distribution is $1/e^m$, and each term thereafter can be computed by multiplying the preceding term by $m/a$, where a takes on the values of the successive integers, 1, 2, 3, . . . . The calculator also cumulated the terms as they were computed.


Table 4
Numbers of Deviant Subtest Scores Obtained by Various Percentages of the WISC-R Standardization Sample: Estimated vs. Empirical Values

                    Verbal      Verbal Performance Performance        Full        Full
Extreme %       5 subtests  6 subtests  5 subtests  6 subtests 10 subtests 12 subtests
of sample       Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.
< 10               2     3     3     3     3     3     3     3     4     4     5     5
< 5                3     3     3     3     3     3     4     4     5     5     6     6
< 2                4     4     4     4     4     4     5     5     6     6     7     7
< 1                4     4     4     4     4     5     5     5     6     7     7     7
M                 .5    .5    .7    .8    .8    .8   1.0   1.1   1.6   1.7   2.2   2.3
SD                .7    .7    .8    .9    .9    .9   1.0   1.0   1.3   1.3   1.5   1.5

Note. Empirical values from Kaufman (1976), who defined a deviant subtest score as one that differed by 3 or more points from the subject’s own mean.

Table 5
Numbers of Deviant Subtest Scores Obtained by Various Percentages of the WPPSI Standardization Sample: Estimated vs. Empirical Values

                    Verbal      Verbal Performance        Full        Full
Extreme %       5 subtests  6 subtests  5 subtests 10 subtests 11 subtests
of sample       Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.  Est.  Emp.
< 10               3     3     3     4     3     3     6     6     6     6
< 5                4     4     4     4     4     4     6     6     7     7
< 2                4     4     5     5     5     5     7     7     8     7
< 1                5     4     5     5     5     5     8     8     8     8
M                 .9    .9   1.1   1.2   1.1   1.1   2.5   2.5   2.8   2.7
SD                .9    .9   1.0   1.1   1.0   1.0   1.6   1.5   1.7   1.6

Note. Empirical values from Reynolds and Gutkin (1981), who defined a deviant subtest score as one that differed by 2.5 or more points from the subject’s own mean.

CONCLUSION

In previous studies (Silverstein, 1981, 1985a, 1985b) I have checked the accuracy of formulae for evaluating the “abnormality” of the difference between (a) scores on two measures (Payne & Jones, 1957); and (b) a mean score and one of the scores that entered into that mean (Silverstein, 1984). The results suggested that both of those formulae are sufficiently accurate for most clinical applications, which supports their use when the actual distributions of the difference are unknown. The present findings suggest that the same is true of the procedures, described above, for evaluating the abnormality of scaled-score ranges and the number of deviant subtest scores. If and when test authors and publishers routinely present information on the distributions of various indices of scatter, there will of course be no need for such procedures, but that day has not yet come.


REFERENCES

KAUFMAN, A. S. (1976). A new approach to the interpretation of test scatter on the WISC-R. Journal of Learning Disabilities, 9, 160-168.

MILLER, R. G., JR. (1981). Simultaneous statistical inference (2nd ed.). New York: Springer-Verlag.

OWEN, D. B. (1962). Handbook of statistical tables. Reading, MA: Addison-Wesley.

PAYNE, R. W., & JONES, H. G. (1957). Statistics for the investigation of individual cases. Journal of Clinical Psychology, 13, 115-121.

REYNOLDS, C. R., & GUTKIN, T. B. (1981). Test scatter on the WPPSI: Normative analyses of the standardization sample. Journal of Learning Disabilities, 14, 460-463.

SILVERSTEIN, A. B. (1981). Verbal-Performance IQ discrepancies on the WISC-R: One more time. Journal of Consulting and Clinical Psychology, 49, 465-466.

SILVERSTEIN, A. B. (1982). Pattern analysis as simultaneous statistical inference. Journal of Consulting and Clinical Psychology, 50, 234-240.

SILVERSTEIN, A. B. (1984). New formulas for evaluating the abnormality of test score differences. Journal of Psychoeducational Assessment, 2, 79-82.

SILVERSTEIN, A. B. (1985a). Unusual differences between domain standard scores on the Vineland Adaptive Behavior Scales: Estimated versus empirical values. Journal of Psychoeducational Assessment, 3, 291-293.

SILVERSTEIN, A. B. (1985b). Verbal-Performance IQ discrepancies on the WAIS-R: Estimated vs. empirical values. Journal of Clinical Psychology, 41, 694-697.

WECHSLER, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: Psychological Corporation.

WECHSLER, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: Psychological Corporation.

WECHSLER, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: Psychological Corporation.