Composite Computation

Can't Decide Which IQ Is Best? Make a Composite Score

W. Joel Schneider
Illinois State University

A man facing the death penalty has been given two different IQ tests. One of the scores is 76 and the other is 66. One qualifies him for execution, the other does not. You have been asked to give the man a tie-breaker IQ test to decide what his real IQ is. How are you going to answer this question?

Prevalence of IQ Differences

First, it is reasonable to ask how unusual a 10-point difference is. The higher the correlation between two tests, the less common large differences are. The table below shows that a difference of 10 points or more is not rare, even between highly correlated tests. The two IQ tests this man was given correlate at 0.70. Therefore, a difference of 10 points or more occurs in about 39% of people.

Prevalence of IQ score differences, assuming perfect normality and comparable norms

IQ test        Difference (points)
correlation    5       10      15      20      25
0.60           0.71    0.46    0.26    0.14    0.06
0.70           0.67    0.39    0.20    0.09    0.03
0.80           0.60    0.29    0.11    0.03    0.01
0.90           0.46    0.14    0.03    0.003   0.0002

Note: Prevalence can be calculated with the Excel formula =2*NORMDIST(-1*d,0,s*SQRT(2-2*r),TRUE) where d = difference, s = standard deviation of both tests, and r = correlation between the tests.
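The same calculation can be sketched in Python using only the standard library. The function name difference_prevalence is an assumption made for illustration; the logic follows the Excel formula, with the standard deviation of the difference between two equally scaled tests taken as s * sqrt(2 - 2r).

```python
from math import sqrt
from statistics import NormalDist


def difference_prevalence(d, r, s=15.0):
    """Proportion of people whose two scores differ by at least d points.

    d: size of the difference, r: correlation between the tests,
    s: standard deviation shared by both tests (15 for index scores).
    Assumes normality and comparable norms, as the table does.
    """
    sd_of_difference = s * sqrt(2 - 2 * r)
    # Two-tailed: a difference of d or more in either direction.
    return 2 * NormalDist(0, sd_of_difference).cdf(-d)


# Print one row per correlation, one value per difference size.
for r in (0.60, 0.70, 0.80, 0.90):
    row = [round(difference_prevalence(d, r), 2) for d in (5, 10, 15, 20, 25)]
    print(r, row)
```

For the two tests in the example (r = 0.70, d = 10), the function returns roughly 0.39, the value shown in the table.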

Psychometric True Scores

When you were asked to decide what the man's real IQ is, you knew, of course, that there really is no such thing. However, there is such a thing in psychometrics as a true score, and it is easy to confuse the two concepts. The psychometric true score is not the Aristotelian true score (i.e., the truth, independent of measurement). The psychometric true score is the probability-weighted average of all possible measurements under reasonable conditions.

That is, imagine all of the different circumstances in which a person might take an IQ test. The date and time will be different. The person's mood will be different. The examiner will be different. Sometimes the test will be given under optimal conditions (e.g., in a quiet, comfortable room), whereas sometimes there are minor distractions of all sorts (if there are major distractions, any trained examiner would discontinue testing). If all of the scores obtained from every possible (and reasonable) combination of circumstances could be averaged, this would be the psychometric true score for a particular IQ test. Of course, some measurement scenarios are more likely than others. Thus, the average of the scores is weighted by the probability of the measurement scenario. This weighting by probability allows highly improbable scenarios to be included in the concept of the true score, but it puts reasonable limits on their influence on the true score.

The psychometric true score for any individual will never be known with perfect certainty. However, what we would like to be able to do is specify a range in which the psychometric true score is likely to be. For example, we might be able to say that there is a 95% chance that a person's true IQ score (psychometric true score, not Aristotelian true score) lies between 66 and 73.
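As a toy illustration of a probability-weighted average, consider the sketch below. The three scenarios, their probabilities, and their scores are entirely hypothetical, and the function name true_score is an assumption; in practice the full set of scenarios can never be enumerated.

```python
def true_score(scenarios):
    """Probability-weighted average of (probability, score) pairs."""
    total_probability = sum(p for p, _ in scenarios)
    weighted_sum = sum(p * score for p, score in scenarios)
    # Dividing by the total lets the weights be unnormalized.
    return weighted_sum / total_probability


# Hypothetical scenarios for one person on one test (made-up numbers):
hypothetical_scenarios = [
    (0.70, 71.0),  # typical conditions
    (0.25, 69.0),  # minor distractions
    (0.05, 62.0),  # improbable bad day: included, but with little weight
]

print(true_score(hypothetical_scenarios))
```

The improbable scenario moves the weighted average by less than one point, which is the sense in which probability weighting limits the influence of unlikely measurement conditions.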

Intuitively Appealing Dead Ends

Suppose that you give a different IQ test to the man and this time the score is 74. What is your interpretation? One way to think about this is that there are two scores higher than 70 and only one below 70. Thus, you might reasonably guess that the true score is greater than 70. However, this procedure uses an election metaphor in which each test gets a vote about whether the person is above or below a certain threshold and the majority opinion wins. Although this method is intuitively appealing, there are more precise methods available.

Another intuitively appealing procedure is to average the three tests to yield an IQ score of 72. This procedure uses a different metaphor: what I call the Mega-IQ test. That is, each IQ score is treated as a subtest in a larger IQ test. This is a good metaphor, but if we are to use it, a different calculation procedure is needed. Instead of averaging, a true composite score should be created.

The problem with simple averages is that the averaged score does not have the same standard deviation as the scores being averaged. Thus, the averaged score is not really on the same scale as the scores being averaged. Properly adjusted so that the composite score has a mean of 100 and a standard deviation of 15, the composite score is more extreme than the averaged score. That is, when the average of the test scores is less than 100, the composite score is lower than the averaged score. When the average of the test scores is higher than 100, the composite score is higher than the averaged score.

To see this, imagine that a person scored a 75 on two different IQ tests. The average of 75 and 75 is 75. About 5% of the population scores 75 or lower on the first test and about 5% scores 75 or lower on the second test. Unless the tests are perfectly correlated (and they never are), a smaller group of people has an average of 75 or lower across both tests. Thus, an averaged score of 75 represents a more extreme score than a 75 on either test. Why is this so? When a smaller percentage of people score at or below a given value, that value represents a lower relative standing. Thus, an adjustment needs to be made to reflect the more extreme nature of averaged scores so that they are on the same scale (i.e., have the same mean and standard deviation) as the scores being combined.

Calculating True Composite Scores

Calculating composite scores is not nearly as complicated as it might seem at first glance. The composite score can be calculated with the correlation matrix of all the scores included in the composite. To calculate the confidence interval, you must also know the reliability coefficients of the scores included in the composite score.

Note that the accuracy of all calculations depends on the accuracy of the scores, the correlations, and the reliability coefficients. In addition, the norms of the tests need to be comparable. For example, if one IQ test's norms are out of date and the mean no longer refers to the current population mean, then the validity of the composite score is suspect. Of course, in such a case, the validity of the out-of-date IQ is suspect as well.

All scores must be index scores (Mean = 100, Standard Deviation = 15). If they are not, they can be converted using this formula:

$$\text{Index score} = 15 \times \frac{\text{Score} - \text{Mean of original scale}}{\text{Standard deviation of original scale}} + 100$$

Computation Steps

Step 1: Add up all of the scores. In this case, 66 + 76 + 74 = 216

Step 2: Subtract the number of tests times 100. In this case there are 3 tests. Thus,

216 - 3 * 100 = 216 - 300 = -84

Step 3: Divide by the square root of the sum of all the elements in the correlation matrix. In this case, suppose that the three tests are correlated as such:

          Test A    Test B    Test C
Test A    1         0.70      0.75
Test B    0.70      1         0.80
Test C    0.75      0.80      1

The sum of all 9 elements of the correlation matrix, including the ones on the diagonal, is 7.5. The square root of 7.5 is about 2.7386. Thus, -84 / 2.7386 = -30.6725

Step 4: Complete the computation of the composite score by adding 100. In this case, -30.6725 + 100 = 69.3275

Given the three IQ scores available, and assuming that there is no reason to favor one above the others, the best estimate is that his IQ is 69. Most of the time, there is no need for further calculation. However, we might like to know how precise this estimate is by constructing a 95% confidence interval around this score.

Confidence Intervals of Composite Scores

Calculating a 95% confidence interval is more complicated than the calculations above, but not overly so. To calculate the confidence interval, you must also know the reliability coefficients of the scores included in the composite score.

Step 1: Calculate the composite reliability.

Step 1a: Subtract the number of tests from the sum of the correlation matrix. In this case, there are 3 tests. Therefore,

7.5 - 3 = 4.5

Step 1b: Add in all the test reliability coefficients. In this case, suppose that the three reliability coefficients are 0.97, 0.96, and 0.98. Therefore, 4.5 + 0.97 + 0.96 + 0.98 = 7.41.

Step 1c: Divide by the original sum of the correlation matrix. In this case, 7.41 / 7.5 = 0.988

Therefore, in this case, the reliability coefficient of the composite score is higher than that of any single IQ score. This makes sense: given that we have three scores, we should know his IQ with greater precision than we would if we had only one score.

Step 2: Calculate the standard error of measurement by subtracting the reliability coefficient from one and taking the square root. Then, multiply by the standard deviation, 15. In this case, $15\sqrt{1 - 0.988} = 1.6432$

Step 3: Calculate the 95% margin of error by multiplying the standard error of measurement by 1.96. In this case, 1.96 * 1.6432 = 3.2207

The value 1.96 is the approximate z-score associated with the 95% confidence interval. If you want the z-score associated with a different confidence level, then use the following Excel formula. Shown here is the calculation of the z-score for a 99% confidence interval: =NORMSINV(1-(1-0.99)/2)
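For readers working outside Excel, the z-score lookup can be sketched with Python's standard library, where NormalDist().inv_cdf plays the role of NORMSINV. The function name z_for_confidence is an assumption made for illustration.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z-score for a confidence level such as 0.95 or 0.99."""
    # Same expression as the Excel formula: NORMSINV(1 - (1 - level) / 2)
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


print(round(z_for_confidence(0.95), 2))  # about 1.96
print(round(z_for_confidence(0.99), 2))  # about 2.58
```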

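Step 4: Calculate the estimated true score by subtracting 100 from the composite score, multiplying by the square root of the reliability coefficient, and adding 100. That is,

$$\text{Estimated true score} = (\text{Composite score} - 100) \times \sqrt{\text{Reliability}} + 100$$

In this case,

$$(69.3275 - 100) \times \sqrt{0.988} + 100 = 69.5121$$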

Step 5: Calculate the upper and lower bounds of the 95% confidence interval by starting with the estimated true score and then adding and subtracting the margin of error. In this case, 69.5121 ± 3.2207, or about 66 to 73.
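The computation steps and the confidence-interval steps can be collected into one function. What follows is a minimal sketch under the same assumptions as the worked example (index scores with mean 100 and standard deviation 15, comparable norms); the function name composite_interval and its return layout are assumptions made for illustration, not part of any published tool.

```python
from math import sqrt
from statistics import NormalDist


def composite_interval(scores, corr, reliabilities, level=0.95,
                       mean=100.0, sd=15.0):
    """Return (composite, reliability, lower, upper) for a set of scores.

    scores: index scores; corr: full correlation matrix (ones on the
    diagonal); reliabilities: one reliability coefficient per score.
    """
    k = len(scores)
    matrix_sum = sum(sum(row) for row in corr)

    # Composite score: sum of deviations from the mean, rescaled.
    composite = (sum(scores) - k * mean) / sqrt(matrix_sum) + mean

    # Composite reliability: swap the diagonal ones for reliabilities.
    reliability = (matrix_sum - k + sum(reliabilities)) / matrix_sum

    # Standard error of measurement and margin of error.
    sem = sd * sqrt(1 - reliability)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sem

    # Estimated true score, kept on the observed-score metric.
    estimated_true = (composite - mean) * sqrt(reliability) + mean
    return composite, reliability, estimated_true - margin, estimated_true + margin


corr = [[1.00, 0.70, 0.75],
        [0.70, 1.00, 0.80],
        [0.75, 0.80, 1.00]]
composite, reliability, lower, upper = composite_interval(
    [66, 76, 74], corr, [0.97, 0.96, 0.98])
print(round(composite), round(reliability, 3), round(lower), round(upper))
```

The sketch uses the exact z-score rather than the rounded 1.96, so intermediate values can differ from the hand calculation in the fourth decimal place; the rounded interval is still 66 to 73. Passing only the first two scores, with their 2 x 2 correlation matrix and reliabilities of 0.97 and 0.96, gives a composite that rounds to 69 and an interval of about 65 to 73.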

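This means that we are 95% sure that his IQ is between about 66 and 73. Assuming that other criteria for mental retardation are met, this is in the range to qualify for services in most states.

It is interesting to note that the composite score formed by the first two IQ scores was also 69, but with a 95% confidence interval of 65 to 73. Thus, all the extra effort of giving another test resulted in the same estimated IQ and a confidence interval that was only somewhat narrower. Of course, the additional test might have yielded a much different score and the interpretation would have been different.

It should be noted that this procedure can be used for any kind of composite score, not just for IQ tests. To put all the steps into compact formulas, imagine that the composite IQ is made up of k IQ scores, $X_1, X_2, X_3, \ldots, X_k$.

Let $r_{ij}$ be the correlation between tests i and j ($r_{ij} = 1$ when i = j).

Let $r_i$ be the reliability coefficient for test i.

Let X be the composite IQ score and let r be the composite reliability coefficient.

$$X = \frac{\sum_{i=1}^{k} X_i - 100k}{\sqrt{\sum_{i=1}^{k}\sum_{j=1}^{k} r_{ij}}} + 100$$

$$r = \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} r_{ij} - k + \sum_{i=1}^{k} r_i}{\sum_{i=1}^{k}\sum_{j=1}^{k} r_{ij}}$$

$$95\%\ \text{CI} = 100 + (X - 100)\sqrt{r} \pm 1.96 \times 15\sqrt{1 - r}$$

Technical Appendix

The formula for the confidence interval of true scores is different from how it appears in many other discussions of confidence intervals (e.g., Ley, 1972). In the equations below, X is the observed score, $\mu$ is the population mean, $\sigma_X$ is the standard deviation of the observed scores, $r_{xx}$ is the reliability coefficient, and z is the z-score for the desired confidence level.

Equation in this paper:

$$\text{CI} = \mu + \sqrt{r_{xx}}\,(X - \mu) \pm z\,\sigma_X\sqrt{1 - r_{xx}}$$

Typical equation:

$$\text{CI} = \mu + r_{xx}\,(X - \mu) \pm z\,\sigma_X\sqrt{r_{xx}(1 - r_{xx})}$$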

It might seem that one of these two versions is incorrect. However, there is no error in either formula. Which version is used depends on how the true scores are scaled relative to the observed scores. Below is the regression equation relating observed scores (X) to true scores (T), plus or minus the standard error of the estimate:

$$\hat{T} = \mu_T + \rho_{XT}\frac{\sigma_T}{\sigma_X}(X - \mu_X) \pm z\,\sigma_T\sqrt{1 - \rho_{XT}^2} \qquad \text{(Equation 1)}$$

If $X = T + e$, $\mu_e = 0$, $\rho_{Te} = 0$, and the reliability coefficient is $r_{xx} = \sigma_T^2 / \sigma_X^2$, then it follows that:

$$\mu_T = \mu_X, \qquad \rho_{XT} = \sqrt{r_{xx}}, \qquad \sigma_T = \sigma_X\sqrt{r_{xx}}$$

Substituting $\rho_{XT}$ and $\sigma_T$ into Equation 1:

$$\hat{T} = \mu_X + r_{xx}(X - \mu_X) \pm z\,\sigma_X\sqrt{r_{xx}(1 - r_{xx})}$$

However, if the observed scores, true scores, and error scores are defined to be all on the same scale, so that $\sigma_T = \sigma_X$ and $\rho_{XT} = \sqrt{r_{xx}}$, Equation 1 becomes:

$$\hat{T} = \mu_X + \sqrt{r_{xx}}\,(X - \mu_X) \pm z\,\sigma_X\sqrt{1 - r_{xx}}$$

What are the benefits of this approach? Using the typical approach, a true score of 70 does not have the same meaning as an observed score of 70 (i.e., 2 SDs below the mean). Rather, a true score of 70 is more extreme: $2/\sqrt{r_{xx}}$ standard deviations below the mean. This becomes rather confusing if we attempt to compare true scores from different tests with differing levels of reliability. The true scores from the various tests would have different standard deviations and would thus not be directly comparable. To make the true scores and observed scores easy to compare, the true scores are placed on the same metric as the observed scores (converting to z-scores is also a good option). A short discussion of these matters can be found in Section 11 of this tutorial by Florio & Ley: http://www.psychassessment.com.au/PDF/mod4.pdf

It is also important to recognize that the confidence intervals in this paper are centered around the estimated true scores. A conceptually different kind of confidence interval is centered around the observed score:

$$X \pm z\,\sigma_X\sqrt{1 - r_{xx}}$$

It is important to distinguish between these two types of confidence intervals because they produce markedly different results when reliability is low and the observed scores are extreme. A detailed discussion of how they differ can be found in Charter and Feldt (2001). In short, the confidence interval centered around observed scores will contain the true score the correct percent of the time when we consider a group of people who happen to have the same true score (or when the same person with a fixed true score is measured repeatedly). The confidence interval centered around the estimated true score will contain the true score the correct percent of the time when we consider a group of people who happen to have the same observed score. In general, when you wish to say, "There is a 95% chance that this person's true score is between X and Y," you should use the confidence interval centered around the estimated true score.

References

Charter, R. A., & Feldt, L. S. (2001). Confidence intervals for true scores: Is there a correct approach? Journal of Psychoeducational Assessment, 19, 350-364.

Ley, P. (1972). Quantitative aspects of psychological assessment. London: Duckworth.
