Chapter 5

1


Statistical Concepts: Creating New Scores to Interpret Test Data

2

Norm Referencing vs. Criterion Referencing

Norm-referenced: scores are compared to a set of test scores called the peer or norm group.

Criterion-referenced: scores are compared to a predetermined standard (criterion).

See Table 5.1, p. 85

3

Normative Comparisons and Derived Scores

Types of Derived scores:

1. percentiles

2. standard scores

3. developmental norms

4

Percentiles

A percentile is the percentage of people falling below the obtained score. Percentiles range from 1 to 99, with 50 being the median (which is also the mean in a normal distribution).

Do not confuse percentile scores with the term “percentage correct”

(see Figure 5.1)
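The difference between a percentile and "percentage correct" can be sketched in a few lines of Python. This is only an illustration: the norm group, the 40-item test length, and the function names `percentile_rank` and `percent_correct` are made up for the example and do not come from the text.

```python
def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Percentage of people in the norm group who scored below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def percent_correct(items_correct: int, items_total: int) -> float:
    """Percentage of test items answered correctly."""
    return 100 * items_correct / items_total


# Hypothetical norm group of raw scores on a hypothetical 40-item test.
norm_group = [10, 12, 15, 18, 20, 22, 25, 27, 30, 35]

# The same raw score of 30 gives two different numbers.
print(percentile_rank(30, norm_group))  # 80.0 (8 of 10 people scored lower)
print(percent_correct(30, 40))          # 75.0 (30 of 40 items correct)
```

The same raw score produces two different values because the percentile compares the person to other people, while percentage correct compares the person to the test itself.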

5

Standard Scores

z-scores

T-scores

Deviation IQs

Stanines

Sten scores

Normal Curve Equivalents (NCE)

College and graduate school entrance exam scores (e.g., SATs, GREs, and ACTs)

Publisher-type scores

6

Standard Scores (Cont’d)

z-scores: A standard score that helps us understand where an individual falls on the normal curve.

Practically speaking, z-scores run from -4.0 to +4.0.

To find a z-score: z = (X – M)/SD, where X is the raw score, M is the mean, and SD is the standard deviation.

See Figure 5.2, p. 88
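As a minimal sketch, the z-score formula translates directly into Python. The function name `z_score` and the example numbers (a raw score of 65 on a test with M = 50 and SD = 10) are assumptions chosen for illustration.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Return how many standard deviations `raw` lies above or below `mean`."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (raw - mean) / sd


# Hypothetical example: raw score of 65, test mean of 50, test SD of 10.
print(z_score(65, 50, 10))  # 1.5
```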

7


“z-scores are golden” (Rule #3)

z-scores help us see where an individual’s raw score falls on a normal curve and are helpful for converting a raw score to other kinds of derived scores. That is why we like to keep in mind that z-scores are golden and can often be used to help us understand the meaning of scores.

8


Converting a z-score to a percentile

Find your z-score, then either approximate the percentile or look at Appendix E for conversion of z-scores to percentiles.

See Fig. 5.2, p. 88
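In place of a printed conversion table, the lookup can be approximated in code. The sketch below assumes scores are normally distributed and uses `statistics.NormalDist` from the Python standard library. The function name `z_to_percentile` is hypothetical, and clamping the result to the 1 to 99 range is a choice made here to match how percentiles are usually reported.

```python
from statistics import NormalDist


def z_to_percentile(z: float) -> int:
    """Approximate the percentile for a z-score under a normal curve."""
    # cdf(z) is the proportion of the distribution falling below z.
    pct = NormalDist().cdf(z) * 100
    # Percentiles are reported from 1 to 99, so clamp the extremes.
    return min(99, max(1, round(pct)))


print(z_to_percentile(0.0))   # 50
print(z_to_percentile(1.0))   # 84
print(z_to_percentile(-1.0))  # 16
```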

9


Converting to Standard Scores

1. Get your z-score

2. Plug your z-score into the following conversion formula:

Desired score = z-score × (SD of desired score) + mean of desired score

10


Converting z-scores to standard scores

Use your conversion formula by plugging in the means and standard deviations for each of the respective standard scores that follow:

T-scores (M = 50, SD = 10). Mostly used for personality tests. (Fig. 5.3, p. 89)

DIQ scores (M = 100, SD = 15). Mostly used for tests of intelligence. (Fig. 5.4, p. 90)

Stanines (M = 5, SD = 2; round off to nearest whole number). Mostly used for achievement testing. (Fig. 5.5, p. 91)

11


Converting z-scores to standard scores (Cont’d)

Use conversion formula with the following:

Sten scores (M = 5.5, SD = 2; round off to nearest whole number). Used with personality inventories and questionnaires. (Fig. 5.6, p. 92)

NCEs (M = 50, SD = 21.06). Like percentiles in that they basically range from 1 to 99, but NCEs are evenly distributed, whereas percentiles bunch up around the mean. Usually used for educational tests. (Fig. 5.7, p. 93)

12


Converting z-scores to standard scores (Cont’d)

Use conversion formula with the following:

SATs/GREs (M = 500, SD = 100)

ACTs (M = 21, SD = 5)

Publisher-type scores: mean and standard deviation arbitrarily set by the publisher
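The conversion formula (z-score times the SD of the desired score, plus the mean of the desired score) can be sketched once and reused for every scale listed on these slides. The means and SDs below are the ones given above. The `SCALES` table, the function name `convert_z`, the half-up rounding, and the clamping of stanines to 1 through 9 and stens to 1 through 10 are assumptions made for this illustration.

```python
import math

# (mean, SD) for each derived score, as listed on the slides.
SCALES = {
    "T": (50, 10),
    "DIQ": (100, 15),
    "stanine": (5, 2),
    "sten": (5.5, 2),
    "NCE": (50, 21.06),
    "SAT/GRE": (500, 100),
    "ACT": (21, 5),
}

# Scales that are rounded to whole numbers, with their (min, max) limits.
WHOLE_NUMBER_LIMITS = {"stanine": (1, 9), "sten": (1, 10)}


def convert_z(z: float, scale: str) -> float:
    """Convert a z-score to the named derived score."""
    mean, sd = SCALES[scale]
    score = z * sd + mean
    if scale in WHOLE_NUMBER_LIMITS:
        low, high = WHOLE_NUMBER_LIMITS[scale]
        # Round halves up (Python's round() would round halves to even).
        score = math.floor(score + 0.5)
        score = min(high, max(low, score))
    return score


# A z-score of +1.0 expressed on every scale.
for name in SCALES:
    print(name, convert_z(1.0, name))
```

Keeping the means and SDs in one table makes it harder to pair a score with the wrong scale, and a publisher-type score could be added as one more entry once its mean and SD are known.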

13

Developmental Norms

Age Comparisons

When you are being compared to others at your same age.

Often done for physical attributes: My 10-year-old weighs 78 lbs. What percentile is she compared to others her age?

Can use z-scores to determine: e.g., [78 – 80 (mean)]/8 (SD) = -.25, a z-score that corresponds to a percentile of about 40 (p = 40).

See Box 5.4, p. 95
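The weight example can be checked with a short sketch. It assumes that weights for this age group are normally distributed with the mean of 80 and SD of 8 used on the slide. The variable names are illustrative only.

```python
from statistics import NormalDist

# Assumed normal distribution of weights (lbs) for the age group.
age_group = NormalDist(mu=80, sigma=8)

child_weight = 78
z = (child_weight - age_group.mean) / age_group.stdev
percentile = age_group.cdf(child_weight) * 100

print(z)                  # -0.25
print(round(percentile))  # 40
```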

14

Developmental Norms (Cont’d)

Grade Equivalents (GE): Compares child’s score to his or her grade level.

E.g., a child in grade 5.6 (fifth grade, sixth month) who scores at the mean compared to the peer group has a GE of 5.6.

A GE over 5.6 means that he or she is doing better than his or her peer group.

A GE below 5.6 means that he or she is doing worse than his or her peer group.

Usually, not a statement about how much better or how much worse (a child who is in grade 5.6 and gets a GE of 7.5 is not necessarily at the 7.5 grade level).

15

Standard Error of Measurement

Based on reliability of test.

Tells you how much error there is in the test and ultimately how much any individual’s score might fluctuate due to this error.

Formula: Multiply the standard deviation of the score you are using (e.g., for T-score, SD = 10; for DIQ, SD = 15) times the square root of 1 minus the reliability of the test: $SEM = SD\sqrt{1 - r}$

See Box 5.5, p. 99

16

Standard Error of Measurement (Cont’d)

For example: If you receive a T-score of 60 on a personality test and the reliability of the test is .84, what is your SEM?

$SEM = 10\sqrt{1 - .84} = 10\sqrt{.16} = 10(.4) = 4$

This means that 68 percent of the time your score will fall between 56 and 64 (plus or minus 1 SEM), and 95 percent of the time your score will fall between 52 and 68 (plus or minus 2 SEMs).

See Figure 5.8, p. 97
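The SEM calculation and the two score bands can be reproduced with a small sketch. The function names `standard_error_of_measurement` and `score_band` are hypothetical. The bands use plus or minus 1 SEM for roughly 68 percent and plus or minus 2 SEMs for roughly 95 percent, as in the example.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD times the square root of (1 minus reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score: float, sem: float, n_sems: int = 1) -> tuple[float, float]:
    """Range of `score` plus or minus `n_sems` standard errors."""
    return (score - n_sems * sem, score + n_sems * sem)


# T-score of 60 (SD = 10) on a test with reliability .84.
sem_t = standard_error_of_measurement(sd=10, reliability=0.84)
low68, high68 = score_band(60, sem_t, n_sems=1)
low95, high95 = score_band(60, sem_t, n_sems=2)

print(round(sem_t, 2))                     # 4.0
print(round(low68, 2), round(high68, 2))   # 56.0 64.0
print(round(low95, 2), round(high95, 2))   # 52.0 68.0
```

Results are rounded before printing because floating-point arithmetic can leave the SEM a hair away from exactly 4.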

17

Standard Error of Estimate

Gives a confidence interval around a predicted score.

Based on scores received on one variable, you can predict the range of scores you will receive on a second variable.

E.g., if you know your high school GPA and the correlation between high school GPA and 1st-year college grades, you can predict the range of GPA you are likely to get in your 1st year of college.

Formula: $SE_{est} = SD_Y\sqrt{1 - r^2}$

See Box 5.6, p. 100
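A minimal sketch of the standard error of estimate follows, assuming the usual form in which the SD of the predicted variable (Y) is multiplied by the square root of 1 minus the squared correlation. All numbers in the example (a predicted college GPA of 3.0, an SD of 0.6 for college GPA, and a correlation of .50) are hypothetical and chosen only to show the arithmetic.

```python
import math


def standard_error_of_estimate(sd_y: float, r: float) -> float:
    """SE_est = SD of the predicted variable times sqrt(1 - r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return sd_y * math.sqrt(1.0 - r ** 2)


# Hypothetical values, not taken from the text.
predicted_gpa = 3.0
se_est = standard_error_of_estimate(sd_y=0.6, r=0.5)

# Roughly 68 percent of actual scores are expected within 1 SE_est
# of the predicted score, assuming normally distributed errors.
low, high = predicted_gpa - se_est, predicted_gpa + se_est
print(f"{low:.2f} to {high:.2f}")
```

A stronger correlation shrinks the band: with r = 1 the standard error of estimate is 0, and with r = 0 it equals the SD of the predicted variable.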

18

Rule Number 4: Don’t Mix Apples and Oranges

As you practice various formulas in class, it is easy to use the wrong score, mean, or standard deviation. For instance, in determining the SEM for Latisha (in book), we used Latisha’s DIQ score of 120 and figured out the SEM using the DIQ standard deviation of 15. However, if we had been asked to figure out the SEM of her raw score, we would use her raw score and the standard deviation of raw scores. Whenever you are asked to figure out a problem, remember to use the correct set of numbers (don’t mix apples and oranges), otherwise your answer will be incorrect.

19

Scales of Measurement

In assessment, we measure characteristics in quantifiable terms, but “quantifiable” can be defined differently:

E.g., gender is measured as either male or female.

E.g., achievement can be measured on a scale that has a large range of scores.

Four different scales of measurement have been created to help us define “quantifiable.”

The type of scale you use is intimately related to the type of instrument you choose.

20

Scales of Measurement (Cont’d)

Four scales:

1. Nominal: Numbers arbitrarily assigned to categories.

E.g., Race: 1 = Asian, 2 = African American, 3 = Latino, 4 = Caucasian

2. Ordinal: Magnitude or rank order is implied.

E.g., rate the following: “The counseling was helpful”

1 = Strongly Disagree, 2 = Somewhat Disagree, 3 = Neutral, 4 = Somewhat Agree, 5 = Strongly Agree

21

Scales of Measurement (Cont’d)

Four scales (Cont’d):

3. Interval: Establishes equal distances between measurements. No absolute zero.

E.g., a 600 on the GRE is better than a 550 but not twice as good as a 300.

4. Ratio: Has a meaningful zero point and equal intervals.

E.g., if I weigh 200 lbs, I weigh twice as much as someone who weighs 100 lbs.
