RIT 101
Gage Kingsbury & Steve Wise, Senior Research Fellows, NWEA
Fusion 2012, the NWEA summer conference in Portland, Oregon

It’s easy to say that the RIT scale is an equal-interval scale, but not as easy to back it up. This session will provide a conceptual review of the RIT scale and its characteristics and help to answer these questions:
• What is a RIT?
• What is a Rasch model?
• Why isn’t the number of correct answers used as the score?
• How are scores compared if students take different test items?
• Does a 200 RIT score from a third grader mean the same thing as a 200 from an eighth grader?

Learning outcome:
• Gain a deeper understanding of the Rasch model.

Audience:
• New data user
• Experienced data user
• Advanced data user
• District leadership
• Curriculum and Instruction
Steven L. Wise
Senior Research Fellow
RIT 101: Understanding Scores from MAP
• Unique features of the RIT scales
• Calibrating items for MAP
• The RIT scale and adaptive testing
• Scoring a test
• Interpretation of scores
RIT 101
• Equal Interval
• Cross Graded
• Stable over time
• Allows us to assess change (growth) over time
• Allows us to develop item banks
• Allows us to give tests specific to student needs
Unique Features of RIT Scales
• The RIT scale is the platform on which new items are calibrated, a test is chosen for a student, and a student’s score is computed and interpreted.
• MAP is a computerized adaptive test (CAT), which means that each student receives a test that is tailored to his/her level of proficiency.
How do we use the RIT scale?
• Item calibration is the process by which we figure out how difficult an item is.
• This is extremely useful both for building an item bank and for administering a CAT.
• Calibration is based on item response theory, specifically the Rasch model.
– The Rasch model specifies the relationship between a student’s proficiency level and his/her chances of passing the item.
Item Calibration
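The relationship the Rasch model specifies can be sketched in a few lines. This is a minimal illustration, not NWEA's implementation: the function name is hypothetical, and the scaling of 10 RIT points per logit is an assumption that the slides do not state.

```python
import math

RIT_PER_LOGIT = 10.0  # assumed scaling; not stated in the slides


def rasch_probability(student_rit: float, item_rit: float) -> float:
    """Chance of passing an item under the Rasch model.

    Depends only on the gap between the student's RIT and the item's RIT.
    """
    logit = (student_rit - item_rit) / RIT_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-logit))


print(rasch_probability(200, 200))            # 0.5
print(round(rasch_probability(210, 200), 3))  # 0.731
```

A student whose RIT equals the item's difficulty has an even chance of passing; the chance rises as the student's RIT moves above the item's.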
• Some items are more difficult than others.
• We figure out an item’s difficulty by field testing it during live test events.
• We then consider how many students got the item right relative to their standing on the RIT scale.
How do we decide a new item’s difficulty?
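One way to turn field-test responses into a difficulty value is sketched below. It assumes the students' RIT scores are already known and looks for the difficulty at which the expected number of correct answers matches the observed number (the maximum likelihood condition for a Rasch item). The function names, the search bounds, the sample data, and the scaling of 10 RIT points per logit are all assumptions made for illustration.

```python
import math

RIT_PER_LOGIT = 10.0  # assumed scaling; not stated in the slides


def p_correct(student_rit, item_rit):
    return 1.0 / (1.0 + math.exp(-(student_rit - item_rit) / RIT_PER_LOGIT))


def calibrate_item(student_rits, responses, low=100.0, high=350.0, tol=0.01):
    """Estimate one item's difficulty from field-test data.

    responses[i] is 1 if student i passed the item, else 0.
    Bisection: raise the difficulty while the model expects more
    correct answers than were observed, lower it otherwise.
    """
    observed = sum(responses)
    if observed == 0 or observed == len(responses):
        raise ValueError("need at least one pass and one fail")
    while high - low > tol:
        mid = (low + high) / 2.0
        expected = sum(p_correct(rit, mid) for rit in student_rits)
        if expected > observed:
            low = mid   # item is harder than mid
        else:
            high = mid  # item is no harder than mid
    return (low + high) / 2.0


# Hypothetical field-test data: two lower-scoring students fail, two higher pass.
estimate = calibrate_item([150, 160, 180, 190], [0, 0, 1, 1])
print(round(estimate, 1))  # 170.0
```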
A Basic Math Item: 5 + 5 = ?
[Figure: proportion correct (0.0 to 1.0) plotted against RIT (120 to 270)]
Fitting a Rasch Curve
[Figure: proportion correct (0.0 to 1.0) plotted against RIT (120 to 270)]
Item Difficulty: the RIT value at which we expect half of the students to pass the item.
[Figure: proportion correct (0.0 to 1.0) plotted against RIT (120 to 270), with the item’s difficulty marked at RIT 170]
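Why the halfway point of the curve gives the difficulty can be seen from the Rasch formula itself. In the sketch below, \theta is the student's RIT and b is the item's difficulty; the divisor of 10 RIT points per logit is an assumption, not something the slides state.

```latex
P(\text{correct} \mid \theta, b) = \frac{1}{1 + e^{-(\theta - b)/10}},
\qquad
P(\text{correct} \mid \theta = b) = \frac{1}{1 + e^{0}} = \frac{1}{2}
```

With b = 170, a student at RIT 170 is expected to pass the item half of the time, whatever scaling constant is used.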
• Once an item has been calibrated, it (along with its difficulty) will be added to the MAP item bank.
• MAP banks contain thousands of test items.
• Large item banks are essential for using CAT.
The Item Bank
• The scoring of a student’s test under the Rasch model takes into account two things:
– how difficult the items the student received were
– how she did on those items
• A standard method of scoring is called “maximum likelihood.”
– This just means, “What is the most likely RIT score for a student who performed as she did on the items she received?”
• Conceptually, this is not as complicated as it sounds.
Scoring a Test
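Written out, "the most likely RIT score" means the value of \theta that maximizes the likelihood of the observed answers. In this sketch, x_i is 1 if item i was passed and 0 if it was failed, b_i is the difficulty of item i, and the divisor of 10 RIT points per logit is an assumed scaling.

```latex
L(\theta) = \prod_{i=1}^{n} P_i(\theta)^{x_i}\,\bigl(1 - P_i(\theta)\bigr)^{1 - x_i},
\qquad
P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)/10}},
\qquad
\hat{\theta} = \arg\max_{\theta} L(\theta)
```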
A One-item Test
[Figure: item curve of proportion correct (0 to 1) plotted against RIT (130 to 270)]
If this item was passed, what are the most likely values of the student’s RIT?
What are the least likely values?
A Two-item Test
[Figure: item curves of proportion correct (0 to 1) plotted against RIT (130 to 270)]
What if the Blue item was passed and the Red item was failed?
A Three-item Test
[Figure: item curves of proportion correct (0 to 1) plotted against RIT (130 to 270)]
What if the Blue and Green items were passed and the Red item was failed?
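The one-, two-, and three-item questions can be explored numerically. The sketch below multiplies the pass or fail probability of each item at every RIT value on the plotted range and reports where the product is largest. The item difficulties (Blue 180, Green 200, Red 220) are hypothetical, since the slides do not give them, and the scaling of 10 RIT points per logit is assumed.

```python
import math

RIT_PER_LOGIT = 10.0  # assumed scaling; not stated in the slides


def p_correct(student_rit, item_rit):
    return 1.0 / (1.0 + math.exp(-(student_rit - item_rit) / RIT_PER_LOGIT))


def likelihood(student_rit, items):
    """Likelihood of a response pattern; items holds (item_rit, passed) pairs."""
    value = 1.0
    for item_rit, passed in items:
        p = p_correct(student_rit, item_rit)
        value *= p if passed else 1.0 - p
    return value


def most_likely_rit(items, grid=range(130, 271)):
    """RIT value on the grid with the highest likelihood."""
    return max(grid, key=lambda rit: likelihood(rit, items))


BLUE, GREEN, RED = 180, 200, 220  # hypothetical difficulties

one_item = [(BLUE, True)]
two_items = [(BLUE, True), (RED, False)]
three_items = [(BLUE, True), (GREEN, True), (RED, False)]

print(most_likely_rit(one_item))     # 270, the top of the grid
print(most_likely_rit(two_items))    # 200, midway between Blue and Red
print(most_likely_rit(three_items))  # lands between Green and Red
```

With a single passed item the likelihood keeps rising with RIT, so the estimate runs to the edge of the grid; a finite estimate needs at least one pass and one fail.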
• Notice that item difficulty and student scores are on the same scale (RIT).
• The best measurement occurs when students are given items whose difficulties are well matched to their proficiency levels.
• This is what a CAT does. It tailors the test to each student by adjusting item difficulty.
• Result: all students can be measured with equal precision.
Maximum Likelihood Scoring and CAT
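The claim that well-matched items give the best measurement can be illustrated with the Rasch item information, p(1 - p), which is largest when the chance of passing is one half. This is a sketch in logit units; the function name is hypothetical and the scaling of 10 RIT points per logit is assumed.

```python
import math


def item_information(student_rit, item_rit, rit_per_logit=10.0):
    """Rasch item information p * (1 - p), in logit units.

    rit_per_logit is an assumed scaling, not stated in the slides.
    """
    p = 1.0 / (1.0 + math.exp(-(student_rit - item_rit) / rit_per_logit))
    return p * (1.0 - p)


# Information that items of various difficulties give about a student at RIT 200.
for item_rit in (160, 180, 200, 220, 240):
    print(item_rit, round(item_information(200, item_rit), 3))
```

The item at RIT 200 yields the maximum of 0.25, and the value falls off as the item becomes much easier or much harder than the student's level.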
1. Pick an item of appropriate starting difficulty.
2. Present the item to the student and record the answer.
3. If answer is right, choose a harder item to give next. If answer is wrong, choose an easier item to give next.
4. Repeat steps 2 & 3 until enough items have been given.
5. Calculate the student’s RIT score.
How a CAT Works
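The five steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not the MAP algorithm: the student's answers are simulated from a known "true" RIT, the next item is chosen by a fixed 10-point step up or down, the final score is a grid-search maximum likelihood estimate, and the scaling of 10 RIT points per logit is assumed. All names and the item bank are hypothetical.

```python
import math
import random

RIT_PER_LOGIT = 10.0  # assumed scaling; not stated in the slides


def p_correct(student_rit, item_rit):
    return 1.0 / (1.0 + math.exp(-(student_rit - item_rit) / RIT_PER_LOGIT))


def estimate_rit(administered, grid=range(100, 351)):
    """Step 5: grid-search maximum likelihood score from (item_rit, correct) pairs."""
    def log_likelihood(rit):
        total = 0.0
        for item_rit, correct in administered:
            p = p_correct(rit, item_rit)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=log_likelihood)


def run_cat(true_rit, bank, start_rit=200, n_items=20, step=10, rng=None):
    if rng is None:
        rng = random.Random(0)
    if n_items > len(bank):
        raise ValueError("item bank is too small")
    available = list(bank)
    target = start_rit                        # step 1: starting difficulty
    administered = []
    for _ in range(n_items):                  # step 4: repeat
        # Pick the unused item closest to the target difficulty.
        item_rit = min(available, key=lambda d: abs(d - target))
        available.remove(item_rit)
        # Step 2: simulate the student's answer.
        correct = rng.random() < p_correct(true_rit, item_rit)
        administered.append((item_rit, correct))
        # Step 3: harder after a right answer, easier after a wrong one.
        target = item_rit + step if correct else item_rit - step
    return estimate_rit(administered), administered


bank = list(range(130, 271, 2))  # hypothetical bank of item difficulties
score, items_given = run_cat(true_rit=215, bank=bank)
print(score, len(items_given))
```

If every simulated answer is right (or every one wrong), the grid search returns the edge of its range, which mirrors the one-item case where the likelihood has no interior peak.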
• How is a student’s RIT score interpreted?
• A math RIT score of, say, 221 is not interpretable by itself.
• We need to have one or more reference points to interpret a score.
Interpreting a RIT Score
• Normative: Shauna’s 221 is at the 62nd percentile relative to other 5th grade students.
• Growth: She has gained 13 RIT points since fall MAP testing. Typical growth for students starting at the same level was 9 points.
• Predictive: Her score indicates that she is on track to be college ready by the 12th grade.
• Content: DesCartes provides information about which skills Shauna is currently ready to learn.
Reference Points for a Spring RIT Score of 221 in Math
Thank you for your attention.