Rule-Space Methodology: Constructing More Useful Information from Test Performance

Yi-hsin Chen

Research Methods in a Nutshell, Research Methods Workshops Series

College of Education Presentation

January 26th, 2007

Outline

Why What How When

Overview

What is educational assessment?

Educational assessment is a process of collecting evidence and interpreting it to provide instructors with information regarding students’ learning (Glaser, 1962)

Student learning information

Instructors can use this information to identify what knowledge students have, diagnose learning errors or misconceptions, and detect learning effects and outcomes

Overview

No Child Left Behind Act of 2001 (NCLB)

The purpose of NCLB is to improve the academic achievement of disadvantaged students

Standardized educational tests

The primary goal of standardized educational tests is to obtain information about student learning, with the ultimate goal of improving it

Overview

The majority of learning behaviors in classrooms center on problem solving or other mental functions

It would be useful to present or link test results explicitly in terms of cognitive processes

How have test scores been used so far?

Their use is closely tied to the psychometric models

Traditional Paradigm

Traditional psychometric models: Classical True Score Theory (CTST) and Item Response Theory (IRT)

Single-score-based testing paradigm

Test scores do not reflect the cognitive information underlying test performance

Cognitive information is not incorporated into traditional psychometric models

Limitations of Test Score

Utility of Test Scores

Without cognitive information, the utility of the tests is limited as a means of diagnostic feedback to teachers in classrooms

As a result, achievement tests have mainly been used for the purposes of selection, placement, and certification

Construct Validity of Tests

Without cognitive information, evidence typically consists of correlations between test scores and other measures

Little information is available that is more directly concerned with the theoretical mechanisms underlying successful test performance

Traditional Paradigm

Conventional skills-level assessment (diagnostic assessment)

A list of cognitive domains (targeted skills)

A subset of items is associated with each domain (skill); one item belongs to only one category

A subscore is computed for each cognitive domain

Conventionally, a student’s cognitive skills-mastery profile is based on subscoring

New Approaches

To date, some psychometricians have applied cognitive psychology principles to psychometric models of educational assessment data for these purposes; the resulting models are called psychometric skills-diagnostic models

Stout (2002) mentioned several milestones in the psychometric history of cognitive modeling:

Gerhard Fischer: Linear logistic test model (LLTM)

Susan Embretson: A series of multidimensional logistic IRT models

Edward Haertel: Restricted latent class model

Kikumi Tatsuoka: Rule-space methodology

Robert Mislevy: Bayes net approach to skills diagnosis

RSM

Tatsuoka’s rule-space methodology (RSM) is one of these new approaches; it can be used to measure students’ knowledge states, which consist of mastered and non-mastered cognitive skills, knowledge, and strategies

Mathematically, RSM is a probabilistic approach

Methodologically, RSM is a cognitively diagnostic method based on pattern classification and statistical decision

RSM

A Student’s Observed Item Response Pattern (10 items): 1 0 1 1 1 0 1 1 0 1

A Student’s Unobserved Attribute Mastery Probabilities (5 attributes): .83 .95 1.00 .75 .34

Attribute Mastery Pattern (cutoff point of .80): 1 1 1 0 0
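The conversion from mastery probabilities to a binary mastery pattern can be sketched in a few lines of Python. This is only an illustration of the cutoff rule shown on the slide; the function name `to_mastery_pattern` is hypothetical, and treating a probability exactly equal to the cutoff as mastered is an assumption, since the slide does not say how ties are handled.

```python
# Mastery probabilities for the five attributes shown on the slide.
probabilities = [0.83, 0.95, 1.00, 0.75, 0.34]
CUTOFF = 0.80  # cutoff point used on the slide


def to_mastery_pattern(probs, cutoff):
    """Return 1 for each attribute whose probability reaches the cutoff, else 0.

    Assumption: a probability equal to the cutoff counts as mastered.
    """
    return [1 if p >= cutoff else 0 for p in probs]


pattern = to_mastery_pattern(probabilities, CUTOFF)
print(pattern)
```

With the slide's values, the first three attributes clear the cutoff and the last two do not, which reproduces the pattern 1 1 1 0 0.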

RSM

Pattern Classification & Statistical Decision

Procedures of RSM

Step 1: Identifying a list of cognitive attributes and the Q-matrix

Step 2: Determining ideal item-response patterns corresponding to knowledge states

Step 3: Mapping the students’ response patterns and the ideal item-response patterns onto the classification space

Step 4: Classifying a student’s responses into one of the closest knowledge states

Flowchart of the RSM procedure:

Identification: a set of attributes and the Q-matrix

Determination: ideal item-response patterns (via the BDF)

Mapping: students’ item-response patterns and ideal item-response patterns onto two- or multi-dimensional classification spaces

Classification: attribute probabilities (Mahalanobis distance (D²) and the Bayesian minimum error rule)

Step 1: Identification

A list of cognitive attributes and an incidence matrix (Q-matrix) are identified in Step 1

Cognitive attributes for the test may include knowledge, processes, strategies, and skills, which are required to answer the items correctly

The incidence matrix depicts the relationships between items and attributes

Together, they are referred to as the cognitive model of the test in rule-space analyses


Attribute List

CONTENT ATTRIBUTES

SKILL/ITEM TYPE ATTRIBUTES

PROCESS ATTRIBUTES

Content Attributes

C1: Basic concepts, properties and operations in whole numbers and integers

C2: Basic concepts, properties, and operations in fractions and decimals

C3: Basic concepts, properties, and operations in elementary algebra

C4: Basic concepts and properties of two-dimensional geometry

C5: Data, probability, and basic statistics

Skill/Item Type Attributes

S2: Applying number properties and relationships; number sense/number line

S3: Using figures, tables, charts and graphs

S4: Approximation/Estimation

S5: Evaluate/Verify/Check Options

S6: Patterns and relationships (be able to apply inductive thinking skills)

S7: Using proportional reasoning

S8: Solving novel or unfamiliar problems

S10: Open-ended item, in which an answer is not given

S11: Using words to communicate questions (word problem)

Process Attributes

P1: Translate/formulate equations and expressions to solve a problem

P2: Computational applications of knowledge in arithmetic and geometry

P3: Judgmental applications of knowledge in arithmetic and geometry

P4: Applying rules in algebra

P5: Logical reasoning, including case reasoning, deductive thinking skills, if-then, necessary and sufficient, and generalization skills

P6: Problem Search; Analytic Thinking, Problem Restructuring and Inductive Thinking

P7: Generating, visualizing and reading Figures and Graphs

P9: Management of Data and Procedures

P10: Quantitative and Logical Reading

A Coding Example

Question: Mary ran a race in 49.86 seconds. Betty ran the same race in 52.30 seconds. How much longer did it take Betty to run the race than Mary?

A. 2.44 seconds B. 2.54 seconds C. 3.56 seconds D. 3.76 seconds

Attributes involved:

Content is fractions and decimals: C2

Dealing with time is very common and routine: S8

Using words to express a question: S11

Subtracting 49.86 from 52.30 is a straightforward translation of the expression to arithmetic: P1

Q-Matrix

The incidence matrix (Q-matrix) is an I × A binary indicator matrix in which the rows (I) represent items and the columns (A) represent attributes
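As a rough illustration of this definition, the sketch below builds a small Q-matrix from item codings. The first item uses the coding of the race question (C2, S8, S11, P1); the other two items, the item names, and the helper `build_q_matrix` are invented purely for illustration and are not part of the TIMSS coding.

```python
# Columns of the Q-matrix: a small subset of the attribute list.
attributes = ["C2", "S8", "S11", "P1", "P2"]

# Item name -> attributes the item requires.
item_coding = {
    "item_1": ["C2", "S8", "S11", "P1"],  # coding of the race question
    "item_2": ["C2", "P2"],               # hypothetical item
    "item_3": ["S11", "P1", "P2"],        # hypothetical item
}


def build_q_matrix(coding, attrs):
    """Return an I x A list of lists: entry is 1 if the item requires the attribute."""
    return [[1 if a in required else 0 for a in attrs]
            for required in coding.values()]


q_matrix = build_q_matrix(item_coding, attributes)
for name, row in zip(item_coding, q_matrix):
    print(name, row)
```

Each row is one item and each column one attribute, so the matrix here has I = 3 rows and A = 5 columns.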

Step 2: Determination

Flow of the determination step: a list of attributes yields the possible knowledge states; the Q-matrix and the Boolean Descriptive Function then yield the ideal item-response patterns


The goal of the determination step is to determine ideal item-response patterns based on the possible knowledge states and the identified Q-matrix

The knowledge state is defined as the attribute mastery pattern where 1 stands for mastered and 0 for not mastered

With three attributes, for example, there are 8 (2^3) possible knowledge states: (000) (100) (010) (001) (110) (101) (011) (111)
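The count of possible knowledge states can be checked by enumerating every binary mastery vector. The following is a minimal sketch; the function name `knowledge_states` is hypothetical.

```python
from itertools import product


def knowledge_states(n_attributes):
    """Enumerate every binary attribute mastery pattern of the given length."""
    return list(product((0, 1), repeat=n_attributes))


states = knowledge_states(3)
print(len(states))
for state in states:
    print("".join(str(bit) for bit in state))
```

The number of states doubles with every added attribute, so a full enumeration quickly becomes large: 23 attributes would give 2^23 = 8,388,608 candidate states.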

Boolean Descriptive Function

A Boolean Descriptive Function (BDF; Tatsuoka, 1991) is applied to connect latent knowledge states with ideal item-response patterns

The basic assumption behind a Boolean Descriptive Function is that an item can be answered correctly if and only if all the attributes involved in the item have been mastered
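The "if and only if" assumption translates directly into code. The sketch below derives an ideal item-response pattern from a knowledge state and a Q-matrix; the 4 × 3 Q-matrix and the function name `ideal_response_pattern` are hypothetical and serve only to show the rule.

```python
# Hypothetical Q-matrix: 4 items (rows) by 3 attributes (columns).
q_matrix = [
    [1, 0, 0],  # item 1 requires attribute 1
    [1, 1, 0],  # item 2 requires attributes 1 and 2
    [0, 0, 1],  # item 3 requires attribute 3
    [0, 1, 1],  # item 4 requires attributes 2 and 3
]


def ideal_response_pattern(state, q):
    """Score an item 1 only if every attribute it requires is mastered in `state`."""
    return [
        1 if all(state[k] == 1 for k, needed in enumerate(row) if needed) else 0
        for row in q
    ]


pattern = ideal_response_pattern((1, 0, 1), q_matrix)
print(pattern)
```

A student who has mastered attributes 1 and 3 but not attribute 2 is expected to answer items 1 and 3 correctly and to miss items 2 and 4, because both of those items involve the non-mastered attribute.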

Ideal Item Response Pattern


Since the knowledge state is unobservable, the observable ideal item-response pattern should be determined

An ideal item-response pattern is the pattern of correct and incorrect responses that is fully consistent with the attributes an individual has and has not mastered

Ideal item-response patterns can be considered as classification groups in RSM

Step 3: Mapping

The third step is mapping examinee item-response patterns and ideal item-response patterns onto the classification space (θ, ζ)

The rule-space methodology utilizes the Cartesian Coordinate System to formulate an orthogonal two-dimensional classification space

The classification space consists of the latent ability variable in IRT, θ, along the horizontal axis and one of the IRT-based caution indices, ζ, which is the unusualness of item response patterns, along the vertical axis
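The slides do not give a formula for ζ. As a hedged sketch, one commonly cited form of the standardized caution index is ζ = f(x) / sqrt(Var f(x)), where f(x) = Σ (P_j − x_j)(P_j − T), T is the mean of the item probabilities P_j at the examinee's θ, and Var f(x) = Σ P_j (1 − P_j)(P_j − T)². The two-parameter logistic model, the item difficulties, and the function names below are all assumptions made for illustration.

```python
import math


def p_correct(theta, b, a=1.0):
    """Two-parameter logistic probability of a correct answer (assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def zeta(responses, probs):
    """Standardized caution index for one response pattern (assumed formula)."""
    t = sum(probs) / len(probs)  # mean probability across items
    f = sum((p - x) * (p - t) for p, x in zip(probs, responses))
    variance = sum(p * (1.0 - p) * (p - t) ** 2 for p in probs)
    return f / math.sqrt(variance)


# Hypothetical item difficulties, ordered from easy to hard, for theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [p_correct(0.0, b) for b in difficulties]

zeta_typical = zeta([1, 1, 1, 0, 0], probs)  # easy items right, hard items wrong
zeta_unusual = zeta([0, 0, 1, 1, 1], probs)  # easy items wrong, hard items right
print(zeta_typical, zeta_unusual)
```

Under this form, a pattern that follows item difficulty gives a negative ζ, while a pattern that misses easy items and answers hard ones gives a positive ζ, which matches the reading of ζ as the unusualness of a response pattern.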

Caution Indices ζ

Step 4: Classification

In the classification stage, the examinee’s item-response pattern is compared with each of the possible ideal item-response patterns in the classification space (θ, ζ)

The Mahalanobis distance (D²) between the point associated with the examinee’s item-response pattern in the classification space (θ, ζ) and the point associated with each ideal item-response pattern is applied as an admissibility criterion for this comparison
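For a two-dimensional space, the squared Mahalanobis distance D² = (x − μ)ᵀ Σ⁻¹ (x − μ) can be computed without any libraries. The sketch below is illustrative only: the points and covariance matrices are made up, and the function name `mahalanobis_sq` is hypothetical.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance in two dimensions.

    `cov` is a 2 x 2 covariance matrix given as ((a, b), (c, d)).
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # The inverse of a 2 x 2 matrix is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


# Hypothetical examinee point and ideal-pattern point in (theta, zeta) space.
examinee = (1.0, 2.0)
ideal = (0.0, 0.0)

d2_identity = mahalanobis_sq(examinee, ideal, ((1.0, 0.0), (0.0, 1.0)))
d2_scaled = mahalanobis_sq(examinee, ideal, ((4.0, 0.0), (0.0, 1.0)))
print(d2_identity, d2_scaled)
```

With an identity covariance matrix D² reduces to the squared Euclidean distance; a larger variance along θ shrinks the contribution of the θ difference. An ideal pattern would be treated as admissible when D² falls below a chosen threshold.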

Limitations of D²

Its use can lead to more than one acceptable ideal item-response pattern for a particular examinee’s item-response pattern

Further, the Mahalanobis distance does not yield the probabilities of misclassification (errors) or any other evidence for determining the attribute mastery profile

Bayesian Decision Rule

A Bayesian minimum error rule is applied to yield the posterior probability for the final decision on classification

The rule classifies the examinee’s item-response pattern into the single ideal item-response pattern with the highest posterior probability
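A simplified sketch of this decision rule follows. It assumes that the likelihood of each candidate knowledge state is normal with a shared covariance matrix, so that the posterior is proportional to prior × exp(−D²/2); this assumption, the distances, the state labels, and the equal priors are all illustrative and are not taken from the presentation.

```python
import math


def posteriors(d_squared, priors):
    """Posterior probability of each candidate state.

    Assumption: normal likelihoods with a shared covariance matrix, so the
    normalizing constants cancel and posterior is proportional to
    prior * exp(-D^2 / 2).
    """
    weights = {k: priors[k] * math.exp(-0.5 * d2) for k, d2 in d_squared.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical squared Mahalanobis distances to three admissible states.
d_squared = {"KS_A": 0.4, "KS_B": 1.9, "KS_C": 4.2}
priors = {k: 1.0 / len(d_squared) for k in d_squared}  # equal priors (assumed)

post = posteriors(d_squared, priors)
best = max(post, key=post.get)
print(best, post)
```

With equal priors the state with the smallest D² receives the highest posterior probability; unequal priors can change the decision, which is what distinguishes this rule from a pure nearest-distance rule.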

Purpose of this Study

To verify whether previously identified cognitive attributes represent the performance of Taiwanese eighth graders on the TIMSS-1999 mathematics tests

To examine the knowledge states most populated by the Taiwanese students

To compare group differences in terms of cognitive attributes: performance level, gender, and region

Analysis

Verifying the Cognitive Model

To validate both the attributes and Q-matrix

The following analyses were conducted:

Computing the classification rate

Multiple regression analyses

Comparing the mastery probability of each attribute across four booklets

Classification Rate

The proportion of examinees who are classified successfully into at least one of the predetermined knowledge groups

A low classification rate suggests that the ideal item-response patterns derived from the cognitive model do not reflect actual examinee performance on the TIMSS-1999 mathematics test
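As a small worked example, the overall classification rate can be computed from the counts reported in the verification results of this presentation (11 of 2874 students not classified). The variable names are illustrative.

```python
# Counts reported in the verification results.
total_students = 2874
unclassified = 11

classified = total_students - unclassified
classification_rate = classified / total_students

print(classified)
print(f"{classification_rate:.1%}")
```

The resulting overall rate of about 99.6% falls inside the 99.3% to 99.9% range reported for the individual booklets.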

Multiple Regression Analyses

Examinee ability measures (such as total scores and the first plausible value) were regressed on examinee attribute mastery probabilities

R-square and adjusted R-square indices were checked

These indices show how well attribute probabilities account for examinees’ performance (total scores)
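The relation between the two indices is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. The sketch below applies it to the whole-sample R² of .925 reported in this presentation, assuming that all 23 attribute probabilities were entered as predictors and that n = 2874; both values are assumptions, since the slides do not state the exact regression specification.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-square for a model with p predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumed inputs: reported R-square, full sample size, 23 attribute predictors.
adj = adjusted_r_squared(0.925, 2874, 23)
print(round(adj, 3))
```

Under these assumptions the adjustment is tiny, because the sample is very large relative to the number of predictors.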

Consistency

The means and standard deviations of attribute mastery probabilities were computed for four booklets

The consistency of attribute mastery probabilities across four booklets was checked

Inconsistent attributes reflect a problem concerning attributes and/or attribute coding

Results of Verification

The mean squared Mahalanobis distance (D²) from the closest latent knowledge states was .44

Classification rates were extremely high (99.3% to 99.9%), and only 11 out of 2874 students were not assigned to at least one of the predetermined knowledge states

Regression analyses with total scores obtained extremely high R² and adjusted R² values (.943 to .979) for the four booklets, as well as .925 and .924 for the entire sample

Results of Verification

The ranges of mean attribute probabilities across booklets were less than .20 for 20 attributes, and less than .10 for 13 of the 23 attributes

The largest range of mean attribute probabilities across booklets was .27, for Recognize patterns (S6)

Recognize patterns (S6) was required in the fewest total items (14 items) across the four booklets

Discussions for Verification

A poor fit of the model to the data would call the diagnostic information into question

The proposed cognitive attributes and Q-matrix used in this study explained the performance of Taiwanese eighth graders on the TIMSS-1999 mathematics tests very well

Results consistent with the current study were obtained in two previous studies that used 20 and 3 countries, respectively, from the TIMSS-1999 study

Analysis

Diagnosing Knowledge States

To provide diagnostic information in terms of knowledge states and learning paths

The following analyses were conducted:

Conducting rule-space analysis

Grouping knowledge states

Identifying learning paths

RSM Analyses

BUGSHELL, programmed by Tatsuoka, Varadi, and Tatsuoka (1992), was utilized

A three-dimensional classification space was used: θ (the IRT ability), ζ (unusualness), and generalized ζ

Setting relevant parameters

The Mahalanobis distance (D²) and the difference of θ values were set to 4.5 and 1.5, respectively

The number of slips was not more than one-third of the total items

Boxplot for the Population

Diagnostic Information


Clustering Knowledge States

Combining the attribute mastery probability vectors from the four rule-space analyses

A K-means cluster analysis was conducted

Deriving the centroids of the clustered knowledge states

Transforming attribute probabilities into attribute patterns by using a cutoff point of .85
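The clustering and cutoff steps can be sketched with a bare-bones K-means loop. This is not the software used in the study; the four probability vectors, the fixed initial centroids, and the function names are hypothetical, and a real analysis would normally rely on a statistical package with multiple random starts.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the given initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = column means of the cluster; keep the old one if empty.
        centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else list(old)
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


# Hypothetical attribute mastery probability vectors (3 attributes).
points = [
    [0.95, 0.90, 0.92], [0.97, 0.88, 0.96],
    [0.91, 0.40, 0.30], [0.89, 0.35, 0.20],
]
centroids = k_means(points, [points[0], points[2]])

CUTOFF = 0.85  # cutoff point reported for the clustered knowledge states
patterns = [[1 if p >= CUTOFF else 0 for p in c] for c in centroids]
print(patterns)
```

Each centroid is the mean probability vector of its cluster, and applying the cutoff turns it into a binary clustered knowledge state.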

Clustered Knowledge States

The goal of clustering is to explore educationally interpretable groups of students’ attribute mastery probabilities and hierarchical relations among these groups

A twelve-cluster solution was selected as the final solution for the K-means cluster analysis in this study

Cluster Solutions

Some solutions didn’t yield the clustered knowledge state representing students who mastered all 23 attributes

Some solutions did not yield the knowledge state to reflect students who mastered few attributes

As for some solutions, interpretable hierarchical relations among the clustered knowledge states could not be derived

Hierarchical Relations

A pair of clustered knowledge states has a hierarchical relation if each component in one binary mastery vector is larger than or equal to the corresponding component in the other mastery vector

For example, KS1 has an attribute mastery vector of (10111), KS2 is represented by (10011), and KS3 is represented by (10010)

The relationship among these knowledge states is denoted by KS3 ≤ KS2 ≤ KS1

These hierarchically-ordered knowledge states formed a chain, also called a learning path in the current study
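The component-wise comparison that defines a hierarchical relation is easy to check in code. The sketch below uses the three example vectors from the slide; the function name `dominates` is hypothetical.

```python
def dominates(upper, lower):
    """True if every component of `upper` is at least the matching one in `lower`."""
    return all(u >= l for u, l in zip(upper, lower))


# Attribute mastery vectors from the example.
KS1 = (1, 0, 1, 1, 1)
KS2 = (1, 0, 0, 1, 1)
KS3 = (1, 0, 0, 1, 0)

chain_ok = dominates(KS1, KS2) and dominates(KS2, KS3)
print(chain_ok)
```

The three states form a chain because each one adds mastered attributes to the previous one without losing any. Two states such as (110) and (011) would not be ordered in either direction and so would lie on different paths.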

Learning Paths

Identifying Learning Paths

A hierarchically ordered network was formed based on vectors of attribute mastery patterns

The hierarchically ordered network consisted of many sub-graphs, which are referred to as learning paths

Learning paths provide practical information on how Taiwanese students progress in acquiring their cognitive attributes

Analysis

Comparing Group Differences

The dataset was separated into different groups

The groups were compared in terms of:

Each attribute mastery probability

Attribute characteristic curves (ACC)

The percentage of students in each clustered knowledge state

Learning paths

Attribute Characteristic Curve: Content Attributes

Attribute Characteristic Curve: Skills/Item-type Attributes

Attribute Characteristic Curve: Process Attributes

Gender Comparisons

The findings indicate that gender differences among Taiwanese students in terms of mathematical skills are minimal

That is, male and female middle school students in Taiwan show comparatively equivalent potential in learning mathematical skills

Variability of males’ mathematics performance in terms of knowledge state distributions is slightly greater than that of females

Male and female students were represented in the same proportion in learning path 1

Region Comparisons


Students in urban schools perform much better than those in rural schools on high-level mathematics content and abstract thinking skills

Greater proportions of urban students were classified into knowledge states with larger numbers of mastered attributes, and greater proportions of rural students occupied knowledge states with smaller numbers of mastered attributes

More urban school students were represented in Learning Path 1

Conclusions

RSM is a viable alternative to traditional psychometric analysis of test scores:

Validating the cognitive model

Providing diagnostic information with descriptions of cognitive attributes

Conducting group comparisons in terms of cognitive attributes

Diagnostic information cannot be provided for students not classified into the predetermined knowledge states

Conclusions

Taiwanese students perform well on all cognitive attributes except for Recognize patterns

Taiwanese students show some weaknesses in abstract thinking skills and algebra content

The lowest and highest performing students also show the largest mastery differences in thinking skills as well as algebra and its application

The learning gap between urban and rural schools in Taiwan exists not only in students’ total scores, but also in performance on critical cognitive attributes needed for mathematics learning

Future Research

Substantive Subjects

Educational Technology

Further Data Analysis

Methodological Research

Substantive Subjects

Skills diagnosis for other disciplines

In addition to mathematics, science, reading comprehension, food handling certificate tests, teacher certificate tests and … are also possible subjects

Identifying the cognitive processes involved in endorsing psychological survey items

Math test anxiety

Student Self-Efficacy Beliefs in Mathematics: mastery experience, vicarious experience, social persuasions, and physiological/affective states

Educational Technology

Item Pool or Item Bank

In addition to item properties from IRT models, items can be banked by cognitive attributes

Developing enough test items for each attribute

Computerized Diagnostic Test (CDT) and Computerized Remedial Instruction (CRI)

Further Data Analysis

Factor analyzing cognitive attributes

Sub-skill scores as your dependent variables

Applying Multilevel Models (MLM)/Hierarchical Linear Models (HLM) and Growth Curve Models to cognitive attributes with educational context variables (such as teaching strategies, teacher characteristics, and school context variables)

Methodological Research

The purpose of selection

Add the sub-skill score information for the selection purpose

How to decide the appropriate cut-off point (type I error rate and power)

Dimensionality study

The whole probability data

The different learning paths

Equating by cognitive attributes

Questions and Comments
