Rule-Space Methodology: Constructing More Useful Information from Test Performance
Yi-hsin Chen
Research Methods in a Nutshell
College of Education Presentation
January 26th, 2007
Research Methods Workshops Series
Outline
Why What How When
Overview
What is educational assessment?
Educational assessment is a process of collecting evidence and interpreting it to provide instructors with information regarding students’ learning (Glaser, 1962)
Student learning information
Instructors can use this information to identify what knowledge students have, diagnose learning errors or misconceptions, and detect learning effects and outcomes
Overview
No Child Left Behind Act of 2001 (NCLB)
The purpose of NCLB is to improve the academic achievement of disadvantaged students
Standardized educational tests
The primary goal of standardized educational tests is to obtain information about student learning, with the ultimate goal of improving it
Overview
The majority of learning behaviors in classrooms center on problem solving or other mental functions
It would be useful to present or link test results explicitly in terms of cognitive processes
How have test scores been used so far?
Their use is closely tied to the psychometric models
Traditional Paradigm
Traditional psychometric models: Classical True Score Theory (CTST) and Item Response Theory (IRT)
Single-score-based testing paradigm
A single test score does not reflect the cognitive information underlying test performance
Traditional psychometric models do not incorporate cognitive information
Limitations of Test Score
Utility of Test Scores
Without cognitive information, the utility of the tests is limited as a means of diagnostic feedback to teachers in classrooms
As a result, achievement tests have mainly been used for the purposes of selection, placement, and certification
Construct Validity of Tests
Without cognitive information, evidence typically consists of correlations between test scores and other measures
Little information is available that is more directly concerned with the theoretical mechanisms underlying successful test performance
Traditional Paradigm
Conventional skills-level assessment (diagnostic assessment)
A list of cognitive domains (targeted skills)
A subset of items is associated with each domain (skill); one item belongs to only one category
Subscore on each of the cognitive domains
Conventionally, a student’s cognitive skills-mastery profile is based on subscoring
New Approaches
To date, some psychometricians have applied cognitive psychology principles to psychometric models of educational assessment data for these purposes; the resulting models are called psychometric skills-diagnostic models
Stout (2002) mentioned several milestones in the psychometric history of cognitive modeling
Gerhard Fischer: Linear logistic test model (LLTM)
Susan Embretson: A series of multidimensional logistic IRT models
Edward Haertel: Restricted latent class model
Kikumi Tatsuoka: Rule-space methodology
Robert Mislevy: Bayes net approach to skills diagnosis
RSM
Tatsuoka’s rule-space methodology (RSM) is one of these new approaches. It can be used to measure students’ knowledge states, which consist of mastered/non-mastered cognitive skills, knowledge, and strategies
Mathematically, RSM is a probabilistic approach
Methodologically, RSM is a cognitively diagnostic method based on pattern classification and statistical decision
RSM
A Student’s Observed Item Response Pattern (10 Items)
1 0 1 1 1 0 1 1 0 1
A Student’s Unobserved Attribute Mastery Probabilities or Pattern (5 Attributes)
Probabilities: .83 .95 1.00 .75 .34
Pattern (cutoff point of .80): 1 1 1 0 0
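The conversion from mastery probabilities to a mastery pattern on this slide can be sketched as a simple threshold. This is a minimal illustration: the function name is hypothetical, and treating a probability exactly equal to the cutoff as mastered is an assumption the slide does not settle.

```python
def mastery_pattern(probabilities, cutoff=0.80):
    """Turn attribute mastery probabilities into a binary mastery pattern.

    Assumption: a probability at or above the cutoff counts as mastered.
    """
    return [1 if p >= cutoff else 0 for p in probabilities]


# The five attribute probabilities shown on the slide.
probs = [0.83, 0.95, 1.00, 0.75, 0.34]
pattern = mastery_pattern(probs)
print(pattern)  # [1, 1, 1, 0, 0]
```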
RSM
Pattern Classification & Statistical Decision
Procedures of RSM
Step 1: Identifying a list of cognitive attributes and the Q-matrix
Step 2: Determining ideal item-response patterns corresponding to knowledge states
Step 3: Mapping the students’ response patterns and the ideal item-response patterns onto the classification space
Step 4: Classifying a student’s responses into one of the closest knowledge states
[Flowchart of the RSM procedure]
Identification: A Set of Attributes & Q-Matrix
Determination: Ideal Item Response Patterns (BDF)
Mapping: Students’ Item Response Patterns and Ideal Item Response Patterns onto Two- or Multi-Dimensional Classification Spaces
Classification: Attribute Probabilities (Mahalanobis Distance (D²) & Bayesian Minimum Error Rule)
Step 1: Identification
A list of cognitive attributes and an incidence matrix (Q-matrix) are identified in Step 1
Cognitive attributes for the test may include knowledge, processes, strategies, and skills that are required to answer the items correctly
The incidence matrix depicts the relationships between items and attributes
Together they are referred to as the cognitive model of the test in rule-space analyses
Attribute List
CONTENT ATTRIBUTES
SKILL/ITEM TYPE ATTRIBUTES
PROCESS ATTRIBUTES
Content Attributes
C1: Basic concepts, properties and operations in whole numbers and integers
C2: Basic concepts, properties, and operations in fractions and decimals
C3: Basic concepts, properties, and operations in elementary algebra
C4: Basic concepts and properties of two-dimensional geometry
C5: Data, probability, and basic statistics
Skill/Item Type Attributes
S2: Applying number properties and relationships; number sense/number line
S3: Using figures, tables, charts and graphs
S4: Approximation/Estimation
S5: Evaluate/Verify/Check Options
S6: Patterns and relationships (be able to apply inductive thinking skills)
S7: Using proportional reasoning
S8: Solving novel or unfamiliar problems
S10: Open-ended item, in which an answer is not given
S11: Using words to communicate questions (word problem)
Process Attributes
P1: Translate/formulate equations and expressions to solve a problem
P2: Computational applications of knowledge in arithmetic and geometry
P3: Judgmental applications of knowledge in arithmetic and geometry
P4: Applying rules in algebra
P5: Logical reasoning, including case reasoning, deductive thinking skills, if-then, necessary and sufficient, generalization skills
P6: Problem Search; Analytic Thinking, Problem Restructuring and Inductive Thinking
P7: Generating, visualizing and reading Figures and Graphs
P9: Management of Data and Procedures
P10: Quantitative and Logical Reading
A Coding Example
Question: Mary ran a race in 49.86 seconds. Betty ran the same race in 52.30 seconds. How much longer did it take Betty to run the race than Mary?
A. 2.44 seconds B. 2.54 seconds C. 3.56 seconds D. 3.76 seconds
Attributes involved
Content is fractions and decimals: C2
Dealing with time is very common and routine: S8
Using words to express a question: S11
Subtracting 49.86 from 52.30 is a straightforward translation of the expression to arithmetic: P1
Q-Matrix
The incidence matrix (Q-matrix) is an I × A binary indicator matrix in which the rows (I) represent items and the columns (A) represent attributes
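A Q-matrix can be held as a plain list of rows. The sketch below uses a made-up 4-item by 3-attribute matrix; the attribute labels A1 to A3 and both helper functions are hypothetical and are not the coding used in the study.

```python
# Hypothetical Q-matrix: 4 items (rows) x 3 attributes (columns).
# A 1 means the item requires that attribute.
ATTRIBUTES = ["A1", "A2", "A3"]
Q = [
    [1, 0, 0],  # item 1 requires A1
    [0, 1, 0],  # item 2 requires A2
    [1, 1, 0],  # item 3 requires A1 and A2
    [1, 0, 1],  # item 4 requires A1 and A3
]


def attributes_for_item(q_matrix, item_index, names):
    """Names of the attributes that one item (row) requires."""
    return [name for name, flag in zip(names, q_matrix[item_index]) if flag == 1]


def items_for_attribute(q_matrix, attr_index):
    """Zero-based indices of the items that involve one attribute (column)."""
    return [i for i, row in enumerate(q_matrix) if row[attr_index] == 1]


print(attributes_for_item(Q, 2, ATTRIBUTES))  # ['A1', 'A2']
print(items_for_attribute(Q, 0))              # [0, 2, 3]
```

Reading the matrix by column is what gives the number of items that involve each attribute.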
Step 2: Determination
[Diagram: a list of attributes yields the possible knowledge states; the Q-matrix and the Boolean Descriptive Function turn these into ideal item response patterns]
Step 2: Determination
The goal of the determination step is to determine ideal item-response patterns based on the possible knowledge states and the identified Q-matrix
The knowledge state is defined as the attribute mastery pattern where 1 stands for mastered and 0 for not mastered
For example, with three attributes there are 8 (2³) possible knowledge states: (000) (100) (010) (001) (110) (101) (011) (111)
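The enumeration of knowledge states can be sketched with the standard library. The function name is hypothetical; the only assumption is that every combination of mastered and non-mastered attributes is treated as a candidate state.

```python
from itertools import product


def knowledge_states(n_attributes):
    """All binary mastery patterns for the given number of attributes."""
    return list(product((0, 1), repeat=n_attributes))


states = knowledge_states(3)
print(len(states))            # 8
print(states[0], states[-1])  # (0, 0, 0) (1, 1, 1)
```

The count doubles with every added attribute, so it is 2 ** A for A attributes.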
Boolean Descriptive Function
A Boolean Descriptive Function (BDF; Tatsuoka, 1991) is applied to connect latent knowledge states with ideal item-response patterns
The basic assumption behind a Boolean Descriptive Function is that an item can be answered correctly if and only if all the attributes involved in this item have been mastered
BDF: An item can be answered correctly if and only if all the attributes involved in the item have been mastered
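The BDF rule stated above can be sketched directly: an ideal response is 1 only when every attribute the item requires is mastered. The Q-matrix below is a made-up 4-item by 3-attribute example and the function name is hypothetical.

```python
def ideal_response_pattern(q_matrix, knowledge_state):
    """Ideal item responses for one knowledge state.

    An item scores 1 if and only if every attribute it requires
    (a 1 in its Q-matrix row) is mastered in the knowledge state.
    """
    return [
        1 if all(m >= r for r, m in zip(row, knowledge_state)) else 0
        for row in q_matrix
    ]


# Hypothetical Q-matrix: rows are items, columns are attributes.
Q = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]

print(ideal_response_pattern(Q, (1, 0, 1)))  # [1, 0, 0, 1]
print(ideal_response_pattern(Q, (1, 1, 1)))  # [1, 1, 1, 1]
```

Different knowledge states can produce the same ideal pattern when the items cannot tell them apart, which depends on the Q-matrix.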
Ideal Item Response Pattern
Step 2: Determination
Since the knowledge state is unobservable, the observable ideal item-response pattern should be determined
An ideal item-response pattern is the pattern of correct and incorrect responses that is consistent with the attributes an individual has or has not mastered
Ideal item-response patterns can be considered as classification groups in RSM
Step 3: Mapping
The third step is mapping examinee item-response patterns and ideal item-response patterns onto the classification space (θ, ζ)
The rule-space methodology utilizes the Cartesian Coordinate System to formulate an orthogonal two-dimensional classification space
The classification space consists of the latent ability variable in IRT, θ, along the horizontal axis and one of the IRT-based caution indices, ζ, which is the unusualness of item response patterns, along the vertical axis
Caution Indices ζ
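The formula on this slide did not survive extraction. As an assumption, and not as the deck's own definition, a commonly cited form of the standardized caution index is ζ = f(x) / sqrt(Var f(x)), with f(x) = Σ (P_j(θ) − x_j)(P_j(θ) − T(θ)), where T(θ) is the mean of the item probabilities P_j(θ), and Var f(x) = Σ P_j(θ)(1 − P_j(θ))(P_j(θ) − T(θ))². The sketch below takes the item probabilities as given, so no particular IRT model is implied; all values are illustrative.

```python
import math


def zeta(responses, p_correct):
    """Standardized caution index under the assumed form described above.

    responses: 0/1 item scores; p_correct: P_j(theta) for the same items.
    """
    t = sum(p_correct) / len(p_correct)  # T(theta): mean item probability
    f = sum((p - x) * (p - t) for p, x in zip(p_correct, responses))
    var = sum(p * (1 - p) * (p - t) ** 2 for p in p_correct)
    return f / math.sqrt(var)


# Illustrative probabilities for an easy, a medium, and a hard item.
p = [0.9, 0.5, 0.1]
typical = zeta([1, 1, 0], p)   # misses only the hard item
aberrant = zeta([0, 1, 1], p)  # misses the easy item, passes the hard one
print(aberrant > typical)  # True
```

Under this form, larger ζ values flag response patterns that are unusual for the estimated ability.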
Step 4: Classification
In the classification stage, the examinee’s item-response pattern is compared with each of the possible ideal item-response patterns in the classification space (θ, ζ)
The Mahalanobis distance (D²) between the point associated with the examinee’s item-response pattern in the classification space (θ, ζ) and the point associated with each of the ideal item-response patterns is applied as an admissibility criterion for this comparison
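A squared Mahalanobis distance in a two-dimensional (θ, ζ) space can be sketched without any libraries. The points and the covariance matrix below are illustrative assumptions; how the covariance is estimated in an actual rule-space analysis is not covered here.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance between two points in a 2-D space.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form with the inverse covariance (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


examinee = (2.0, 1.0)     # hypothetical (theta, zeta) of an examinee
ideal = (0.0, 0.0)        # hypothetical (theta, zeta) of an ideal pattern
cov = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_sq(examinee, ideal, cov))  # 2.0
```

An ideal pattern would then be treated as admissible when this value falls below a chosen threshold.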
Limitations of D²
Its use can lead to more than one acceptable ideal item-response pattern for a particular examinee’s item-response pattern
Further, the Mahalanobis distance does not yield the probabilities of misspecifications (or errors) or any other evidence for determining the attribute mastery profile
Bayesian Decision Rule
A Bayesian minimum error rule is applied to yield the posterior probability for the final decision on classification
The examinee’s item-response pattern is classified into the single closest ideal item-response pattern, the one with the highest posterior probability
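One way to sketch the decision rule: assume each ideal pattern's group is normally distributed around its point with a common covariance, so that the likelihood is proportional to exp(−D²/2), then pick the group with the highest posterior. The normality assumption, the equal default priors, the function names, and the D² values are all illustrative and are not taken from the deck.

```python
import math


def posterior_probabilities(d2_by_state, priors=None):
    """Posterior probability of each candidate state given its D^2.

    Assumes likelihood proportional to exp(-D^2 / 2) and, unless priors
    are supplied, equal prior probability for every candidate state.
    """
    states = list(d2_by_state)
    if priors is None:
        priors = {s: 1.0 / len(states) for s in states}
    weights = {s: priors[s] * math.exp(-0.5 * d2_by_state[s]) for s in states}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def classify(d2_by_state, priors=None):
    """Return the state with the highest posterior and all posteriors."""
    post = posterior_probabilities(d2_by_state, priors)
    return max(post, key=post.get), post


# Hypothetical D^2 values for three admissible knowledge states.
best, post = classify({"KS1": 0.5, "KS2": 2.0, "KS3": 4.0})
print(best)  # KS1
```

Unlike D² alone, the posteriors give a single decision even when several ideal patterns are admissible, and one minus the winning posterior is the estimated error of that decision.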
Purpose of this Study
To verify whether previously identified cognitive attributes represent the performance of Taiwanese eighth graders on the TIMSS-1999 mathematics tests
To examine the knowledge states most populated by the Taiwanese students
To compare group differences in terms of cognitive attributes
Performance level
Gender
Region
Analysis
Verifying the Cognitive Model
To validate both the attributes and Q-matrix
The following analyses were conducted:
Computing the classification rate
Multiple regression analyses
Comparing the mastery probability of each attribute across four booklets
Classification Rate
The proportion of examinees who are classified successfully into at least one of the predetermined knowledge groups
A low classification rate suggests that the ideal item-response patterns derived from the cognitive model do not reflect the actual examinee performance on the TIMSS-1999 mathematics test
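The classification rate is a simple proportion. The sketch below applies it to the overall counts reported in the verification results of this deck (2874 students, 11 not classified); the function name is hypothetical.

```python
def classification_rate(n_examinees, n_unclassified):
    """Share of examinees assigned to at least one predetermined state."""
    return (n_examinees - n_unclassified) / n_examinees


rate = classification_rate(2874, 11)
print(f"{rate:.1%}")  # 99.6%
```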
Multiple Regression Analyses
To regress examinee ability parameter (such as total scores and the first plausible value) on examinee attribute mastery probability
R-square and adjusted R-square indices were checked
To determine how well attribute probabilities account for examinees’ performance (total scores)
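Adjusted R-square follows from R-square, the sample size, and the number of predictors. The sketch below uses the standard adjustment formula; taking n = 2874 students and 23 predictors (one per attribute) for the entire sample is an assumption inferred from figures elsewhere in the deck, not a stated design.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment of R-square for the number of predictors."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Assumed: entire-sample R-square of .925, n = 2874, 23 attribute predictors.
adj = adjusted_r_squared(0.925, 2874, 23)
print(round(adj, 3))  # 0.924
```

With a sample this large relative to the number of predictors, the adjustment barely moves the value.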
Consistency
The means and standard deviations of attribute mastery probabilities were computed for four booklets
The consistency of attribute mastery probabilities across four booklets was checked
Inconsistent attributes reflect a problem concerning attributes and/or attribute coding
Results of Verification
The mean squared Mahalanobis distance (D²) from the closest latent knowledge states was .44
Classification rates were extremely high (99.3% to 99.9%), and only 11 out of 2874 students were not assigned to at least one of the predetermined knowledge states
Regression analyses with total scores obtained extremely high R² and adjusted R² (.943 to .979) for the four booklets, as well as .925 and .924 for the entire sample
Results of Verification
The ranges of mean attribute probabilities across booklets for 20 attributes were less than .20 and 13 out of 23 attributes had probability difference ranges less than .10
The largest difference in range of mean attribute probabilities across booklets was .27 for Recognize pattern (S6)
Recognize patterns (S6) was required in the fewest total items (14 items) across the four booklets
Discussions for Verification
A poor fit of the model to the data calls the diagnostic information into question
The proposed cognitive attributes and Q-matrix used in this study explained the performance of Taiwanese eighth graders on the TIMSS-1999 mathematics tests very well
Results consistent with the current study were obtained in two previous studies that used data from 20 and 3 countries in the TIMSS-1999 study
Analysis
Diagnosing Knowledge States
To provide diagnostic information in terms of knowledge states and learning paths
The following analyses were conducted: Conducting rule-space analysis Grouping knowledge states Identifying learning paths
RSM Analyses
BUGSHELL, programmed by Tatsuoka, Varadi, and Tatsuoka (1992), was utilized
Using a three-dimensional classification space: θ (the IRT ability), ζ (unusualness), and generalized ζ
Setting relevant parameters
The Mahalanobis distance (D²) and the difference of θ values were set to 4.5 and 1.5, respectively
The number of slips was not more than one-third of the total items
Boxplot for the Population
Diagnostic Information
Clustering Knowledge States
Combining the attribute mastery probability vectors from the four rule-space analyses
A K-means cluster analysis was conducted
Deriving the centroids of the clustered knowledge states
Transforming attribute probabilities into attribute patterns using a cutoff point of .85
Clustered Knowledge States
The goal of clustering is to explore educationally interpretable groups of students’ attribute mastery probabilities and hierarchical relations among these groups
A twelve-cluster solution was selected as the final solution for the K-means cluster analysis in this study
Cluster Solutions
Some solutions didn’t yield the clustered knowledge state representing students who mastered all 23 attributes
Some solutions did not yield the knowledge state to reflect students who mastered few attributes
For some solutions, interpretable hierarchical relations among the clustered knowledge states could not be derived
Hierarchical Relations
A pair of clustered knowledge states has a hierarchical relation if each component in one binary mastery vector is larger than or equal to the corresponding component in the other mastery vector
KS1 has an attribute mastery vector of (10111), KS2 is represented by (10011), and KS3 is represented by (10010)
The relationship among these knowledge states is denoted by KS3 ≤ KS2 ≤ KS1
These hierarchically-ordered knowledge states formed a chain, also called a learning path in the current study
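The hierarchical relation is a component-wise comparison, so checking whether a sequence of knowledge states forms a chain is short. The function names are hypothetical; the three vectors are the KS1, KS2, and KS3 examples given above.

```python
def dominates(higher, lower):
    """True if every component of `higher` is >= the matching one in `lower`."""
    return all(h >= l for h, l in zip(higher, lower))


def is_learning_path(chain):
    """True if the states, ordered from fewest to most mastered attributes,
    form a chain in which each state dominates the one before it."""
    return all(dominates(chain[i + 1], chain[i]) for i in range(len(chain) - 1))


KS1 = (1, 0, 1, 1, 1)
KS2 = (1, 0, 0, 1, 1)
KS3 = (1, 0, 0, 1, 0)

print(is_learning_path([KS3, KS2, KS1]))  # True
print(dominates(KS3, KS1))                # False
```

Two states where neither dominates the other, such as (11000) and (10010), are not comparable and cannot sit on the same chain.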
Learning Paths
Identifying Learning Paths
A hierarchically ordered network was formed based on vectors of attribute mastery patterns
The hierarchically ordered network consisted of many sub-graphs, which are referred to as learning paths
Learning paths provide the practical information of how Taiwanese students progress in acquiring their cognitive attributes
Analysis
Comparing Group Differences
The dataset was separated into different groups, which were compared on:
Each attribute mastery probability
Attribute characteristic curves (ACC)
The percentage of students in each clustered knowledge state
Learning paths
Attribute Characteristic Curve: Content Attributes
Attribute Characteristic Curve: Skills/Item-type Attributes
Attribute Characteristic Curve: Process Attributes
Gender Comparisons
The finding indicates that gender differences of Taiwanese students in terms of mathematical skills are quite minimal
That is, male and female middle school students in Taiwan show comparatively equivalent potential in learning mathematical skills
Variability of males’ mathematics performance in terms of knowledge state distributions is slightly greater than that of females
Male and female students were represented in the same proportion in learning path 1
Region Comparisons
Students in urban schools perform much better than those in rural schools on high-level mathematics contents and abstract thinking skills
Greater proportions of urban students were classified into knowledge states with larger numbers of mastered attributes, and greater proportions of rural students occupied knowledge states with fewer numbers of mastered attributes
More urban school students were represented in Learning Path 1
Conclusions
RSM is a viable alternative to traditional psychometric analysis of test scores
Validating the cognitive model
Providing diagnostic information with descriptions of cognitive attributes
Conducting group comparisons in terms of cognitive attributes
Diagnostic information cannot be provided for students not classified into the predetermined knowledge states
Conclusions
Taiwanese students perform well on all cognitive attributes except for Recognize pattern
Taiwanese students show some weaknesses on abstract thinking skills and algebra contents
Lowest and highest performing students also show largest mastery differences in thinking skills as well as algebra and its application
The learning gap between urban and rural schools in Taiwan exists not only in students’ total scores, but also in performance on critical cognitive attributes needed for mathematics learning
Future Research
Substantive Subjects Educational Technology Further Data Analysis Methodological Research
Substantive Subjects
Skills diagnosis for other disciplines
In addition to mathematics, science, reading comprehension, food handling certificate tests, teacher certificate tests and … are also possible subjects
Developing cognitive processes of endorsing the psychological survey items
Math test anxiety
Student Self-Efficacy Beliefs in Mathematics: Mastery experience, vicarious experience, social persuasions, and physiological/affective states
Educational Technology
Item Pool or Item Bank
In addition to item properties from IRT models, items can be banked by cognitive attributes
Developing enough test items for each attribute
Computerized Diagnostic Test (CDT) and Computerized Remedial Instruction (CRI)
Factor analyzing cognitive attributes
Sub-skill scores as your dependent variables
Applying Multilevel Models (MLM)/Hierarchical Linear Models (HLM) and Growth Curve Models to cognitive attributes with educational context variables (such as teaching strategies, teacher characteristics, and school context variables)
Further Data Analysis
The purpose of selection
Add the sub-skill score information for the selection purpose
How to decide the appropriate cut-off point (type I error rate and power)
Dimensionality study
The whole probability data
The different learning path
Equating by cognitive attributes
Methodological Research
Questions and Comments