
A Theory-Based Framework for Assessing Domain-Specific Problem-Solving Ability

Brenda Sugrue, University of Northern Colorado


Is problem solving a global construct? What components can be assessed? What multiple assessment strategies can be used?

Measurement experts increasingly agree that assessment reform needs to move from a task-based focus to a construct- or theory-based focus (Baker, O'Neil, & Linn, 1993; Frederiksen, Mislevy, & Bejar, 1993; Messick, 1994). Recent efforts to develop performance assessments have generated many intuitively appealing tasks, with little or no explicit linking of the tasks to the cognitive constructs being promoted and measured. Abilities such as problem solving tend to be treated as global constructs. The emphasis is on surface features of the tasks, such as their authenticity and format, rather than on the precise definition of what is being measured, or the precise diagnosis of the sources of poor performance. Consistent emergence of considerable within-subject variability in performance across tasks in which cognitive demands were deemed comparable at a global level (Dunbar, Koretz, & Hoover, 1991; Linn, Baker, & Dunbar, 1991; Shavelson, Baxter, & Gao, 1993) may indicate a need for more precision in defining what these tasks measure.

When problem-solving ability is assessed as a global construct in the context of authentic extended tasks, it is implicitly assumed that a particular task will be equally unfamiliar for all students, since a task must be novel in order to be a "problem" (Bodner, 1991). If the novelty of a task varies from student to student, and if the level of novelty varies from task to task for any particular student, then it is not surprising that estimates of global abilities are unstable.

This article advocates a more fragmented approach to the assessment of global ability constructs than is currently in vogue. The approach is based on the assumption that decomposing a complex ability, such as problem solving, into its cognitive components and tracking patterns of performance across multiple measures of those components will yield more valid and instructionally useful information than attempts to target the composite ability from the outset. This article is divided into two sections. First, the theoretical basis for the cognitive constructs included in the framework is described, followed by specifications for designing multiple measures of the constructs.

Theoretical Basis for the Selection of Cognitive Components of Problem-Solving Performance

Although it is difficult to piece together a definitive list of the cognitive variables associated with problem solving, comprehensive research-based models have been suggested by Glaser (1992); Schoenfeld (1985); and Smith (1991). One component, domain-specific knowledge, is common to all three models, and two models share the components of self-regulatory/control skills, heuristics/general problem-solving strategies, and beliefs/affect. Each of these common components will be considered in order to identify a set of specific constructs to be included in the assessment design framework presented here.

Domain-Specific Knowledge

Breadth and depth of one's knowledge about a domain influence one's ability to solve problems in that domain (Glaser, 1992). Domain-specific knowledge of good problem solvers is often described as connected, integrated, coherent, and chunked. In contrast, knowledge of poor problem solvers is deemed fragmented and unconnected. Regardless of the domain, the knowledge of good problem solvers seems to be organized around key principles (general rules) that guide actions and decisions in a variety of task situations (Chi, Glaser, & Rees, 1982; Glaser, 1992; Larkin, 1983). In addition to knowledge of general principles, good problem solvers also draw on a store of automated task-specific procedures.

If diagnosis of the source(s) of poor problem-solving performance is one goal of assessment, then the assessment should permit identification of the nature and extent of a student's knowledge of principles and procedures in the domain of interest.

Brenda Sugrue is an Assistant Professor in the Educational Technology Program, College of Education, University of Northern Colorado, McKee 213, Greeley, CO 80639. Her specializations are cognition, instructional design, and assessment.


In addition, since principles are rules that involve relationships among concepts, the student's knowledge of the individual concepts should also be measured. It may be that a student has knowledge of individual concepts but has little or no knowledge of the general rules (principles) governing the relationships among the concepts. Finally, one should be able to identify students who have knowledge of principles, but whose knowledge of specific procedures is limited. For example, a student might be able to select appropriate statistical analyses given various types of data and research questions (exhibiting knowledge of principles) but might not be able to perform the actual analysis procedures.

If assessment can uncover more precise deficits in students' knowledge bases, then more specific prescriptions for instructional remediation can be made for individual students and for groups of students manifesting similar weaknesses. For example, the prescription for a student who has knowledge of principles, but who lacks speed and accuracy in executing procedures during solution, would be opportunities to practice specific procedures. The prescription for a student who can apply individual concepts, but cannot apply more complex rules involving multiple concepts, would be a carefully sequenced series of practice activities where the situations called for an ever-increasing number of concepts to be considered and applied at once.

To diagnose and remediate domain-specific knowledge gaps, three separate elements of content knowledge are included in the framework proposed here: principles, concepts, and procedures.
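To make the diagnostic logic concrete, the Python sketch below shows one way a diagnosed profile over these three elements might be mapped to the remediation prescriptions described above. The score scale, the 0.6 mastery cutoff, and the function name are illustrative assumptions, not part of the framework itself.

# Hypothetical sketch: map a diagnosed knowledge profile to the
# remediation prescriptions described in the text. The 0-1 scores
# and the 0.6 cutoff are illustrative assumptions only.

def prescribe(profile, mastery=0.6):
    """profile: dict with scores for 'concepts', 'principles', 'procedures'."""
    prescriptions = []
    if profile["concepts"] >= mastery and profile["principles"] < mastery:
        # Knows individual concepts but not the rules relating them.
        prescriptions.append(
            "sequenced practice requiring progressively more "
            "concepts to be applied at once"
        )
    if profile["principles"] >= mastery and profile["procedures"] < mastery:
        # Knows principles but lacks speed/accuracy in execution.
        prescriptions.append("practice on specific procedures")
    return prescriptions or ["no knowledge-based remediation indicated"]

print(prescribe({"concepts": 0.8, "principles": 0.3, "procedures": 0.5}))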

Metacognition and General Problem-Solving Strategies

Both the self-regulation and general problem-solving strategy components of problem-solving performance are considered together because there is considerable overlap in the behaviors and processes included in each category. For example, planning and monitoring appear in all lists of components of self-regulation and metacognition (e.g., Campione & Brown, 1990) and are also elements of most general problem-solving strategies (e.g., Polya, 1945).

It is not clear whether executive control processes and strategies of successful problem solvers are an additional source of variance in problem-solving performance or are simply a by-product of acquiring a well-structured store of domain-specific knowledge (Alexander & Judy, 1988). If metacognitive ability is indeed a function of the extent and structure of domain-specific knowledge, then it would not be necessary to measure separate metacognitive components to identify the cognitive sources of poor problem solving or to predict which students are likely to be good problem solvers; identification of gaps in domain-specific knowledge would be sufficient. If, on the other hand, metacognition is an additional source of variance in problem-solving performance, then it would be important to measure metacognitive components and to make instructional recommendations that go beyond specific content knowledge. Given the lack of research on the relative contributions of domain-specific knowledge and metacognition to problem solving, the metacognitive components of planning and monitoring are included in the list of constructs to be measured.

Motivation

There is increasing acknowledgment of the role of affective variables in problem-solving performance (Silver, 1985). Snow (1993) cites research that would support, and indicate the need to further investigate, the hypothesis that, in many cases, variation in performance across assessment tasks might be attributable not to variation in their cognitive demands but rather to variation in the motivational orientations they evoke. Effort and persistence during task performance have been linked to perceptions of self, particularly perceived self-efficacy (Bandura, 1986), and perceptions of the task, particularly perceived task difficulty (Salomon, 1984) and perceived task attraction (Boekaerts, 1987; Pintrich & De Groot, 1990). These variables seem to be situation-specific and to operate independently of the extent of one's task-relevant knowledge base.

To distinguish between students who lack domain-specific knowledge and those who lack appropriate motivational orientation, separate measurement of perceptions of self and tasks is needed. If a student reports low perceived self-efficacy in relation to particular tasks, or low perceived attraction to certain types of tasks, or consistently underestimates the demands of some kinds of tasks, while at the same time performing poorly on those tasks, then interventions targeting those motivational sources of poor performance can be advocated prior to, or in addition to, instruction directed at content knowledge and/or metacognitive strategies. Concurrent measurement of motivational constructs will permit estimation of the extent to which both intersubject and intrasubject variations in performance, across tasks designed to tap similar underlying cognitive processes and structures, are related to differences in the motivational orientations induced by the tasks. The complete set of components selected for inclusion in the assessment-design framework proposed here is presented in Figure 1.

Multiple Measures of the Critical Cognitive Constructs Underlying Problem-Solving Performance

Recent assessment reforms have been characterized by movement away from multiple-choice response formats to test items that require students to construct a response. However, it is still not clear whether different assessment formats per se tap different aspects of a student's knowledge base (Bennett, 1993). If the comparability of the cognitive demands of multiple formats can be demonstrated for measuring particular constructs, then decisions regarding format could be based on efficiency and authenticity/face validity requirements rather than on unsubstantiated impressions of the cognitive constructs being tapped. If, on the other hand, it can be shown that, for the same student, different formats lead to different estimates of a particular construct, then the dimensionality of the construct itself is called into question.

Messick (1993) suggests that assessment design taxonomies should permit separation of method or format variance from variance relevant to the focal constructs being measured.


Domain-specific knowledge constructs: principles, concepts, procedures
Metacognitive constructs: planning, monitoring
Motivational constructs: perceived self-efficacy, perceived task difficulty, perceived task attraction

FIGURE 1. Cognitive components of problem solving to be assessed

Assessment strategies recommended here for measuring different aspects of domain-specific knowledge, metacognition, and motivation will therefore be organized around three categories of student response: selection, generation, and explanation. Multiple formats for measuring each of the constructs are included in the framework. Thus, the framework has the potential to facilitate research on the relative validity of different formats for tapping the constructs identified as cognitive components of problem-solving ability. The complete construct-by-format matrix is presented in Figure 2.

Multiple Measures of Domain-Specific Knowledge Constructs

When used to measure domain-specific knowledge, the first two formats in the construct-by-format matrix presented in Figure 2 are similar to the response modes proposed by Williams and Haladyna (1982) in their three-dimensional typology for generating higher level test items. The selection format refers to assessment tasks where a student has to select a response from a set of alternatives. In assessment of knowledge of a concept, students are presented with multiple examples of the concept and asked to select those that belong to the category; multiple, related concepts could be assessed simultaneously by having students match concept labels to a set of mixed examples.

If knowledge of a principle were the target of assessment, students would be presented with a description or simulation of an event and a list of multiple predictions or solutions from which to select the most appropriate prediction or solution. If knowledge of a procedure were the focus, students could be presented with descriptions or simulations of a number of different procedures for accomplishing a particular goal and asked to select the most appropriate procedure for the task, or a student could be asked to select the correct order in which the steps of a procedure should be performed or to select the errors made in someone else's performance of the procedure.

The generation format refers to assessment tasks that ask a student to construct or suggest a response: for example, asking a student to draw or create an instance of a concept; to make a prediction or generate a solution for a particular situation; or to perform a procedure, given a particular goal and set of conditions. The explanation format refers to assessment items where the student is asked to give a reason for a response he or she just selected or generated. Alternatively, students could be asked to explain why a described or simulated object or event belongs to a particular category of objects or events (test of concept knowledge)

or why a particular event or outcome occurred, or is likely to occur, in a system or situation (test of principle knowledge). For a procedure, students could be asked to explain how to perform the procedure (as opposed to actually performing it).

In terms of Anderson's (1993) distinction between declarative and proceduralized knowledge, selection and generation formats should be more sensitive to variation in proceduralized (usable) knowledge of principles, concepts, and procedures, whereas the explanation format should be more sensitive to declarative (verbalizable) knowledge of the same principles, concepts, and procedures. Thus, the first two formats in the matrix (selection and generation) are liable to be interchangeable; that is, they are likely to yield similar profiles of the nature and extent of a student's domain-specific knowledge. The third format in the matrix (explanation) may tell a different story about the state of a student's knowledge with respect to particular constructs and is probably a less reliable predictor of problem-solving ability than the first two.

Test items or tasks can be designed to fit individual cells of the top three rows of the matrix or to represent combinations of cells. However, the designer of any complex or extended task should identify, and communicate to those who will score and make decisions based on performance on the task, which pieces of the task represent particular cells of the matrix.
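As an illustration of this kind of bookkeeping, the following Python sketch treats the matrix as a test blueprint and tags each item with the cell it is intended to fill. The item identifiers and the particular cells covered are hypothetical.

# Illustrative blueprint: tag each assessment item with the cell of the
# construct-by-format matrix (Figure 2) it is intended to measure.
# Item identifiers are hypothetical.

blueprint = [
    {"item": "E1", "construct": "principles", "format": "selection"},
    {"item": "E2", "construct": "principles", "format": "generation"},
    {"item": "E3", "construct": "concepts",   "format": "explanation"},
    {"item": "E4", "construct": "procedures", "format": "generation"},
]

# Check coverage of the knowledge rows of the matrix.
cells = {(i["construct"], i["format"]) for i in blueprint}
for construct in ("principles", "concepts", "procedures"):
    for fmt in ("selection", "generation", "explanation"):
        status = "covered" if (construct, fmt) in cells else "MISSING"
        print(f"{construct:10s} x {fmt:11s}: {status}")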

In order to create assessments that fit cells in the top row of the construct-by-format matrix presented in Figure 2, the key principles in the content domain or subdomain of interest must first be identified. Unless the principles (general rules) that govern the relationships among concepts in a domain are articulated and targeted by assessment, estimates of students' ability in the domain may be restricted to knowledge of discrete concepts and procedures or to performance on authentic tasks where knowledge requirements are not specified in detail.


Principles
  Selection: select predictions or solutions
  Generation: generate predictions or solutions
  Explanation: explain predictions or solutions

Concepts
  Selection: select examples
  Generation: generate examples
  Explanation: explain why examples reflect concept attributes

Procedures
  Selection: select task-specific procedures
  Generation: perform task-specific procedures
  Explanation: explain how to perform a procedure

Planning
  Selection: select or rate items that represent amount and type of planning engaged in during assessment
  Generation: engage in behaviors indicative of planning during assessment
  Explanation: verbally describe amount and type of planning engaged in during assessment

Monitoring
  Selection: select or rate items that represent amount and type of monitoring engaged in during assessment
  Generation: engage in behaviors indicative of monitoring during assessment
  Explanation: verbally describe amount and type of monitoring engaged in during assessment

Perceived self-efficacy
  Selection: select or rate items that represent level of confidence in ability to do well on different assessment tasks
  Generation: engage in behaviors indicative of effort and persistence during assessment
  Explanation: verbally describe one's perception of one's ability to do well on different assessment tasks

Perceived task difficulty
  Selection: select or rate items that represent perceived relative difficulty of different assessment tasks
  Generation: engage in behaviors indicative of effort and persistence during assessment
  Explanation: verbally describe one's perception of the relative difficulty of different assessment tasks

Perceived task attraction
  Selection: select or rate items that represent perceived relative attraction of different assessment tasks
  Generation: engage in behaviors indicative of effort and persistence during assessment
  Explanation: verbally describe one's perception of the relative attraction of different assessment tasks

FIGURE 2. Construct-by-format matrix for measuring constructs related to problem-solving performance

Once the general principles in a domain have been identified, the concepts that are related by the principles and the sequences of actions and decisions that apply the principles in the context of particular task goals and conditions can also be identified, permitting the design of a full range of assessment activities.

Some domains, such as science, mathematics, economics, and geography, lend themselves well to extraction of principles, rules, or laws. For example, in the domain of mathematics, general rules, such as those governing the relationship between the concepts of area and volume, or formulas for calculating simple and compound interest, can be determined. In the domain of geography, principles such as those governing the relationships among aspects of the physical environment and settlement patterns, or those that characterize the relationships between rainfall patterns and crop yields, are also easily identified.

In science, there are many principles that specify how one concept is a function of other concepts; for example, Ohm's law stipulates that, in an electrical circuit, current is a function of both voltage and resistance.

Taking Ohm's law as an example of a principle in a more obviously rule-governed domain, one could create a set of selection items that ask students to select, from a list of alternative predictions (e.g., increase, decrease, or no change), what would be likely to happen to the voltage, resistance, and current in a circuit if various changes were made to components of the circuit. Figure 3 illustrates how such a set of items would look. An equivalent set of generation items would ask students to describe (generate), rather than select from a menu, what would happen to the current in the circuit if certain changes were made to its components.

Explanation items targeting the same knowledge would ask students to explain why the current in a circuit would increase if particular changes were made (e.g., if a component were removed) or why the current would decrease when other changes were made (e.g., if longer wires were used). Separate items of each format type (selection, generation, and explanation) could also be created to test knowledge of the individual concepts of voltage, resistance, and current. Finally, selection and generation items could be developed to test students' knowledge of procedures for connecting components in circuits; for example, students could be asked to select the best way to wire a circuit in order to maximize the current or to actually connect up a circuit so that a certain level of current would flow through it. Thus, all cells of the top three rows of the construct-by-format matrix would be represented in a set of assessment activities targeting knowledge of the domain of electric circuits.
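For concreteness, the answer key for prediction items like those in Figure 3 follows from Ohm's law (current = voltage / resistance). The Python sketch below generates the expected responses under the simplifying assumptions that bulbs are wired in series (so an added bulb adds resistance) and that an added battery raises the total voltage; the baseline voltage and resistance values are arbitrary.

# Sketch of an answer key for Ohm's-law prediction items (I = V / R).
# Assumes a simple series circuit: an added bulb increases total resistance,
# an added battery increases total voltage. Baseline values are arbitrary.

def predict_current(delta_v, delta_r, v=9.0, r=3.0):
    """Return the qualitative change in current after a circuit change."""
    before = v / r
    after = (v + delta_v) / (r + delta_r)
    if after > before:
        return "INCREASE"
    if after < before:
        return "DECREASE"
    return "NO CHANGE"

print("add another bulb:    ", predict_current(delta_v=0.0, delta_r=3.0))   # DECREASE
print("add a 9-volt battery:", predict_current(delta_v=9.0, delta_r=0.0))   # INCREASE
print("remove one bulb:     ", predict_current(delta_v=0.0, delta_r=-1.5))  # INCREASE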

In some domains, it is not so easy to extract unambiguous general rules governing relationships among concepts. For example, the domain of history contains a large number of well-defined concepts, such as immigration, colonization, civil war, nationalism, or democracy; however, the rules governing the relationships among such concepts are often fuzzy. Unless even probabilistic general rules, or sets of alternative principles that could be used to explain or predict the relationships among sets of concepts from different historical perspectives, are identified, test items cannot be developed to fit the cells in the top row of the matrix in Figure 2.

An example of a principle in the domain of history might be that the underdevelopment of Third World countries is a function of colonization and specialization of production (McGowan & Woyach, 1989). Knowledge of this principle could be assessed via a set of selection items, in which a student is asked to select which of a set of described countries were more likely to become Third World; via a set of generation items, in which a student is asked to generate a plan to prevent a described country from becoming Third World; or via a set of explanation items, in which a student is asked to explain why some described countries became Third World and some did not.

Predict what will happen to the voltage, resistance, and current in the following circuit if each of the changes listed in the chart is made. Circle INCREASE, DECREASE, or NO CHANGE in each box in the chart. Assume that the circuit is properly reconnected after a change is made.

What will happen if you . . .   Voltage                        Resistance                     Current
add another bulb?               INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE
add a 9-volt battery?           INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE
remove one bulb?                INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE  INCREASE / DECREASE / NO CHANGE

FIGURE 3. Example of a set of selection items targeting knowledge of Ohm's law

The individual concepts of underdevelopment, Third World, colonization, and specialization of production could be targeted by assessments that fit each of the cells in the second row of the construct-by-format matrix. Procedures associated with the general rule might include sequences of actions and decisions appropriate for locating and summarizing data relating to the production profile of a particular country. Knowledge of such procedures could be assessed by asking students to either select the appropriate sequence of steps or perform (generate) the steps to gather data for a particular case.

Multiple Measures of Metacognitive Constructs

There are two distinct approaches to measuring metacognitive variables: students' self-reports (either concurrent or postperformance) and behavioral or performance-based indicators (Meichenbaum, Burland, Gruson, & Cameron, 1985; Snow & Jackson, 1993). To date, most efforts to measure metacognitive variables have concentrated on self-report methods in the context of learning rather than assessment situations. For example, Pintrich and De Groot (1990) asked students to respond on a 7-point Likert scale (where 1 meant that the item was not at all true for the student and 7 meant that the item was very true for the student) to items such as "When I'm reading, I stop once in a while and go over what I have read" or "Before I begin studying, I think about the things I will need to do to learn." In the context of an assessment event, a student could be asked after completion of the test to indicate on a similar scale the extent to which items such as "I worked out how much time I should spend on each question, and I tried to stick to it;" "I spent a long time planning how I would answer the questions;" or "I went over my answers to make sure I had not made a mistake" applied to him or her (O'Neil, Sugrue, Abedi, Baker, & Golan, 1992). Interviewing students, or having them think aloud as they perform a task (e.g., Campione & Brown, 1990), and postperformance interviews (e.g., Peterson, Swing, Braverman, & Buss, 1982) are other methods that have been used in research studies to probe the extent to which students engage in metacognitive processing.

In the construct-by-format matrix presented here (see Figure 2), self-report measures that require students to select or rate items that describe how much and what kind of planning or monitoring they engaged in during performance on assessment activities are classified as selection measures of metacognitive constructs. Self-report measures that require students to verbally describe (either orally or in writing) the nature and extent of the planning and monitoring they engaged in during the assessment event are classified as explanation measures of metacognitive constructs. Actual performance-based indicators of planning and monitoring activities during test performance are classified as generation measures.

Examples of performance-based (generation) measures of planning that have been used in previous research include the following:

1. the plans students generate at the beginning of an activity (Marshall, 1993),
2. traces or observable evidence of planning at points in task performance where planning is likely to occur (Howard-Rose & Winne, 1993),
3. the proportion of time students devote to planning versus execution during task performance (Chi et al., 1982).

Examples of performance-based measures of monitoring employed in previous research include the following:

1. the extent to which students look back over elements of their solutions (Garner & Reis, 1981),
2. the extent to which students are aware of inadequate instructions for tasks (Markman, 1979),
3. the relative amount of time students devote to preparing for assessment tasks with different cognitive demands (Wagner & Sternberg, 1987).

Snow and Jackson (1993) recom- mend that assessment of any metacognitive construct should be based on a battery of measures that converge on the construct. They also point out that self-report data may not accurately reflect students’ ac- tual use of metacognitive strategies during a learning or assessment task; therefore, at the very least, the more easily gathered self-report data should initially be validated against performance-based measures. The most unobtrusive way to gather data on which to base the development of generation measures of planning and monitoring may be to videotape students as they perform assessment tasks. The videotapes could also be used to prompt students for more situation-specific verbal reports of their engagement in planning and monitoring.
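One simple form such validation might take is correlating self-reported monitoring with a behavioral indicator such as the proportion of test time spent looking back over answers. The Python sketch below illustrates the computation; all data values are invented, and statistics.correlation requires Python 3.10 or later.

# Hypothetical validation sketch: correlate self-reported monitoring
# (e.g., Likert-scale ratings) with a behavioral indicator (proportion
# of test time spent reviewing answers). All values are invented.

from statistics import correlation  # available in Python 3.10+

self_report = [2, 5, 4, 1, 6, 3, 7, 2]           # 1-7 Likert ratings
review_time = [0.05, 0.20, 0.15, 0.02, 0.25,     # proportion of test time
               0.10, 0.30, 0.08]                 # spent reviewing answers

r = correlation(self_report, review_time)
print(f"self-report vs. behavioral indicator: r = {r:.2f}")
# A low r would caution against relying on self-report measures alone.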

Multiple Measures of Motivational Constructs

In this framework, perceived self-efficacy, perceived task difficulty, and perceived task attraction may influence effort and persistence during task performance and, therefore, should be estimated as additional sources of variation in task performance. Numerous interview schedules and questionnaires have been developed to tap perceptions/attitudes/beliefs. Sometimes students are presented with task scenarios and asked how they would rate their ability to do well on the tasks or how difficult the tasks seem. Most of the time, students are asked to rate how well a particular statement reflects their beliefs about content areas or task types. To measure perceived self-efficacy, Bandura (1986) had students rate how well they thought they would do on particular tasks by selecting a number on a confidence scale from 0 to 100. Boekaerts (1987) has developed pre- and posttask instruments that ask students to respond on a 4-point scale to items such as "Do you anticipate enjoying yourself while doing this task?" "How often can you succeed at this kind of task without help?" "How pleasant did you find this task?" "How difficult did you find this task?" or "What sort of grade do you expect to get for this task?" to measure variables such as perceived task attraction, perceived difficulty, and perceived competence.

If a test contains items that relate to different types of knowledge or require different response formats, then perceptions about the sets of items that represent particular constructs or formats should be differentiated. Thus, variation in performance across sets of items could be linked to differences in perceptions of those items and perceptions of one's ability in relation to the items.

In terms of the format categories included in the construct-by-format matrix presented in Figure 2, questionnaires in which students are asked to rate or select statements that reflect their perceptions are classified as selection format measures of perceptions; questionnaires requiring open-ended written responses, or oral interviews that ask questions about perceptions, are classified as explanation type measures. Performance-based (generation) measures that would reflect perceptions of self and task would focus on evidence of effort and persistence. One such measure might be the relative amount of time spent on correct and incorrect knowledge-testing items across multiple formats. If a student performed well on one format, but not on another format targeting similar knowledge, then one potential source of the student's poor performance might be a misperception of self or a misperception of the demands or attractiveness of the task when presented in a particular format, leading the student to invest less effort (spend less time on the items), in turn leading to lower performance than the student's knowledge would warrant. Behavioral (generation) measures of effort invested during assessment could also be used to validate selection and explanation measures of perceptions.
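The time-based indicator just described could be computed very simply, as in the following Python sketch; the response times, the format labels, and the flagging rule (less than half the mean time) are illustrative assumptions rather than part of the framework.

# Hypothetical sketch of the effort indicator described above: compare a
# student's mean time per item across formats targeting similar knowledge.
# Times (in seconds) and the flagging rule are illustrative assumptions.

times = {
    "selection":  [35, 42, 38, 40],   # seconds per item
    "generation": [12, 10, 15, 11],   # much shorter: possible low effort
}

means = {fmt: sum(t) / len(t) for fmt, t in times.items()}
print(means)

if means["generation"] < 0.5 * means["selection"]:
    print("Flag: low time investment on generation items; "
          "check perceptions of these tasks before inferring a knowledge gap.")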

Conclusion

This theory-based framework contains a relatively short list of constructs identified as critical components of problem-solving performance. Multiple assessment strategies were suggested for each construct to facilitate much-needed construct validity research, which would in turn lead to a refinement of the framework itself and to more specific assessment design prescriptions.

The instructional consequence of applying the framework would be to target interventions at specific gaps in domain-specific knowledge, and/or metacognitive weaknesses, and/or maladaptive perceptions of self and tasks. More precise instructional interventions will result from more precise diagnosis of cognitive deficits likely to impede problem-solving performance. A battery of measures targeting specific constructs facilitates more precise diagnosis than is possible when problem solving is treated as a global ability targeted by undifferentiated tasks.

The framework presented in this article is only one example of many possible frameworks that would incorporate current theories of learning and performance into the design of assessment. Whatever the theory, it should be articulated, and tasks or task components, as well as patterns of performance on them, should be explicitly linked to the theory, in order to design valid measures of the performance. Otherwise, inferences about what students know and are able to do will remain difficult to defend against criticism from many quarters.

Note

This article was written while the author was a Project Director at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST), UCLA. The work was supported by the U.S. Department of Education through the National Center for Research on Evaluation, Standards, and Student Testing (Grant Number R117G10027; CFDA Catalog Number 84.117G, administered by the Office of Educational Research and Improvement). I am especially grateful to Eva Baker, Joan Herman, and Noreen Webb for their facilitation and encouragement of the project.

References

Alexander, P. A., & Judy, J. E. (1988). The interaction of domain-specific and strategic knowledge in academic performance. Review of Educational Research, 58, 375-404.

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Erlbaum.

Baker, E. L., O'Neil, H. F., Jr., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48(12), 1210-1218.

Bandura, A. (1986). Social foundations of thought and action. Englewood Cliffs, NJ: Prentice-Hall.

Bennett, R. (1993). In R. Bennett & W. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 1-27). Hillsdale, NJ: Erlbaum.

Bodner, G. M. (1991). In M. U. Smith (Ed.), Toward a unified theory of problem solving: Views from the content domains (pp. 21-34). Hillsdale, NJ: Erlbaum.

Boekaerts, M. (1987). Situation-specific judgments of a learning task versus overall measures of motivational orientation. In E. De Corte, H. Lodewijks, R. Parmentier, & P. Span (Eds.), Learning and instruction: European research in an international context (Vol. 1, pp. 169-179). Oxford, England: Wiley & Sons.

Campione, J. C., & Brown, A. L. (1990). Guided learning and transfer: Implications for approaches to assessment. In N. Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 141-172). Hillsdale, NJ: Erlbaum.

Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7-75). Hillsdale, NJ: Erlbaum.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289-303.

Frederiksen, N., Mislevy, R. J., & Bejar, I. I. (Eds.). (1993). Test theory for a new generation of tests. Hillsdale, NJ: Erlbaum.

Garner, R., & Reis, R. (1981). Monitoring and resolving comprehension obstacles: An investigation of spontaneous text lookbacks among upper-grade good and poor comprehenders. Reading Research Quarterly, 16, 569-582.

Glaser, R. (1992). Expert knowledge and processes of thinking. In D. F. Halpern (Ed.), Enhancing thinking skills in the sciences and mathematics (pp. 63-75). Hillsdale, NJ: Erlbaum.

Howard-Rose, D., & Winne, P. H. (1993). Measuring component and sets of cognitive processes in self-regulated learning. Journal of Educational Psychology, 85(4), 591-604.

Larkin, J. (1983). The role of problem representation in physics. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 75-98). Hillsdale, NJ: Erlbaum.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Markman, E. M. (1979). Realizing that you don't understand: Elementary school children's awareness of inconsistencies. Child Development, 50(3), 643-655.

Marshall, S. P. (1993). Assessing schema knowledge. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 155-180). Hillsdale, NJ: Erlbaum.

McGowan, P., & Woyach, R. B. (1989). An international relations approach. In R. B. Woyach & R. C. Remy (Eds.), Approaches to world studies: A handbook for curriculum planners. Boston, MA: Allyn & Bacon.

Meichenbaum, D., Burland, S., Gruson, L., & Cameron, R. (1985). Metacognitive assessment. In S. R. Yussen (Ed.), The growth of reflection in children (pp. 3-30). New York: Academic Press.

Messick, S. (1993). Trait equivalence as construct validity of score interpretation across multiple methods of measurement. In R. Bennett & W. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 61-74). Hillsdale, NJ: Erlbaum.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

O'Neil, H. F., Jr., Sugrue, B., Abedi, J., Baker, E. L., & Golan, S. (1992). Final report of experimental studies on motivation and NAEP test performance (Report to NCES, Contract #RS90159001). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Peterson, P. L., Swing, S. R., Braverman, M. T., & Buss, R. (1982). Students' aptitudes and their reports of cognitive processing during instruction. Journal of Educational Psychology, 74, 535-547.

Pintrich, P. R., & De Groot, E. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33-40.

Polya, G. (1945). How to solve it. Princeton, NJ: Princeton University Press.

Salomon, G. (1984). Television is "easy" and print is "tough": The differential investment of mental effort in learning as a function of perceptions and attributions. Journal of Educational Psychology, 76(4), 647-658.

Schoenfeld, A. H. (1985). Mathematical problem solving. San Diego: Academic Press.

Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30, 215-232.

Silver, E. A. (Ed.). (1985). Teaching and learning mathematical problem solving: Multiple research perspectives. Hillsdale, NJ: Erlbaum.

Smith, M. U. (Ed.). (1991). Toward a unified theory of problem solving: Views from the content domains. Hillsdale, NJ: Erlbaum.


Snow, R. E. (1993). Construct validity and constructed-response tests. In R. Bennett & W. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 45-60). Hillsdale, NJ: Erlbaum.

Snow, R. E., & Jackson, D. N., III. (1993). Assessment of conative constructs for educational research and evaluation: A catalogue (CSE Tech. Rep. No. 354). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Wagner, R. K., & Sternberg, R. J. (1987). Executive control in reading comprehension. In B. K. Britton & S. M. Glynn (Eds.), Executive control processes in reading (pp. 1-22). Hillsdale, NJ: Erlbaum.

Williams, R. G., & Haladyna, T. M. (1982). Logical operators for generating intended questions (LOGIQ): A typology for higher level test items. In G. H. Roid & T. M. Haladyna (Eds.), A technology for test-item writing (pp. 161-186). New York: Academic Press.
