This article was downloaded by: [Heriot-Watt University]
On: 07 October 2014, At: 14:27
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Educational Assessment
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/heda20
Assessment Portfolios as Opportunities for Teacher Learning
Maryl Gearhart (a) & Ellen Osmundson (b)
a Center for the Assessment and Evaluation of Student Learning (CAESL), University of California, Berkeley
b Center for the Assessment and Evaluation of Student Learning (CAESL), University of California, Los Angeles
Published online: 06 Apr 2009.
To cite this article: Maryl Gearhart & Ellen Osmundson (2009) Assessment Portfolios as Opportunities for Teacher Learning, Educational Assessment, 14:1, 1-24, DOI: 10.1080/10627190902816108
To link to this article: http://dx.doi.org/10.1080/10627190902816108
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Educational Assessment, 14:1–24, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 1062-7197 print/1532-6977 online
DOI: 10.1080/10627190902816108
Assessment Portfolios as Opportunities for Teacher Learning
Maryl Gearhart
Center for the Assessment and Evaluation of Student Learning (CAESL)
University of California, Berkeley
Ellen Osmundson
Center for the Assessment and Evaluation of Student Learning (CAESL)
University of California, Los Angeles
This article is an analysis of the role of assessment portfolios in teacher learning. Over 18 months,
23 science teachers developed, implemented, and evaluated assessments to track student learning,
supported by portfolio tasks and resources, grade-level colleagues, and team facilitators. Evidence of
teacher learning included (a) portfolios of a sample of 10 teachers and (b) the cohort’s self-reports in
surveys and focus groups. Teachers gained understanding of assessment planning, tasks and scoring
guides, methods of analyzing patterns and trends, and use of evidence to guide instruction. Teachers
made uneven progress with technical aspects of assessment and curriculum-specific assessment.
Research is needed on ways to integrate the benefits of a generic portfolio with strategies to
strengthen specific areas of assessment expertise.
In this article we examine ways that assessment portfolios can support experienced science
teachers in their efforts to build assessment expertise. In the professional development program
we investigated, the Assessment Leadership Academy, portfolios provided science teachers
opportunities to learn new assessment concepts and practices and apply their learning to the
design and implementation of assessment plans for curriculum units. Although portfolios are
widely used in preservice and professional development as resources for teacher reflection
(Mansvelder-Longayroux, Beijaard, Verloop, & Vermunt, 2007; Zeichner & Wray, 2001), evidence of the role of portfolios in teacher learning about classroom assessment is limited (Taylor, 1997; Taylor & Nolen, 1996a).
Our article is organized in three sections. The introduction provides a description of the
Academy’s conceptual framework, the portfolio, and the strategies designed to support teachers’
uses of the portfolios. We also review prior research to set our investigation in the context of
what is already known about assessment-focused professional development. In the Findings
section, we report evidence of teacher learning from analyses of the portfolios as well as
Correspondence should be sent to Maryl Gearhart, Graduate School of Education, Tolman Hall MC 1670, University
of California, Berkeley, CA 94720-1670. E-mail: [email protected]
2 GEARHART AND OSMUNDSON
teachers’ self-reports in surveys and focus groups. We conclude with reflection on the opportunities and constraints of a portfolio-based program for supporting the growth of teachers’ assessment expertise.
INTRODUCTION
Academy Program and Conceptual Framework
The Assessment Leadership Academy was an 18-month program in 2003–04 that engaged 23
experienced science teachers in the construction of assessment portfolios for their curriculum
units.1 Five districts in California selected K-12 teams for participation, and, at meetings
held three times a year, participants were reorganized as cross-district grade level teams to
work collaboratively on assessments for curriculum units at their grade level. The Academy’s
core goal was to build teachers’ capacities with curriculum-embedded formative assessments
that can be used at key points in a curriculum unit to inform subsequent instruction. With
recognition that formative assessment encompasses a wide range of important strategies and
tools (Atkin, Coffey, Moorthy, Sato, & Thibeault, 2005; Bell & Cowie, 2001; Black, Harrison,
Lee, Marshall, & Wiliam, 2003; Black & Wiliam, 1998; Stiggins, 2005; Wiliam, Lee, Harrison,
& Black, 2004), the professional development team chose to focus on paper–pencil assessments,
because written assessments are easily archived and transported in portfolios.
The Academy’s objectives for teacher learning are represented in detail in the program’s
conceptual framework, based on theory and research from both the psychometric (American
Educational Research Association, American Psychological Association, and National Council
on Measurement in Education, 1999; Brookhart, 2003; Shepard, 2001; Stiggins, 2005; Taylor & Nolen, 1996b, 2004; M. Wilson & Sloane, 2000) and practitioner traditions (Atkin et al.,
2005; Black et al., 2003; Black & Wiliam, 1998; National Research Council, 2001a; Watson,
2000). The framework captures relationships between teachers’ understanding of assessment
concepts (Figure 1) and their skill with assessment practices (Figure 2).
The network of interconnected assessment concepts in Figure 1 is a modified version
of the assessment triangle in Knowing What Students Know (National Research Council,
2001b). The core idea is that quality classroom assessment requires alignment of the goals
for student learning (including the alternative conceptions that students construct as they build
understanding of complex science ideas; National Research Council, 1996, 2001a, 2001b), tools
for gathering evidence, interpretation of the evidence, and uses of the information.2 The figure
1 Five districts sent K-12 district teams consisting of several teachers and one administrator, typically a district
science or assessment specialist. Our research focused only on teachers. Gearhart et al. (2006) reported preliminary
findings based on an analysis of three teachers in the 1st year.
2 The ideas in the framework are simplified in relation to more comprehensive treatments of classroom assessment
(e.g., Stiggins, 2005; Taylor & Nolen, 2004). Omitted or backgrounded are certain technical ideas, students’ roles in
assessment, and assessment systems that coordinate formative and summative assessments. On the other hand, the idea
of “developmentally sound content” was emphasized more than in other assessment projects, because the Academy was
invested in helping teachers interpret student progress along a developmental continuum of understanding (Herman,
2005). For example, during the planning phase when Academy teachers were evaluating the quality of potential
assessments, teachers drafted a range of “expected student responses” to evaluate the capacity of the assessment to
provide information on the developmental range of understanding, while, in other settings, teachers are often advised
just to write out the correct answers when evaluating assessment items (Taylor & Nolen, 2004).
FIGURE 1 Academy framework for important classroom assessment concepts.
includes subconcepts associated with these major ideas and double arrows between nodes to
represent alignment.3 Figure 2 represents classroom assessment practices embedded in a cycle
of continuous instructional improvement. Planning begins when Academy teachers identify
their learning goals for a science unit and develop an integrated instruction and assessment
plan (cf. Wiggins & McTighe, 2005). Implementation entails: repeated cycles of instruction;
assessment using a variety of assessment strategies (Stiggins, 2005); interpretation of evidence;
and use of information to guide teaching, learning, and further assessment. The bidirectional
arrows indicate ongoing formative assessment and instructional improvement throughout the
unit.
Assessment Portfolio
Aligned with the Academy’s conceptual framework, the Academy portfolio was designed as
a learning portfolio (Mansvelder-Longayroux et al., 2007; Wolf & Dietz, 1998) rather than
an evaluative portfolio for monitoring teacher performance. The portfolio provided a set of
semistructured tasks and resources that supported teachers as they designed, implemented,
3 The figures merge several versions of the framework shared with teachers over 18 months as the framework
evolved in part through teacher input. Herman (2005) provides a detailed exposition of one version of the framework,
and DiRanna et al. (2008) introduce a modified version.
FIGURE 2 Academy framework for classroom assessment practices integrated with instructional practices.
and evaluated assessments to track student learning and progress. Because the professional
development team viewed ongoing reflective practice as essential to the professional work
of teaching (Schön, 1983, 1987), the portfolios provided teachers opportunities for reflection
on their work and their learning. Over 18 months, teachers constructed portfolios for three
curriculum units.
The portfolio’s three sections4 corresponded to the planning, implementation, and revision
phases in Figure 2; tasks in each section required teachers to apply relevant assessment concepts
from Figure 1. Table 1 is an outline of the portfolio sections, tasks, and key assessment
concepts.5
4 The Academy assessment portfolio differed from the preservice model developed by Taylor and Nolen in two
ways (Taylor, 1997; Taylor & Nolen, 1996a). First, it was not a context for feedback by the professional development
team; the Academy goal was to promote professional reflection and collaboration, and the team wanted to minimize
concerns about evaluation. Second, it was a more ambitious undertaking than Taylor and Nolen could accomplish
within a 10-week academic term: The Academy portfolio documented the design of unit assessments, implementation
of assessments, and evaluation/refinement of assessments, while Taylor and Nolen’s preservice portfolio contained just
a unit plan (although the plan was in some ways more comprehensive than the Academy’s).
5 The portfolio forms and tasks were modified twice over the 18-month Academy program. Information on the
evolution of the portfolio is available from the authors. DiRanna et al. (2008) introduced a further evolution of the
portfolio.
TABLE 1
Academy Assessment Portfolio Organization, Tasks, and Resources

Section I. Planning
Context: 1–3 days with grade-level team in Academy institutes
Tasks:
- Establish learning goals with the Conceptual Flow process
- Select appropriate sequence of assessments using the RAIM forms
- Refine assessments by drafting expected student responses
Resources:
- Forms to guide assessment selection and refinement
- Models of Conceptual Flows and RAIMs
- Instructional materials (learning goals, embedded assessments)
- Facilitator and team colleagues

Section II. Interpretation of student work
Context: Independent
Tasks:
- Develop criteria
- Score student responses
- Record scores in matrix, and analyze patterns & trends
- Use evidence to plan instruction and give feedback to students
Resources:
- Portfolio forms to guide interpretation
- Models of criteria and analysis
- Teacher guide for the curriculum unit

Section III. Tool revision
Context: 1–1½ days with grade-level team in Academy institutes
Tasks:
- Evaluate and revise tasks based on patterns in student responses
- Evaluate and revise criteria based on patterns in student responses
Resources:
- Portfolio forms to guide assessment evaluation and revision
- Models of assessment revision
- Facilitator and team colleagues

Note. RAIM = Record of Assessments in Instructional Materials.
Section I contained the unit plan—a description of the learning goals and assessments to
measure those goals. Facilitated over 1 to 3 days by Academy staff, each grade-level team
began by specifying learning goals and representing the goals as a “conceptual flow.” Then,
using a portfolio form titled “Record of Assessments in Instructional Materials” (RAIM),
teams located and recorded the assessments in their units, and selected a series of assessments aligned with key unit goals to track student progress. To evaluate and strengthen these
assessments, teams used RAIM prompts linked to Academy concepts, and a key strategy
was drafting Expected Student Responses to gauge the likelihood that the assessment would
elicit and measure the full range of student understanding of the targeted concept. Teams
then had the opportunity to refine assessments and criteria, or design their own assessments.
The resulting plans (preassessment, interim “juncture” assessments, and postassessment) incorporated both formative and summative assessment as key components of a quality system. Teachers filed a copy of the team’s collaboratively constructed plan in their individual
portfolios.
Section II was devoted to interpretation of student work and use of information to guide
instruction, and teachers completed this section independently in their classrooms as they
implemented the assessments.6 The portfolio provided teachers with strategies for interpreting
student work: methods of constructing criteria by modifying expected student responses based
on patterns in the student work, procedures for scoring responses, ways to record scores and
qualitative notes, and methods of analyzing patterns and trends. Portfolio prompts reminded
teachers to document their strategies for interpretation, their inferences, and the ways they
used the information to give students feedback and revise instruction. Teachers archived the
assessments, copies of the student work, and documentation of their work in their portfolios.
Section III contained revisions of the assessments. After teachers implemented their units,
they reconvened in their cross-district grade level teams, and facilitators guided teams through
a 1- to 2-day process of evaluating and revising their assessments based on students’ responses
to the assessments. Reflective prompts linked to assessment concepts (Figure 1) guided teachers
as they evaluated and strengthened their assessments—alignment with learning goals, accuracy
of science content, and developmental appropriateness (i.e., capacity to assess the full range
of student understanding). Teachers filed a copy of the revised assessments in their individual
portfolios.
As outlined in Table 1, the extent and nature of Academy support varied for different
sections of the portfolio. Section I forms were skeletal, as most of the work of assessment
planning was facilitated. In Section II, detailed portfolio forms outlined step-by-step methods
for developing criteria and analyzing whole class data, whereas support for use of results was
limited to open-ended queries about instructional follow-up and feedback to students. Section
III provided a detailed tool for evaluating and strengthening the quality of assessments.
Prior Research on Professional Development: Setting the Academy Portfolio Strategy in Context
While the Academy assessment portfolio was an innovation, other features of the Academy
program were based on best practices culled from existing research on professional development
(Birman, Desimone, Garet, Porter, & Yoon, 2001; Garet, Porter, Desimone, Birman & Yoon,
2001; Guskey, 2003; Hawley & Valli, 1999; Loucks-Horsley, Love, Stiles, Mundry & Hewson,
2003; National Center for Education Statistics, 2001; S. M. Wilson & Berne, 1999). First,
teachers’ opportunities to learn were collaborative and sustained; for 2 years, the Academy
supported professional communities both within the Academy and the participating school
districts, and the portfolio served as a critical resource that traveled across contexts, supporting
different kinds of teacher interaction and work. Second, teacher reflection on practice was
embedded throughout the portfolio and institute activities. Third, opportunities for teacher
learning were a balance of expert guidance and teacher autonomy; during the institutes, facilitators guided collaborative work on the portfolios, but teachers were individually responsible
for implementing the assessments. The Academy design was, however, weakly aligned with
current recommendations to build content knowledge for teaching (Ball, Hill, & Bass, 2005;
Hill & Ball, 2004; Weiss & Miller, 2006). Academy teachers certainly engaged in content-
rich reflection on learning goals, assessments, and student work. But, given the diversity of
curriculum units, the Academy was unable to organize targeted curriculum-specific experiences
6 In addition, some teachers were visited once by a member of the PD team for on-site coaching of interpretation
of student work, and one institute meeting provided time for discussion of student work.
for teachers to build knowledge of science, the ways that students learn specific science concepts
and processes, and ways to assess based on a developmental continuum of understanding.
Prior studies of assessment-focused professional development have shown that teachers can
gain assessment expertise through activities like those embedded in the Academy portfolio,
including clarifying learning goals, developing assessment tools, and interpreting and utilizing
evidence. The Academy portfolio’s particular focus on paper-pencil assessment tasks built on
research during the performance assessment movement in the 1990s, when teachers collaborated to refine benchmark performance tasks and scoring guides, score student work, and consider instructional implications (e.g., Falk & Ort, 1998; Sheingold, Heller, & Paulukonis, 1995).
These opportunities had generally positive impact on teachers’ assessment and instructional
practices, but researchers also identified barriers to teacher learning, especially the weak
alignment of large-scale assessments with classroom curriculum (Aschbacher, 1999; Borko,
Mayfield, Marion, Flexer & Cumbo, 1997; Falk & Ort, 1998; Gearhart & Saxe, 2004; Goldberg
& Roswell, 1999–2000; Laguarda & Anderson, 1998). The Academy addressed the alignment
issue by engaging teachers in the design and use of assessments for their own curriculum units.
In this regard, the Academy portfolio’s emphasis on integration of curriculum and assessment
was consistent with recent efforts to embed quality assessment systems in science units to help
teachers track student progress and support student learning (Aschbacher & Alonzo, 2006;
Herman, Osmundson, Ayala, Schneider, & Timms, 2006; Nagashima, Osmundson & Herman,
2006; M. Wilson & Sloane, 2000; S. M. Wilson, 2004).
When we consider the Academy in relation to the projects just cited, the Academy’s mission
appears very ambitious. In other projects, teachers generally focused on developing assessment
knowledge and expertise for a limited number of tools, whereas the Academy’s goal was to
engage teachers in developing and implementing coherent assessment plans for entire curriculum units. The Academy team was well aware that Academy teachers had limited experience
evaluating, refining, and using quality assessments, but they argued that, because many science
units lack quality assessments, teachers need to build the expertise to strengthen the assessments
in their instructional materials. The intended outcomes of the Academy portfolio strategy were
to strengthen teachers’ assessment expertise, produce portfolio archives of the process and the
products of assessment design and implementation, and support the emergence of professional
communities committed to the improvement of classroom assessment.
Study Purpose and Analytic Approach
This article is an analysis of what the cohort of Academy teachers learned about classroom
assessment from their work with their portfolios. The findings are organized in two sections—
learning about assessment tools, and learning about interpreting and using evidence—and our
analysis is based on portfolios as well as teachers’ self-reported learning. Portfolio analysis
focuses on changes over time in the assessment practices of teachers who completed at least
two portfolios. Analyses of teachers’ self-reported learning in surveys and focus groups serve both as triangulation of the portfolio analysis and as a richer source of information about teachers’
understanding and application of assessment concepts and methods. Triangulation of our data
sources enables us to identify what Academy teachers learned from constructing a series of
assessment portfolios for their curriculum units. We conclude the article with reflections on the
opportunities and limitations of a generic assessment portfolio for teacher learning.
METHOD
Participants
Twenty-three experienced science teachers from Grades 1 through 10 participated in the Academy, and the total sample size for each of our measures varies from 19 to 21. Based on responses to the initial preinstitute survey (N = 19), the cohort’s mean years of teaching experience was 14.7 (SD = 12.68). The majority had completed coursework beyond their B.A., and half had earned their M.A. Most teachers had participated in professional development programs, and more than half had attended or presented at meetings of the National Science Teachers Association. Teachers generally perceived themselves as instructional experts; on a scale of 1 (weak) to 5 (very strong), teachers rated themselves as strong in confidence in teaching science (M = 4.58, SD = 0.88), knowledge/understanding of grade-level science (M = 4.41, SD = 0.83), and knowledge/understanding of grade-level science standards (M = 4.46, SD = 0.66). Teachers perceived their “knowledge of a wide variety of assessment strategies and techniques” as moderately high (M = 4.19, SD = 0.73).
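The descriptives above are ordinary sample means and standard deviations of the teachers’ self-ratings. As a minimal sketch of how such values are computed (the ratings below are invented for illustration, not data from the study):

```python
from statistics import mean, stdev

# Hypothetical self-ratings from 19 teachers on the 1 (weak) to
# 5 (very strong) scale; illustrative values only, not Academy data.
ratings = [5, 5, 4, 5, 3, 5, 4, 5, 5, 4, 5, 5, 3, 5, 5, 4, 5, 5, 4]

m = mean(ratings)    # sample mean (M)
sd = stdev(ratings)  # sample standard deviation (SD), n - 1 denominator
print(f"M = {m:.2f}, SD = {sd:.2f}")
```

Note that `stdev` uses the n − 1 (sample) denominator, which is the convention behind SD values reported in survey research like this.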
Data and Analysis
Portfolios: Evidence of changes in teachers’ assessment practices. We used quali-
tative methods to analyze evidence of growth over time in the quality of teachers’ assessment
practices documented in their Academy portfolios. Our analytic strategy evolved over three
phases of work.
We first reviewed the portfolios to identify those with sufficient material for analysis.
Few portfolios contained complete work on all tasks in each section, leading us to establish a
modest definition of a Complete portfolio as one containing some material in each section, and a
Partially Complete portfolio as one containing material in the section completed independently
(II) and one of the sections completed collaboratively (I or III). Two researchers rated the
portfolios, and rare disagreements were resolved through discussion. We identified 10 teachers
who submitted a series of two or three portfolios rated as Partially Complete or Complete,
and we adopted this set as our evidence of growth in the cohort’s understandings and uses
of assessment. We consider the portfolios of these 10 teachers to be a reasonable estimate of
growth for two reasons. First, these teachers were distributed across Grades 1 to 9: 1st (1),
2nd (1), 3rd (1), 4th (1), 6th (1), 8th (3), 9th/10th (2); the only Academy grade levels missing
were Grades 5 and 7. Second, background descriptives for the portfolio sample were similar
to descriptives for the remaining teachers in the cohort.
In the second phase of analysis, three researchers reviewed two sample portfolio series representing elementary (Grade 3) and middle school (Grade 8) to develop methods of documenting growth (or stasis) in teachers’ understandings and uses of assessment. Through detailed
readings, we made marginal jottings and wrote memos on patterns (Maxwell, 1996), and then
worked collaboratively to prepare conceptually ordered matrices of evidence (Miles & Huberman, 1994) of change over time in our four targeted strands of analysis: assessment planning,
developing or refining assessment tools, interpreting students’ responses to assessments, and
using evidence to guide instruction and provide students feedback. Our goal was to construct
a scoring guide for rating all 10 teachers’ progress over time.
In Phase 3, we piloted our scoring approach with additional portfolio series and found
that we were faced with weak comparability: Portfolios differed in grade level, curriculum
content, and embedded assessments; teachers differed in their decisions about the portfolio
tasks that were most important for their units or their personal learning goals; portfolio forms
and resources were revised somewhat each semester. We therefore returned to developing
conceptually ordered matrices for each portfolio series to capture each teacher’s growth. We
specified the evidence to be used for each strand of analysis in our matrices, and Gearhart
and three researchers documented patterns. For each series, one researcher prepared an initial
matrix, and a second researcher read the same series as well as the matrix, and confirmed
or challenged the matrix until both researchers agreed that the matrix captured the patterns.
That pair then drafted a summary memo identifying the predominant patterns—both growth
and absence of growth—and key sources of evidence for each. Osmundson then reviewed all
portfolios and matrices, and then together both authors summarized patterns of growth (or
stasis) in two ways. Common patterns were present in at least 5 of the 10 portfolio series, a
conservative criterion appropriate for portfolios quite diverse in content. Range of patterns was
a summary of different types of shifts over time.
Surveys and focus groups: Teachers’ perceptions of learning, supports, and barriers.
We used surveys and focus groups to collect evidence of interim and summative Academy
impact.
Evidence of interim impact was provided by a survey focused on classroom assessment
practices first administered when the Academy was initiated in August 2003 and again in
May 2004. Teachers rated the extent to which they implemented various assessment practices
on a scale from 1 (very limited extent) to 5 (great extent). Nineteen (of 23) teachers from
Grades 1 through 9 completed the survey on both occasions, and we used t tests to compare
responses over time. To help us interpret the quantitative findings, we used HyperResearch©
(Hesse-Biber, Dupuis, & Kinder, 1991) to capture themes in teachers’ comments on the
May ’04 survey; themes were reviewed by the researchers and the professional development
team.
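The pre/post comparison described above used t tests on matched responses from the same teachers. This is not the authors’ actual analysis script; a minimal sketch of a paired t test on invented ratings might look like this:

```python
import math
from statistics import mean, stdev

# Hypothetical ratings from the same 19 teachers on one survey item,
# on the 1 (very limited extent) to 5 (great extent) scale.
# Illustrative values only, not data from the study.
aug_2003 = [4, 3, 4, 5, 3, 4, 4, 3, 4, 5, 4, 3, 4, 4, 3, 5, 4, 4, 3]
may_2004 = [5, 4, 4, 5, 4, 4, 5, 3, 4, 5, 5, 4, 4, 5, 4, 5, 4, 5, 4]

# Paired (repeated-measures) t test: because each teacher responded on
# both occasions, we analyze the per-teacher differences.
diffs = [post - pre for pre, post in zip(aug_2003, may_2004)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # df = n - 1
print(f"t({n - 1}) = {t_stat:.2f}")
```

The same statistic is available as `scipy.stats.ttest_rel`, which also returns a p value; the hand computation above makes the paired structure of the comparison explicit.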
Exit data collected in December 2004 included a survey as well as focus groups. For
the survey (N = 21), teachers rated their understanding of Academy assessment strategies from 1 (none) to 5 (full), and we generated descriptive statistics for the responses;
teachers’ written comments were combined with focus group transcripts when analyzing
exit themes. For the exit focus groups, grade-level and district teams were distributed
across five groups. To structure discussion, we provided figures of the Academy frame-
work, and teachers identified assessment practices or concepts that they had strengthened
and those they needed to strengthen, and then explained their selections. Teachers then
recommended revisions in Academy goals, the portfolio, and strategies for supporting
teacher learning. Recordings were transcribed, and we combined exit survey comments
and focus group transcripts, and used Hyperqual to identify teachers’ perceptions of their
learning and the factors that influenced their learning. Gearhart completed all coding,
and then both authors compared the thematic analysis with the memos that focus group
facilitators submitted immediately following their focus groups as validation for the coded
patterns.
FINDINGS
Patterns of cohort learning are reported in two parts that are aligned with the sections of the
portfolio. In Learning about Assessment Tools, we present findings on teachers’ progress with
planning coherent assessment systems and designing appropriate assessments. In Interpreting
Student Responses and Using Evidence to Guide Instruction, we report the ways that teachers
were learning to use their assessment tools. In each section, our analyses of evidence of
teacher learning in the portfolios are validated and contextualized with teachers’ self-reports of
their learning on surveys and in focus groups.
Learning About Assessment Tools: Planning a Coherent Assessment System and Refining Specific Assessments
Our analyses of the portfolios and teachers’ self-reports revealed that all teachers used the
Academy protocols and resources to plan their assessment systems and refine specific assess-
ment tools. Teachers generally made more progress learning to establish learning goals than
they did with the selection or development of assessment tools.
Planning a coherent assessment system: Establishing goals and selecting assessments. We analyzed shifts over time in the organization and coherence of both the unit
learning goals represented in the “conceptual flows” and the assessment plans. Given the range
of grade levels and units in our portfolio sample, it was not possible to evaluate either the
quality or clarity of each learning goal or the capacity of each assessment to measure students’
progress toward a given goal.
When we examined the conceptual flows, we found that most grade-level teams shifted
toward a greater focus on big ideas by removing, adding, or reorganizing learning goals to focus
on what was most important for students to learn. For example, a middle school team added the
concept of density to their goals for a unit on plate tectonics when they recognized that students’
understanding of how matter in the earth’s crust shifts is based on understanding density.7
Another common shift was toward more coordinated relationships among big ideas and smaller
supporting concepts. Most teams increasingly represented conceptual relationships among unit
goals rather than as a list of sequential lesson topics. For example, in their first conceptual
flow for a unit on homeostasis, a high school team depicted regulatory systems as distinct
systems in the body without a connection to homeostasis, but in their third conceptual flow
for a repeated unit, the team highlighted the interconnected relationships between homeostasis
and regulatory mechanisms in the body. A middle school team reorganized goals for their
unit on heredity by introducing “pre-learning” opportunities for students to learn scientific
terminology, after noticing that their English Language Learner students could often identify
inherited characteristics but their descriptions lacked specificity, clarity, and academic language.
Paralleling organizational shifts in the conceptual flows, all of the teachers’ assessment plans
were more coherently organized in later portfolios. Assessment plans shifted from long lists of
possible assessments toward judicious selection of a few key assessments for tracking student
7Organizational shifts toward a clearer focus on big ideas were more evident when teachers revised an assessment
plan they had constructed for an earlier portfolio.
ASSESSMENT PORTFOLIOS 11
TABLE 2
Planning Goals and Assessments:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                      August 2003     May 2004
Items                                                   M     SD       M     SD
To what extent do you:
  Set specific goals for student progress?            3.84    .69    4.16    .77
  Align your assessments with your learning goals?    4.00    .75    4.37    .76
  Assess students’ prior knowledge?                   3.84    .83    4.16    .83

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
progress—a preassessment, one or more juncture assessments, and a postassessment.8 The
addition of a pretest in most of the assessment plans in the later portfolios was a particularly
noteworthy indicator of teachers’ progress, as very few curriculum units contained pretests.
Most of the later portfolios also showed evidence of teachers’ efforts to strengthen alignment
among their assessments. For example, when a middle school team discovered that students
were challenged by the graphing requirements of their unit on density, they added items to
each of their assessments to allow them to analyze how student understanding of graphing was
developing in relation to student understandings of density. Other examples include reposition-
ing brief, targeted assessments as juncture assessments and moving comprehensive assessments
to the conclusion of a unit for use as postassessments. Finally, as evidence of their growing
attention to alignment, in later portfolios most teachers depicted relationships between learning
goals and assessments in a single, usable document rather than in two separate documents
(conceptual flows and RAIMs).
In surveys and focus groups, teachers reported strengthening their use and understanding
of assessment planning and assessment refinement. After the first 9 months of the Academy
(Table 2), teachers generally reported more frequent efforts to set learning goals, align assess-
ments with goals, and include assessments of prior knowledge, although these trends were not
statistically significant. In their survey comments, most teachers praised the benefits of the
portfolio process that engaged them in planning, implementation, reflection, and revision of
assessments. For example, one teacher commented, “Before I would have believed that I was
very good at evaluating the alignment of assessments with assessment targets; it wasn’t until I
saw my results from the pretest (or lack of results) that I realized I wasn’t as good at this as I
originally thought.”
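Interim trends like these are typically evaluated with a paired comparison of each teacher’s two ratings on an item. A minimal sketch of that arithmetic, with invented 5-point ratings (the study reports only the summary statistics shown in Table 2, so the data and the resulting t value below are purely illustrative):

```python
# Purely illustrative: a paired t-test on invented 5-point Likert ratings
# for one survey item, mirroring an August 2003 vs. May 2004 comparison
# with N = 19. These data are NOT from the study, which reports only
# means and standard deviations.
import math

aug_2003 = [3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4]
may_2004 = [4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4]

diffs = [b - a for a, b in zip(aug_2003, may_2004)]
n = len(diffs)
mean_d = sum(diffs) / n
# Sample standard deviation of the paired differences
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t = mean_d / (sd_d / math.sqrt(n))  # paired t statistic, df = n - 1
print(f"mean gain = {mean_d:.2f}, t({n - 1}) = {t:.2f}")
# With df = 18, |t| must exceed roughly 2.10 for significance at p < .05,
# so a small positive mean gain like this one is a nonsignificant trend.
```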
On the exit survey, teachers indicated generally strong understandings of many aspects of
the Academy portfolio assessment planning process: how to create conceptual flows of learning
goals (M = 4.67, SD = .66), on a scale from 1 (poor) to 3 (moderate) to 5 (excellent understanding),
use the conceptual flow to guide assessment decisions (M = 4.57, SD = .60), and select the
juncture assessments (M = 4.29, SD = .72). Teachers’ lower ratings for preparing the detailed
8In the first portfolio, teachers were asked to list all possible assessments before making selections for their
assessment plan; in later portfolios, that task was revised to focus teachers more directly on selection of targeted
assessments. Thus this pattern of change from comprehensive lists to targeted selection mirrors revision of the portfolio
tasks, but that revision was prompted by teachers’ requests for a more strategic approach to assessment planning guided
by the conceptual flow of learning goals.
RAIM plan (M = 3.79, SD = .96) and using the RAIM to guide specific assessment decisions
(M = 3.95, SD = .86) suggest that teachers felt they had gaps in their understanding of the
process of developing assessment tools (a pattern we examine further below).
In their exit comments on the survey and in focus groups, teachers reported learning many
of the big ideas of assessment planning. One primary theme was the important role of the
conceptual flow as a coherent representation of learning goals, as this quote illustrates: “I
now focus on what students need to know in conjunction with the conceptual flow and not
just what I need to cover in the unit.” Regarding assessments, the importance of planning
formative assessments took on new importance. As one teacher explained, “Before I was doing
backwards design making my summative assessment ahead of time, but I wasn’t planning the
formative assessment ahead of time; in making the RAIM, I’ve already got all the formative
assessments identified.” Academy teachers became more committed to formative assessment,
but as we report next, many teachers felt they had uneven understanding of specific techniques
for selecting and strengthening assessments.
Learning how to refine specific assessments. We analyzed evidence of teachers’ efforts
to improve the quality of their assessment tools from all three portfolio sections—I. Revising
selected assessments and developing Expected Student Responses, II. Developing criteria, and
III. Revising assessments after completing the unit. We found mixed patterns of improvement.
All teams worked to improve the clarity of their assessments—for example, modifying the
size of figures, leaving more space for students to answer, refining directions and response
choices for clarity. Most teams worked on strengthening the alignment of assessments with
learning goals. For example, one elementary teacher revised instructions for a performance
item on pitch and volume, because her students were interpreting the investigation instructions
incorrectly, and therefore their conclusions about sound were not relevant to the targeted
concepts. Another teacher revised the mineral samples for an assessment of the characteristics of
rocks and minerals, when she discovered the students’ “kit misconception” that “all minerals are
white” because all minerals in the instructional kit were indeed white! A middle school teacher
replaced an open-ended essay task assessing students’ understanding of plate tectonics with a
set of short answer items that provided more targeted evidence about student understandings
of the characteristics, causes, and effects of shifting tectonic plates. A high school teacher
added an explanation question to his multiple-choice test on the periodic table to provide
information on student understanding of how the periodic table can be used to predict the
nature of elements.
Teachers also strengthened the quality of assessment criteria in a variety of ways. In about
half of the portfolio series, revisions in criteria were more in form than function; an eighth-grade
team, for example, added a middle level to their two-level holistic rubric for a performance item
on the properties of matter, but the added level was not aligned with the high and low levels.
In the remaining portfolios, teachers revised criteria to differentiate additional dimensions of
performance and levels of understanding. For example, one elementary teacher transformed
the publisher’s three-level holistic scoring guidelines (complete response, partially complete
response, and no responses or a response that doesn’t make sense) into a four-level scoring
guide containing two distinct conceptual dimensions: (a) student can accurately identify and
use tests to distinguish different types of minerals, and (b) student knows that minerals are the
basic elements that make up rocks and have properties that can be described.
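A scoring guide of this kind is, in effect, a small data structure: one list of level descriptors per conceptual dimension. The sketch below is hypothetical; the two dimension names paraphrase the example above, but every level descriptor and score is invented for illustration, not taken from the teacher’s guide.

```python
# Hypothetical sketch of an analytic scoring guide: two conceptual
# dimensions, four levels each. Dimension names paraphrase the example
# above; all level descriptors are invented.
scoring_guide = {
    "uses tests to distinguish minerals": [
        "no relevant response",                        # level 1
        "names a test but cannot apply it",            # level 2
        "applies one test correctly",                  # level 3
        "applies several tests and compares results",  # level 4
    ],
    "knows minerals are the building blocks of rocks": [
        "no relevant response",
        "mentions rocks and minerals without distinguishing them",
        "states that rocks contain minerals",
        "explains that rocks are made of minerals with describable properties",
    ],
}

# Scoring each dimension separately (rather than assigning one holistic
# level) shows that a student can be strong on one concept, weak on another.
student_scores = {"uses tests to distinguish minerals": 3,
                  "knows minerals are the building blocks of rocks": 2}
for dimension, level in student_scores.items():
    print(f"{dimension}: level {level} - {scoring_guide[dimension][level - 1]}")
```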
In surveys and focus groups, teachers reported insights about how to strengthen assessment
tools along with continued uncertainties. Interim findings (Table 3) showed that teachers had
come to view their assessments as lower quality than at the outset of the Academy; decreases
in ratings for two items on validity and accommodation were statistically significant (p <
.05), and the trend was the same for items on reliability and fairness. This unexpected pattern
suggests that what teachers were learning about quality assessment tools was making them more
critical of the tools they were using, and teachers’ survey comments support our interpretation.
On one hand, many teachers reported that they were learning to integrate new assessment
tools: pretests (“I have never given a pretest before”; “assessing students’ prior knowledge has
become a more formal process”), interim juncture assessments (“[Now I am] assessing what
students know at critical junctures”), and parallel pre- and posttests (“I have been using more
pre/post testing”). Many teachers also reported strengthening the quality of assessment tools—
for example, “evaluating the developmental appropriateness of an assessment,” “making sure
that the assessments I give measure what I intend them to,” and “thinking about the criteria by
first drafting expected student responses.” But many teachers also reported concerns about their
assessment tools. The dominant theme was weak alignment. Some teachers were concerned
about alignment of assessments and learning goals: “Developer-created assessments must be
checked and analyzed to determine if their questions are assessing the same objectives you are
looking for,” and “even in reform units with embedded assessments, the assessments did not
always assess the concepts we wanted to assess.” Other teachers were concerned about align-
ment between assessment and instruction—for example, “my pre and post tests didn’t match the
instruction, so next time I will revise them and the instruction,” and “the juncture assessment
asked questions about content the students haven’t learned yet, and we need to revise it.” Less
frequently mentioned were concerns about the quality of item types (“my multiple choice test
told me nothing about student thinking”) and the fairness of items (“student accessibility to
the question—the language, the vocabulary”), two aspects of assessment refinement that were
not supported by the portfolio.
Exit findings indicated that teachers had continued to grow in their understandings of
assessment refinement, but they were more confident with some aspects than others. On the
survey, teachers indicated moderate to high understanding of methods for refining pre- and
TABLE 3
Assessment Tools:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                          August 2003     May 2004
Items                                                       M     SD       M     SD
To what extent are your tools:
  Based on strong science content?                        3.89    .57    3.95   1.03
  Valid for the reason you are using them
    (measures what you thought it would)?                 3.72    .58    3.11*  1.15
  Reliable and accurate?                                  3.68    .58    3.26   1.20
  Designed to accommodate learners with various needs?    3.68    .75    2.79*   .98
  Fair?                                                   3.89    .68    3.42   1.07

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
*p ≤ .05.
postassessments (preassessments: M = 4.35, SD = .74; postassessments: M = 4.35, SD =
.67) from 1 (poor) to 3 (moderate) to 5 (excellent understanding), and moderate understanding
of the detailed work of clarifying concepts (M = 4.10, SD = .83), clarifying expected
student responses (M = 4.14, SD = .66), and developing a juncture assessment (M = 4.19,
SD = .68). Two dominant themes in teachers’ comments were the importance of assessment
refinement (e.g., tools may “need to be tweaked and fixed”) and the important role of student
responses in assessment refinement: “[What the Academy portfolio] added was refining tasks
based on evidence, not on how I feel”; “[if] the question isn’t really written correctly, [I’ve
learned that] you’re not going to get expected answers that you need from the students”; “I
understand that we need to look at student work to refine the criteria so the criteria represent
an accurate assessment of student learning.” Teachers recognized that quality assessment tools
should provide information on “what students think” and “specifically what the students don’t
understand.”
Thus, after completing two or three portfolios, Academy teachers exited the program with
a commitment to formative assessment and greater understanding of strategies for assessment
planning and assessment refinement. The work was challenging, and some teachers expressed
a desire for higher quality assessments and criteria embedded in their instructional materials.
As one teacher explained, “It’s hard to write good assessments—field testing shows unexpected
results; it’s an iterative cycle, and it’s time intensive—if assessment writers were more careful,
our jobs would be easier.” Another teacher commented frankly, “[If I had] set criteria [in the
materials], it would make my life easier.”
Learning to Use Assessment Results: Interpreting Student Responses and Using Evidence to Guide Instruction

Section II of the portfolio focused on interpretation of student responses and use of assessment
results. The portfolio forms guided teachers through a series of steps: sort student work to
provide initial information on levels of performance; compare patterns with the “expected
student responses”; construct scoring criteria through an iterative process of refinement; score,
record scores, and analyze patterns and trends. The portfolio contained models of rubrics
and assessment records as well as suggested ways to analyze patterns and trends. Section
II concluded with space for teachers to document the ways they used assessment results to
provide students feedback and guide instruction. Our analyses revealed that teachers were
learning new methods for interpretation of student work and developing targeted strategies
for instructional improvement and feedback. However, patterns of teacher learning varied, and
growth in teachers’ understandings was complex and nonlinear.
Learning to interpret student work. All portfolio series shifted toward greater sophisti-
cation of interpretive techniques. In the first portfolios, most student work was either graded
or simply collected, and teachers’ inferences about student learning appeared to be based on
unsystematic reviews of student responses or on other sources such as class discussion and
informal observation. In later portfolios, teachers began to score with rubrics9 and explore
9Unfortunately we could not trace teachers’ growth with scoring techniques such as benchmarking or double-
scoring, because the portfolio did not ask teachers to document the scoring process.
ways to chart and analyze scores, and, by the final portfolio, all 10 teachers in our sample
used or adapted Academy models to score, chart results, and analyze patterns and trends. Some
of their assessment records were quantitative (scores), some qualitative (content analyses of
responses), and some a hybrid (when teachers supplemented their scores with qualitative notes
on the content of student responses).10
The quality of teachers’ methods of analysis ranged in sophistication, as the following
snapshots illustrate.
• More sophisticated analytic methods. A few portfolios revealed methods of interpretation
that were specialized for specific purposes. A middle school teacher adapted the Academy
“hybrid” record by focusing her qualitative notes just on the low and medium student
responses to help her identify needs for further instruction. An elementary teacher developed
a four-level rubric to determine what students understood about specific concepts, and she
used three methods to analyze results: pre–post test comparisons based on the number
correct and change in score, item/concept correlation by clustering items related to each
concept, and identification of concepts associated with the most frequently missed items.

• Basic methods of analysis. In about half the portfolio series, teachers used basic methods
of analysis even in their later portfolios. Whole class data were analyzed as class averages
or class distributions of total scores. Patterns were sometimes summarized as restatements
of the criteria; for example, one teacher reported that “some students correctly predicted
how changing the angle of the plane would impact the speed of descent of the water
drop,” a report that was a restatement of his criterion for a “high” score. Student–item
interactions were examined with formats that appeared to limit interpretation of patterns—
for example, just listing the items that each student answered correctly (e.g., “Jenn: 2, 4,
7, 8, 9; Santiago: 1, 2, 4, 7, 8, 9”).

• Problematic methods. Two of the later portfolios contained methods that were unlikely to
yield accurate and valid interpretations. One teacher compared class means of total scores
on pre- and postassessments with no analysis of individual student progress or item–student
interactions, and another compared students’ L-M-H performance on assessments that were
not comparable.
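The analytic moves in these snapshots (pre–post change scores, clustering items by concept, and tallying the most frequently missed items) can be sketched concretely. Everything below is invented for illustration: the item–concept mapping, student names, and response sets are hypothetical, not data from the study.

```python
# Hypothetical sketch of three basic analysis methods on a six-item
# assessment. All items, concepts, students, and responses are invented.
from collections import Counter

# item number -> concept it targets
concepts = {1: "properties", 2: "properties", 3: "tests",
            4: "tests", 5: "rock formation", 6: "rock formation"}

# student -> set of items answered correctly on the pre- and posttest
pre = {"Jenn": {2, 4}, "Santiago": {1, 2, 4}, "Maya": {4}}
post = {"Jenn": {1, 2, 4, 5}, "Santiago": {1, 2, 3, 4, 6}, "Maya": {2, 4, 5}}

# 1. Pre-post comparison: number correct and change in score per student
for student in pre:
    gain = len(post[student]) - len(pre[student])
    print(f"{student}: {len(pre[student])} -> {len(post[student])} (+{gain})")

# 2. Concept-level results: posttest items correct, clustered by concept
by_concept = Counter()
for items in post.values():
    for item in items:
        by_concept[concepts[item]] += 1

# 3. Most frequently missed posttest items
missed = Counter()
for items in post.values():
    for item in concepts:
        if item not in items:
            missed[item] += 1
print("most missed items:", missed.most_common(2))
```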
Although there was a range in the quality of teachers’ methods of interpretation, the
portfolios revealed overall increases in teachers’ expertise with interpretation, and survey and
focus group findings were consistent with this pattern. After the first 9 months (Table 4),
teachers generally reported more frequent use of “sound interpretations,” and a primary theme
in teachers’ comments was their shift from grading student work toward “analyzing individual
student work” and “analyzing test results from the perspective of student understandings” using
Academy portfolio techniques. The lower mean frequency of interpreting work “according to a
developmental framework” suggests that some teachers felt unprepared to interpret conceptual
10We cannot determine whether the hybrid records reflected limitations in teachers’ capacities to construct scoring
guides or their growing insight that mixed methods can be efficient and targeted. Shepard (2001), for example, argued
that qualitative analysis of the responses scored at medium and lower levels is a flexible and feasible strategy for
classroom assessment.
TABLE 4
Interpretation of Student Work:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                            August 2003     May 2004
Items                                                         M     SD       M     SD
  Are you using your assessments to make sound
    interpretations?                                        3.68    .75    4.16**  .83
  Do you analyze individual work and responses for
    specific student understandings?                        4.22   1.06    4.32    .67
  Do you evaluate students’ ideas based on a developmental
    framework of science understanding?                     3.84    .76    3.74   1.24

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
**p < .07.
development, and indeed several teachers commented that their interpretations did not appear
to be consistent with the Academy notion of “developmentally appropriate” assessment.
On the exit survey, teachers expressed moderate to high understanding of analyzing whole
class sets of student work (M = 4.29, SD = .56) and comparing student performance over
time (pre–post: M = 4.48, SD = .81; prejuncture: M = 4.52, SD = .60; juncture–post: M =
4.52, SD = .60), from 1 (poor) to 3 (moderate) to 5 (excellent understanding). Teachers’
investment in interpreting student thinking was a dominant theme in exit comments such as,
“I really care about what each student is saying, about what each group is thinking about an
idea”; “I’m now looking at everything, and I’m not even putting grades on anymore”; “I look
more carefully at what it is that [students] don’t know, and not so much, ‘oh they got it, they
didn’t get it.’ ” Another major theme was appreciation for the Academy portfolio methods of
recording scores and analyzing patterns, for example, “Having to make a chart and analyze that
really helped me”; “One of the things I came away with was seeing the trend in the class—if
a whole bunch of kids missed that, you know, the breakdown”; “Before it would be giving
them a grade and not really looking at the trends across my class and individual concepts they
might be lacking.” Minor themes focused on aspects of interpretation that were unsupported
in the portfolio—for example, fairness (“I have English language learners, and I’m uncertain
how to be fair and unbiased in my questions”) and reliability (“I don’t have that academic or
research background”).
Learning to use evidence to guide instructional improvement and provide students
feedback. In early portfolios, most teachers described generic strategies for follow-up in-
struction based on analysis of student work; strategies included giving students the correct
answers, reteaching, reviewing vocabulary, or modeling test-taking skills. Versions of these
generic methods continued in a few of the later portfolios, even with the addition of new
assessment tools; for example, a few teachers used preassessment results solely as a baseline
measure for pre–post comparisons while neglecting their value for instructional planning. But
in most of the later portfolios, teachers reported lesson-specific follow-up activities that merged
instruction with feedback and challenged students’ understandings of core concepts. Examples
included engaging students in scoring and revising their work, asking students to discuss
contrasting responses in small groups, and implementing a follow-up inquiry activity as an
additional learning opportunity. Portfolios also described instructional strategies matched to
students’ needs—for example, more didactic instruction for students with the least under-
standing. In one exemplary portfolio, an elementary teacher developed a scoring guide that
integrated scoring with her strategies for follow-up and feedback by specifying what additional
information and concepts students needed to learn to move to the next level of conceptual
understanding.
On the interim survey, teachers reported more frequent use of assessments to guide instruc-
tion (Table 5), although no increase in use of feedback. In their comments, they described both
generic approaches to follow-up (e.g., additional practice, reteach, readminister the assessment
after a review) and instructional techniques that targeted students’ understandings of specific
concepts (e.g., more differentiated instruction based on student needs). However, some teachers
reported feeling challenged in their efforts, for example, “I’ve made changes in instruction based
on student evidence but I am not sure they are the best changes—I at least try;” “now that I
know where they are, what do I do?” “[I’d like to] hear more about providing effective feedback
to students.”
At the conclusion of the Academy, teachers reported moderate to high understanding of
the uses of results from particular assessments (preassessment: M = 4.38, SD = .80, and
junctures: M = 4.40, SD = .75) from 1 (poor) to 3 (moderate) to 5 (excellent understanding),
and ways to use whole class information for specific purposes (revising instruction: M = 4.50,
SD = .51; revising instructional material: M = 4.25, SD = .79; providing feedback: M =
4.24, SD = .70; differentiating instruction: M = 3.86, SD = .73). The lower mean rating for
differentiating instruction may indicate teachers’ awareness that differentiation requires very
accurate information on each student’s understanding. A major theme in teachers’ comments
was the recognition that instructional follow-up is the purpose of formative assessment, as this
quote conveys: “I haven’t thought about that before—‘Okay, I teach it, I assess it, we move on;
those kids that didn’t catch it, okay, I’ll catch up with them later’ [but] that’s not how it works.”
Teachers described new ways that they were using formative evidence as a resource for
instructional planning: “Now I look for patterns on the pretest—I get a current idea of the
unit based on student understanding rather than what the teacher thinks the unit should be”;
“The whole class analysis is a true guide for instruction—for differentiation, for grouping, for
a number of classroom issues”; “Looking at trends, really analyzing your students’ work, and,
based on that, ‘what do I do next?’ was really important.” A minor but related theme was
TABLE 5
Use of Information to Guide Instruction:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                            August 2003     May 2004
Items                                                         M     SD       M     SD
  Are you using your assessments to guide instructional
    improvement?                                            4.21    .63    4.58*   .51
  Are you using your assessments to provide communication
    and feedback to students regarding their performance?   3.84    .83    3.44   1.19

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
*p ≤ .05.
informal formative assessment as a new orientation to assessment practice—for example, “I
had a more fine tuned lens, and I would observe my class more [to] check really quickly so
that I could move on to address those misunderstandings right away.” Feedback to students
regarding their work and understandings was mentioned by a few teachers in their survey
comments: “Just last night I was grading the lab reports, and I found out all I’m doing is
writing questions to all the kids!” “I have students evaluating each other’s assessments and
giving feedback to one another.”
In sum, through their work on Academy portfolios, teachers replaced their prior practice of
grading student work with methods for analysis of student understanding; in concert, teachers
developed more targeted strategies for instructional improvement, and devoted some attention
to providing meaningful student feedback. These patterns of broad impact were balanced by
gaps in teachers’ expertise, such as interpretation of student progress along a developmental
continuum, fairness and reliability of scoring, and curriculum-specific ways to use information
to support student learning.
SUMMARY AND DISCUSSION
The Academy assessment portfolio was an innovative professional development tool designed to
guide science teachers toward deeper expertise with classroom assessment. Through a series of
three portfolios, Academy teachers gained experience with a process for designing assessment
plans for curriculum units, gathering and analyzing evidence of student understanding, and
using the information for instructional improvement. Evidence in teachers’ portfolios and in
their self-reports in surveys and focus groups revealed that teachers developed a commitment to
formative assessment and strengthened their expertise. As teachers recognized that curriculum
materials are not inflexible scripts, they adopted a professional stance toward materials as
revisable resources for teaching and assessment. But teachers exited the program with uneven
understanding of the technical aspects of assessment and curriculum-specific methods of using
assessments to improve instruction. Few teachers felt fully competent with all components of
the Academy’s vision for quality classroom assessment.
The first strand of our analysis focused on assessment planning. We found that teachers
learned to construct clearer depictions of relationships among unit learning goals, and identify
and align assessment points to track student progress. Teachers learned to revise tasks and
criteria—tighten alignment, clarify response expectations, and capture student understanding.
But teachers’ assessment revisions were sometimes limited to surface features, and teachers
were aware that they had more to learn about constructing an assessment system to track
student progress along a developmental continuum. At the same time, some teachers expressed
the desire for higher quality assessments embedded in their instructional materials, to reduce
the time and perceived expertise required to strengthen assessment tools.
Our second strand of analysis addressed teachers’ growth with interpretation of student
responses and use of the information to guide instruction and provide students feedback. Our
results showed that teachers gradually replaced their practices of merely collecting or grading
student work with strategies for analysis of student understanding and instructional follow-up.
Teachers refined criteria based on patterns in the student work, scored responses, organized
scores in a variety of records, and analyzed whole class patterns and trends. In concert, teachers’
uses of the evidence shifted from simply reteaching toward targeted strategies for feedback and
instructional improvement. But some teachers exited the program with methods of interpreting
student work that missed informative patterns, or methods of follow-up that were not well
aligned with assessment results. There were gaps in teachers’ understandings of interpretation
(such as scoring reliability and tracking student progress) as well as uses of assessment results
to differentiate instruction and provide feedback.
Discussion
We conclude with discussion of the role of the Academy portfolio in the growth of teachers’
assessment expertise. We argue that a generic portfolio can serve as a valuable resource in the
establishment of a K-12 professional community committed to the improvement of assessment.
There is a need, however, for additional specialized assessment coaching and support tailored to
teachers’ curriculum and their individual goals for learning. We also recommend exploring the
portfolio’s potential as a resource for helping teachers learn techniques for informal formative
assessment. We end with comments on the challenges of developing research-based trajectories
of teacher learning, and the need for further research.
The generic portfolio’s role in building a professional community. At Academy in-
stitutes, teachers were introduced to the assessment principles in the Academy framework,
and then grade-level teams applied those principles to the work of strengthening assessments
for their curriculum unit portfolios. Through these activities, Academy participants developed
shared knowledge of major assessment concepts, a shared technical language, and portable
portfolio examples that could serve as resources to sustain the ideas and the work. The generic
assessment portfolio played a pivotal role in building professional community.
The portfolio proved to be a flexible resource that allowed teaching professionals within
the community to focus on particular aspects of assessment based on their students’ needs and
their personal goals for learning. Although all teams completed the required portfolio work
(unit goal planning, tool development, and interpretation and use of evidence), some teams
took the initiative to work on additional aspects of assessment such as strengthening scoring
reliability, reducing bias in item design and scoring, and designing or interpreting sets of items
in one instrument. The range in teachers’ work was a benefit to the Academy. Within the shared
context of the assessment framework and portfolio, each team’s learning became a resource
for others.
What teachers learned about the portfolio process was itself a valuable outcome with
potential to seed new communities. When teachers completed the Academy program, they had
acquired a strategy for assessment planning and improvement that they could use to strengthen
their assessment practices collaboratively with non-Academy colleagues in their home districts.
The need to balance the generic portfolio with targeted support. The Academy
produced a cadre of professionals who had differing areas of expertise, and the portfolio’s
flexibility was one of several factors that contributed to this variation in teacher outcomes.
Teacher background was another factor. Teachers came to the Academy program with a range
of experience developing assessments and analyzing student work, knowledge of science, and
familiarity with particular curriculum units. Curriculum was a third factor—teachers were
working with curriculum units that contained markedly different assessments. At any given time,
one team might be revising multiple-choice items while another team was revising performance
tasks; even if two teams were revising assessments of the same type, there were differences in
the content and quality of the assessments in their instructional materials.
It is a reasonable conjecture that teachers’ progress with assessment will eventually require
their deep engagement with content and curriculum as well as specialized assistance with
the more technical aspects of assessment. Strengthening the validity of an assessment, for
example, requires teachers to analyze the soundness of the science content as well as the
capacity of the task and the scoring criteria to elicit and capture the range of student
performance and understanding in the domain. Developing a series of assessments to
track student progress requires a solid understanding of comparability of measures. Scoring
reliably requires practice with techniques such as benchmarking, rescoring, and double scoring.
Designing instructional methods for follow-up and feedback requires that teachers understand
content-specific strategies appropriate to students’ conceptual challenges. We recommend that
professional developers supplement the generic assessment portfolio with targeted and individ-
ualized methods of support. Coaches with curriculum and/or technical expertise could consult
with grade level teams in the institutes, and provide on-site and online coaching to help teachers
specialize their uses of assessment for particular curriculum units and their personal goals for
learning.
Support could also be embedded in instructional materials in the form of quality curriculum-
embedded assessments (e.g., Shavelson, Stanford Educational Assessment Laboratory, & Cur-
riculum Research & Development Group, 2005). Research-based assessments and scoring
guides can help teachers identify patterns of student learning and track progress systematically,
and indeed there is emerging evidence of the promise of these systems for teacher learning,
classroom practice, and student learning (Kennedy, Brown, Draney, & Wilson, 2005; M. Wilson
& Sloane, 2000). Provision of these tools could shift teachers’ attention from time-consuming
assessment development and refinement toward content-specific interpretation of students’
responses and design of instructional follow-up. Of course, teachers using these quality systems
still need assessment expertise to support student learning. Teachers must understand the
underlying concepts to refine embedded tools for their students and their assessment purposes
and to make appropriate inferences from assessment information. We believe that the Academy
portfolio has potential as a productive resource for teacher learning for curriculum units with
high quality assessments.
Building expertise with informal classroom assessment. The Academy professional
development team developed a portfolio model for written assessments that could be archived
and transported across contexts, but of course they recognized that classroom assessment
is far more complex. In addition to more formal paper-and-pencil tools, teachers need to
gather ongoing information through informal methods such as questioning and observation,
and students should be deeply engaged in assessing their own and their peers’ progress in a
supportive community of learners. Prior research has produced solid evidence of the importance
of a coordinated system of informal and formal assessment (e.g., Black & Wiliam, 1998).
We believe that assessment portfolios have the potential to support teachers’ attention
to informal assessment. The heart of the Academy portfolio process was the design of an
assessment system, and informal assessments need to be coordinated with the very same
components—clearly specified learning goals, assessments to track students’ progress, strategies
for interpreting evidence, and formative use of evidence. With the support of their written port-
folios, teachers could develop related informal techniques. Work on informal teacher assessment
might focus on questions to ask in a whole class discussion, methods of documenting student
participation during science labs, or journal prompts to provide quick formative information
on student understanding. Work on students’ roles in assessment might address techniques to
ensure that students use criteria to monitor their work and their learning, the best ways to engage
students in constructing rubrics, and methods for scaffolding useful peer feedback. Because
documentation of informal assessment can be challenging, colleagues might observe in one
another’s classrooms and debrief their uses of informal techniques. Research and development
is needed to identify productive and feasible ways to coordinate the portfolio with work on
informal assessment.
The challenges of research on teacher learning. The Academy portfolios were rich
repositories of information, and we intended to use the portfolio series as evidence for trajecto-
ries of teacher learning. Our decision to put aside that effort was reluctant but necessary in the
face of portfolios that were not comparable across teachers or over time. Future research should
focus on the ways that teachers build assessment expertise when they are implementing the
same curriculum units. Within the context of shared curriculum, researchers will be better able
to develop research frameworks and tools to capture changes in teachers’ knowledge and use
of assessment over time and to evaluate the impact of teachers’ assessment practice on student
learning. Multiple investigations of patterns of teacher and student learning across a range of
curricula and grade levels will provide educators more complete understandings of the contexts
in which teachers learn to use assessment to support student learning.
Final Remark
Although further research on assessment portfolios is needed, our findings have clear implica-
tions for policy and practice. Assessment portfolios are valuable resources for teacher learning
when teachers collaborate in professional communities with the support of a comprehensive
assessment framework. Portfolios are tools for teacher learning that enable teachers to design
“assessments for learning” for their students.
ACKNOWLEDGMENTS
We thank the participating teachers, the professional development team, and our research team
for their contributions to the findings reported here. The professional development team was
co-directed by Kathy DiRanna (WestEd) and Craig Strang (Lawrence Hall of Science), and
the team consisted of Diane Carnahan, Karen Cerwin, and Jo Topps of WestEd, and Lynn
Barakos of Lawrence Hall of Science. Researchers (in addition to those listed as authors)
included Shaunna Clark, Joan Herman, Sam Nagashima, and Terry Vendlinski from UCLA, and
Diana Bernbaum, Jennifer Pfotenhauer, and Cheryl Schwab from UC Berkeley. Joan Herman
provided invaluable feedback on this article. This study was supported by the National Science
Foundation under a grant to WestEd for the Center for Assessment and Evaluation of Student
Learning (CAESL). Views expressed do not necessarily represent the views of the Foundation.
REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement
in Education. (1999). Standards for educational and psychological testing. Washington, DC: American
Educational Research Association.
Aschbacher, P. (1999). Helping educators to develop and use alternative assessments: Barriers and facilitators. Educa-
tional Policy, 8, 202–223.
Aschbacher, P., & Alonzo, A. (2006). Examining the utility of elementary science notebooks for formative assessment
purposes. Educational Assessment, 11, 279–303.
Atkin, J. M., Coffey, J. E., Moorthy, S., Sato, M., & Thibeault, M. (2005). Designing everyday assessment in the
science classroom. New York: Teachers College Press.
Ball, D. L., Hill, H. C., & Bass, H. (2005, Fall). Knowing mathematics for teaching: Who knows mathematics well
enough to teach third grade, and how can we decide? American Educator, pp. 14–23.
Bell, B., & Cowie, B. (2001). Formative assessment and science education. Dordrecht, the Netherlands: Kluwer
Academic.
Birman, B. F., Desimone, L., Porter, A. C., & Garet, M. S. (2000). Designing professional development that works.
Educational Leadership, 57(8), 28–33.
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for learning: Putting it into practice.
Buckingham, England: Open University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7–74.
Borko, H., Mayfield, V., Marion, S., Flexer, R., & Cumbo, K. (1997). Teachers’ developing ideas and practices about
mathematics performance assessment: Successes, stumbling blocks, and implications for professional development.
Teaching and Teacher Education, 13, 259–278.
Brookhart, S. M. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational
Measurement: Issues and Practice, 22, 5–12.
DiRanna, K., Osmundson, E., Topps, J., Barakos, L., Gearhart, M., Cerwin, K., et al. (2008). Assessment-centered
teaching. Thousand Oaks, CA: Corwin.
Falk, B., & Ort, S. (1998, September). Sitting down to score: Teacher learning through assessment. Phi Delta Kappan,
80(1), 59–64.
Garet, M., Porter, A. C., Desimone, L., Birman, B. F., & Yoon, K. S. (2001). What makes professional development
effective? Results from a national sample of teachers. American Educational Research Journal, 38, 915–945.
Gearhart, M., Nagashima, S., Pfotenhauer, J., Clark, S., Schwab, C., Vendlinski, T., et al. (2006). Developing expertise
with classroom assessment in K–12 science: Learning to interpret student work. Educational Assessment, 11, 237–
263.
Gearhart, M., & Saxe, G. B. (2004). When teachers know what students know: Integrating assessment in elementary
mathematics. Theory Into Practice, 43, 304–313.
Goldberg, G. L., & Roswell, B. S. (1999–2000). From perception to practice: The impact of teachers’ scoring experience
on performance-based instruction and classroom assessment. Educational Assessment, 6, 257–290.
Guskey, T. R. (2003). Analyzing lists of the characteristics of effective professional development to promote visionary
leadership. NASSP Bulletin, 87(637), 28–54.
Hawley, W. D., & Valli, L. (1999). The essentials of effective professional development. In L. Darling-Hammond & G.
Sykes (Eds.), Teaching as the learning profession: Handbook for policy and practice (pp. 127–150). San Francisco:
Jossey-Bass.
Herman, J. L. (2005, September). Using assessment to improve school and classroom learning: Critical ingredients.
Presentation at the Annual Conference of the Center for Research on Evaluation, Standards, and Student Testing,
University of California, Los Angeles.
Herman, J. L., Osmundson, E., Ayala, C., Schneider, S., & Timms, M. (2006, April). The nature and impact of
teachers’ formative assessment practices. Paper presented as part of the symposium, Building Science Assessment
Systems that Serve Accountability and Student Learning: The CAESL Model, at the annual meeting of the American
Educational Research Association, Montréal, Quebec, Canada.
Hesse-Biber, S., Dupuis, P., & Kinder, T. S. (1991). HyperRESEARCH, a computer program for the analysis of
qualitative data with an emphasis on hypothesis testing and multimedia analysis. Qualitative Sociology, 14, 289–
306.
Hill, H. C., & Ball, D. L. (2004). Learning mathematics for teaching: Results from California’s Mathematics Profes-
sional Development Institutes. Journal for Research in Mathematics Education, 35, 330–351.
Kennedy, C. A., Brown, N. J. S., Draney, K., & Wilson, M. (2005, April). Using progress variables and embedded
assessment to improve teaching and learning. Paper presented as part of the symposium Building Science Assessment
Systems that Serve Accountability and Student Learning: The CAESL Model, at the annual meeting of the American
Educational Research Association, Montréal, Quebec, Canada.
Laguarda, K. G., & Anderson, L. M. (1998). Partnerships for standards-based professional development: Final report
of the evaluation. Washington, DC: Policy Studies Associates.
Loucks-Horsley, S., Love, N., Stiles, K. E., Mundry, S., & Hewson, P. W. (2003). Designing professional development
for teachers of science and mathematics. Thousand Oaks, CA: Sage.
Mansvelder-Longayroux, D., Beijaard, D., Verloop, N., & Vermunt, J. D. (2007). Functions of the learning portfolio in
student teachers’ learning process. Teachers College Record, 109, 126–159.
Maxwell, J. A. (1996). Qualitative research design: An interactive approach. Thousand Oaks, CA: Sage.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand
Oaks, CA: Sage.
Nagashima, S. O., Osmundson, E., & Herman, J. L. (2006, April). Assessment in support of learning: Defining
effective practices and their precursors. Presentation at the annual meeting of the American Educational Research
Association, San Francisco, CA.
National Center for Education Statistics. (2001). Teacher preparation and professional development. Washington, DC:
Author.
National Research Council, National Committee on Science Education Standards. (1996). National science education
standards. Washington, DC: National Academy Press.
National Research Council, & Committee on Classroom Assessment and the National Science Education Standards,
Center for Education. (2001a). Classroom assessment and the National Science Education Standards (J. M. Atkin,
P. Black, & J. E. Coffey, Eds.). Washington, DC: National Academies Press.
National Research Council, Committee on the Foundations of Assessment, Center for Education. (2001b). Knowing
what students know: The science and design of educational assessment (J. Pellegrino, N. Chudowsky, & R. Glaser,
Eds.). Washington, DC: National Academies Press.
Schön, D. A. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books.
Schön, D. A. (1987). Educating the reflective practitioner. San Francisco: Jossey-Bass.
Shavelson, R., Stanford Educational Assessment Laboratory, & Curriculum Research & Development Group. (2005).
Embedding assessments in the FAST curriculum: The romance between curriculum and assessment (Final Report).
Palo Alto, CA: Stanford University.
Sheingold, K., Heller, J. I., & Paulukonis, S. T. (1995). Actively seeking evidence: Teacher change through assess-
ment development (Rep. No. MS #94-04). Princeton, NJ: Educational Testing Service, Center for Performance
Assessment.
Shepard, L. A. (2001). The role of classroom assessment in teaching and learning. In V. Richardson (Ed.), The handbook
of research on teaching (4th ed., pp. 1066–1101). Washington, DC: American Educational Research Association.
Stiggins, R. J. (2005). Student-involved assessment FOR learning (4th ed.). Upper Saddle River, NJ: Pearson Merrill
Prentice Hall.
Taylor, C. S. (1997). Using portfolios to teach teachers about assessment: How to survive. Educational Assessment, 4,
123–147.
Taylor, C. S., & Nolen, S. B. (1996a). A contextualized approach to teaching teachers about classroom-based
assessment. Educational Psychologist, 31, 77–88.
Taylor, C. S., & Nolen, S. B. (1996b). What does the psychometrician’s classroom look like? Reframing assessment
concepts in the context of learning. Education Policy Analysis Archives, 4(17).
Taylor, C. S., & Nolen, S. B. (2004). Classroom assessment. Upper Saddle River, NJ: Prentice Hall.
Watson, A. (2000). Mathematics teachers acting as informal assessors: Practices, problems and recommendations.
Educational Studies in Mathematics, 41, 69–91.
Weiss, I. R., & Miller, B. (2006, October). Deepening teacher content knowledge for teaching: A review of the evidence.
Paper prepared for the Second Mathematics Science Partnership (MSP) Evaluation Summit, Minneapolis, MN.
Wiggins, G., & McTighe, J. (2005). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum
Development.
Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for learning: Impact on student
achievement. Assessment in Education, 11, 49–65.
Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement
in Education, 13, 181–208.
Wilson, S. M. (2004). Student assessment as an opportunity to learn in and from one’s teaching practice. In M. Wilson
(Ed.), Towards coherence between classroom assessment and accountability: 103rd yearbook of the National Society
for the Study of Education, Part 2 (pp. 264–271). Chicago: University of Chicago Press.
Wilson, S. M., & Berne, J. (1999). Teacher learning and the acquisition of professional knowledge: An examination
of research on contemporary professional development. Review of Research in Education, 24, 173–209.
Wolf, K., & Dietz, M. (1998). Teaching portfolios: Purposes and possibilities. Teacher Education Quarterly, 25, 9–22.
Zeichner, K., & Wray, S. (2001). The teaching portfolio in U.S. teacher education programs: What we know and what
we need to know. Teaching and Teacher Education, 17, 613–621.