This article was downloaded by: [Heriot-Watt University]
On: 07 October 2014, At: 14:27
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Educational Assessment
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/heda20
Assessment Portfolios as Opportunities for Teacher Learning
Maryl Gearhart (a) & Ellen Osmundson (b)
a Center for the Assessment and Evaluation of Student Learning (CAESL), University of California, Berkeley
b Center for the Assessment and Evaluation of Student Learning (CAESL), University of California, Los Angeles
Published online: 06 Apr 2009.
To cite this article: Maryl Gearhart & Ellen Osmundson (2009) Assessment Portfolios as Opportunities for Teacher Learning, Educational Assessment, 14:1, 1-24, DOI: 10.1080/10627190902816108
To link to this article: http://dx.doi.org/10.1080/10627190902816108
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Educational Assessment, 14:1–24, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 1062-7197 print/1532-6977 online
DOI: 10.1080/10627190902816108
Assessment Portfolios as Opportunities for Teacher Learning
Maryl Gearhart
Center for the Assessment and Evaluation of Student Learning (CAESL)
University of California, Berkeley
Ellen Osmundson
Center for the Assessment and Evaluation of Student Learning (CAESL)
University of California, Los Angeles
This article is an analysis of the role of assessment portfolios in teacher learning. Over 18 months,
23 science teachers developed, implemented, and evaluated assessments to track student learning,
supported by portfolio tasks and resources, grade-level colleagues, and team facilitators. Evidence of
teacher learning included (a) portfolios of a sample of 10 teachers and (b) the cohort’s self-reports in
surveys and focus groups. Teachers gained understanding of assessment planning, tasks and scoring
guides, methods of analyzing patterns and trends, and use of evidence to guide instruction. Teachers
made uneven progress with technical aspects of assessment and curriculum-specific assessment.
Research is needed on ways to integrate the benefits of a generic portfolio with strategies to
strengthen specific areas of assessment expertise.
In this article we examine ways that assessment portfolios can support experienced science
teachers in their efforts to build assessment expertise. In the professional development program
we investigated, the Assessment Leadership Academy, portfolios provided science teachers
opportunities to learn new assessment concepts and practices and apply their learning to the
design and implementation of assessment plans for curriculum units. Although portfolios are
widely used in preservice and professional development as resources for teacher reflection
(Mansvelder-Longayroux, Beijaard, Verloop, & Vermunt, 2007; Zeichner & Wray, 2001), evidence of the role of portfolios in teacher learning about classroom assessment is limited (Taylor, 1997; Taylor & Nolen, 1996a).
Our article is organized in three sections. The introduction provides a description of the
Academy’s conceptual framework, the portfolio, and the strategies designed to support teachers’
uses of the portfolios. We also review prior research to set our investigation in the context of
what is already known about assessment-focused professional development. In the Findings
section, we report evidence of teacher learning from analyses of the portfolios as well as
Correspondence should be sent to Maryl Gearhart, Graduate School of Education, Tolman Hall MC 1670, University
of California, Berkeley, CA 94720-1670. E-mail: [email protected]
2 GEARHART AND OSMUNDSON
teachers’ self-reports in surveys and focus groups. We conclude with reflection on the opportunities and constraints of a portfolio-based program for supporting the growth of teachers’ assessment expertise.
INTRODUCTION
Academy Program and Conceptual Framework
The Assessment Leadership Academy was an 18-month program in 2003–04 that engaged 23
experienced science teachers in the construction of assessment portfolios for their curriculum
units.1 Five districts in California selected K-12 teams for participation, and, at meetings
held three times a year, participants were reorganized as cross-district grade level teams to
work collaboratively on assessments for curriculum units at their grade level. The Academy’s
core goal was to build teachers’ capacities with curriculum-embedded formative assessments
that can be used at key points in a curriculum unit to inform subsequent instruction. With
recognition that formative assessment encompasses a wide range of important strategies and
tools (Atkin, Coffey, Moorthy, Sato, & Thibeault, 2005; Bell & Cowie, 2001; Black, Harrison,
Lee, Marshall, & Wiliam, 2003; Black & Wiliam, 1998; Stiggins, 2005; Wiliam, Lee, Harrison,
& Black, 2004), the professional development team chose to focus on paper–pencil assessments,
because written assessments are easily archived and transported in portfolios.
The Academy’s objectives for teacher learning are represented in detail in the program’s
conceptual framework, based on theory and research from both the psychometric (American
Educational Research Association, American Psychological Association, and National Council
on Measurement in Education, 1999; Brookhart, 2003; Shepard, 2001; Stiggins, 2005; Taylor & Nolen, 1996b, 2004; M. Wilson & Sloane, 2000) and practitioner traditions (Atkin et al.,
2005; Black et al., 2003; Black & Wiliam, 1998; National Research Council, 2001a; Watson,
2000). The framework captures relationships between teachers’ understanding of assessment
concepts (Figure 1) and their skill with assessment practices (Figure 2).
The network of interconnected assessment concepts in Figure 1 is a modified version
of the assessment triangle in Knowing What Students Know (National Research Council,
2001b). The core idea is that quality classroom assessment requires alignment of the goals
for student learning (including the alternative conceptions that students construct as they build
understanding of complex science ideas; National Research Council, 1996, 2001a, 2001b), tools
for gathering evidence, interpretation of the evidence, and uses of the information.2 The figure
1 Five districts sent K-12 district teams consisting of several teachers and one administrator, typically a district
science or assessment specialist. Our research focused only on teachers. Gearhart et al. (2006) reported preliminary
findings based on an analysis of three teachers in the 1st year.
2 The ideas in the framework are simplified in relation to more comprehensive treatments of classroom assessment
(e.g., Stiggins, 2005; Taylor & Nolen, 2004). Omitted or backgrounded are certain technical ideas, students’ roles in
assessment, and assessment systems that coordinate formative and summative assessments. On the other hand, the idea
of “developmentally sound content” was emphasized more than in other assessment projects, because the Academy was
invested in helping teachers interpret student progress along a developmental continuum of understanding (Herman,
2005). For example, during the planning phase when Academy teachers were evaluating the quality of potential
assessments, teachers drafted a range of “expected student responses” to evaluate the capacity of the assessment to
provide information on the developmental range of understanding, while, in other settings, teachers are often advised
just to write out the correct answers when evaluating assessment items (Taylor & Nolen, 2004).
FIGURE 1 Academy framework for important classroom assessment concepts.
includes subconcepts associated with these major ideas and double arrows between nodes to
represent alignment.3 Figure 2 represents classroom assessment practices embedded in a cycle
of continuous instructional improvement. Planning begins when Academy teachers identify
their learning goals for a science unit and develop an integrated instruction and assessment
plan (cf. Wiggins & McTighe, 2005). Implementation entails: repeated cycles of instruction;
assessment using a variety of assessment strategies (Stiggins, 2005); interpretation of evidence;
and use of information to guide teaching, learning, and further assessment. The bidirectional
arrows indicate ongoing formative assessment and instructional improvement throughout the
unit.
Assessment Portfolio
Aligned with the Academy’s conceptual framework, the Academy portfolio was designed as
a learning portfolio (Mansvelder-Longayroux et al., 2007; Wolf & Dietz, 1998) rather than
an evaluative portfolio for monitoring teacher performance. The portfolio provided a set of
semistructured tasks and resources that supported teachers as they designed, implemented,
3 The figures merge several versions of the framework shared with teachers over 18 months as the framework
evolved in part through teacher input. Herman (2005) provides a detailed exposition of one version of the framework,
and DiRanna et al. (2008) introduce a modified version.
FIGURE 2 Academy framework for classroom assessment practices integrated with instructional practices.
and evaluated assessments to track student learning and progress. Because the professional
development team viewed ongoing reflective practice as essential to the professional work
of teaching (Schön, 1983, 1987), the portfolios provided teachers opportunities for reflection
on their work and their learning. Over 18 months, teachers constructed portfolios for three
curriculum units.
The portfolio’s three sections4 corresponded to the planning, implementation, and revision
phases in Figure 2; tasks in each section required teachers to apply relevant assessment concepts
from Figure 1. Table 1 is an outline of the portfolio sections, tasks, and key assessment
concepts.5
4 The Academy assessment portfolio differed from the preservice model developed by Taylor and Nolen in two
ways (Taylor, 1997; Taylor & Nolen, 1996a). First, it was not a context for feedback by the professional development
team; the Academy goal was to promote professional reflection and collaboration, and the team wanted to minimize
concerns about evaluation. Second, it was a more ambitious undertaking than Taylor and Nolen could accomplish
within a 10-week academic term: The Academy portfolio documented the design of unit assessments, implementation
of assessments, and evaluation/refinement of assessments, while Taylor and Nolen’s preservice portfolio contained just
a unit plan (although the plan was in some ways more comprehensive than the Academy’s).
5 The portfolio forms and tasks were modified twice over the 18-month Academy program. Information on the
evolution of the portfolio is available from the authors. DiRanna et al. (2008) introduced a further evolution of the
portfolio.
TABLE 1
Academy Assessment Portfolio Organization, Tasks, and Resources

Section I. Planning
Context: 1–3 days with grade-level team in Academy institutes
Tasks:
- Establish learning goals with the Conceptual Flow process
- Select appropriate sequence of assessments using the RAIM forms
- Refine assessments by drafting expected student responses
Resources:
- Forms to guide assessment selection and refinement
- Models of Conceptual Flows and RAIMs
- Instructional materials (learning goals, embedded assessments)
- Facilitator and team colleagues

Section II. Interpretation of student work
Context: Independent
Tasks:
- Develop criteria
- Score student responses
- Record scores in matrix, and analyze patterns & trends
- Use evidence to plan instruction and give feedback to students
Resources:
- Portfolio forms to guide interpretation
- Models of criteria and analysis
- Teacher guide for the curriculum unit

Section III. Tool revision
Context: 1–1½ days with grade-level team in Academy institutes
Tasks:
- Evaluate and revise tasks based on patterns in student responses
- Evaluate and revise criteria based on patterns in student responses
Resources:
- Portfolio forms to guide assessment evaluation and revision
- Models of assessment revision
- Facilitator and team colleagues

Note. RAIM = Record of Assessments in Instructional Materials.
Section I contained the unit plan—a description of the learning goals and assessments to
measure those goals. Facilitated over 1 to 3 days by Academy staff, each grade-level team
began by specifying learning goals and representing the goals as a “conceptual flow.” Then,
using a portfolio form titled “Record of Assessments in Instructional Materials” (RAIM),
teams located and recorded the assessments in their units, and selected a series of assessments aligned with key unit goals to track student progress. To evaluate and strengthen these
assessments, teams used RAIM prompts linked to Academy concepts, and a key strategy
was drafting Expected Student Responses to gauge the likelihood that the assessment would
elicit and measure the full range of student understanding of the targeted concept. Teams
then had the opportunity to refine assessments and criteria, or design their own assessments.
The resulting plans (preassessment, interim “juncture” assessments, and postassessment) incorporated both formative and summative assessment as key components of a quality system. Teachers filed a copy of the team’s collaboratively constructed plan in their individual
portfolios.
Section II was devoted to interpretation of student work and use of information to guide
instruction, and teachers completed this section independently in their classrooms as they
implemented the assessments.6 The portfolio provided teachers with strategies for interpreting
student work: methods of constructing criteria by modifying expected student responses based
on patterns in the student work, procedures for scoring responses, ways to record scores and
qualitative notes, and methods of analyzing patterns and trends. Portfolio prompts reminded
teachers to document their strategies for interpretation, their inferences, and the ways they
used the information to give students feedback and revise instruction. Teachers archived the
assessments, copies of the student work, and documentation of their work in their portfolios.
Section III contained revisions of the assessments. After teachers implemented their units,
they reconvened in their cross-district grade level teams, and facilitators guided teams through
a 1- to 2-day process of evaluating and revising their assessments based on students’ responses
to the assessments. Reflective prompts linked to assessment concepts (Figure 1) guided teachers
as they evaluated and strengthened their assessments—alignment with learning goals, accuracy
of science content, and developmental appropriateness (i.e., capacity to assess the full range
of student understanding). Teachers filed a copy of the revised assessments in their individual
portfolios.
As outlined in Table 1, the extent and nature of Academy support varied for different
sections of the portfolio. Section I forms were skeletal, as most of the work of assessment
planning was facilitated. In Section II, detailed portfolio forms outlined step-by-step methods
for developing criteria and analyzing whole class data, whereas support for use of results was
limited to open-ended queries about instructional follow-up and feedback to students. Section
III provided a detailed tool for evaluating and strengthening the quality of assessments.
Prior Research on Professional Development: Setting the Academy Portfolio Strategy in Context
While the Academy assessment portfolio was an innovation, other features of the Academy
program were based on best practices culled from existing research on professional development
(Birman, Desimone, Garet, Porter, & Yoon, 2001; Garet, Porter, Desimone, Birman & Yoon,
2001; Guskey, 2003; Hawley & Valli, 1999; Loucks-Horsley, Love, Stiles, Mundry & Hewson,
2003; National Center for Education Statistics, 2001; S. M. Wilson & Berne, 1999). First,
teachers’ opportunities to learn were collaborative and sustained; for 2 years, the Academy
supported professional communities both within the Academy and the participating school
districts, and the portfolio served as a critical resource that traveled across contexts, supporting
different kinds of teacher interaction and work. Second, teacher reflection on practice was
embedded throughout the portfolio and institute activities. Third, opportunities for teacher
learning were a balance of expert guidance and teacher autonomy; during the institutes, facilitators guided collaborative work on the portfolios, but teachers were individually responsible
for implementing the assessments. The Academy design was, however, weakly aligned with
current recommendations to build content knowledge for teaching (Ball, Hill, & Bass, 2005;
Hill & Ball, 2004; Weiss & Miller, 2006). Academy teachers certainly engaged in content-
rich reflection on learning goals, assessments, and student work. But, given the diversity of
curriculum units, the Academy was unable to organize targeted curriculum-specific experiences
6 In addition, some teachers were visited once by a member of the PD team for on-site coaching of interpretation
of student work, and one institute meeting provided time for discussion of student work.
for teachers to build knowledge of science, the ways that students learn specific science concepts
and processes, and ways to assess based on a developmental continuum of understanding.
Prior studies of assessment-focused professional development have shown that teachers can
gain assessment expertise through activities like those embedded in the Academy portfolio,
including clarifying learning goals, developing assessment tools, and interpreting and utilizing
evidence. The Academy portfolio’s particular focus on paper-pencil assessment tasks built on
research during the performance assessment movement in the 1990s, when teachers collaborated to refine benchmark performance tasks and scoring guides, score student work, and consider instructional implications (e.g., Falk & Ort, 1998; Sheingold, Heller, & Paulukonis, 1995).
These opportunities had generally positive impact on teachers’ assessment and instructional
practices, but researchers also identified barriers to teacher learning, especially the weak
alignment of large-scale assessments with classroom curriculum (Aschbacher, 1999; Borko,
Mayfield, Marion, Flexer & Cumbo, 1997; Falk & Ort, 1998; Gearhart & Saxe, 2004; Goldberg
& Roswell, 1999–2000; Laguarda & Anderson, 1998). The Academy addressed the alignment
issue by engaging teachers in the design and use of assessments for their own curriculum units.
In this regard, the Academy portfolio’s emphasis on integration of curriculum and assessment
was consistent with recent efforts to embed quality assessment systems in science units to help
teachers track student progress and support student learning (Aschbacher & Alonzo, 2006;
Herman, Osmundson, Ayala, Schneider, & Timms, 2006; Nagashima, Osmundson & Herman,
2006; M. Wilson & Sloane, 2000; S. M. Wilson, 2004).
When we consider the Academy in relation to the projects just cited, the Academy’s mission
appears very ambitious. In other projects, teachers generally focused on developing assessment
knowledge and expertise for a limited number of tools, whereas the Academy’s goal was to
engage teachers in developing and implementing coherent assessment plans for entire curriculum units. The Academy team was well aware that Academy teachers had limited experience
evaluating, refining, and using quality assessments, but they argued that, because many science
units lack quality assessments, teachers need to build the expertise to strengthen the assessments
in their instructional materials. The intended outcomes of the Academy portfolio strategy were
to strengthen teachers’ assessment expertise, produce portfolio archives of the process and the
products of assessment design and implementation, and support the emergence of professional
communities committed to the improvement of classroom assessment.
Study Purpose and Analytic Approach
This article is an analysis of what the cohort of Academy teachers learned about classroom
assessment from their work with their portfolios. The findings are organized in two sections—
learning about assessment tools, and learning about interpreting and using evidence—and our
analysis is based on portfolios as well as teachers’ self-reported learning. Portfolio analysis
focuses on changes over time in the assessment practices of teachers who completed at least
two portfolios. Analyses of teachers’ self-reported learning in surveys and focus groups serve both as triangulation of the portfolio analysis and as a richer source of information about teachers’
understanding and application of assessment concepts and methods. Triangulation of our data
sources enables us to identify what Academy teachers learned from constructing a series of
assessment portfolios for their curriculum units. We conclude the article with reflections on the
opportunities and limitations of a generic assessment portfolio for teacher learning.
METHOD
Participants
Twenty-three experienced science teachers from Grades 1 through 10 participated in the Academy, and the total sample size for each of our measures varies from 19 to 21. Based on responses to the initial preinstitute survey (N = 19), the cohort’s mean years of teaching experience was 14.7 (SD = 12.68). The majority had completed coursework beyond their B.A., and half had earned their M.A. Most teachers had participated in professional development programs, and more than half had attended or presented at meetings of the National Science Teachers Association. Teachers generally perceived themselves as instructional experts; on a scale of 1 (weak) to 5 (very strong), teachers rated themselves as strong in confidence in teaching science (M = 4.58, SD = 0.88), knowledge/understanding of grade-level science (M = 4.41, SD = 0.83), and knowledge/understanding of grade-level science standards (M = 4.46, SD = 0.66). Teachers perceived their “knowledge of a wide variety of assessment strategies and techniques” as moderately high (M = 4.19, SD = 0.73).
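The descriptives above are ordinary sample means and standard deviations of the teachers’ self-ratings. As a minimal sketch of how such values are computed (the ratings below are invented for illustration, not data from the study):

```python
from statistics import mean, stdev

# Hypothetical self-ratings from 19 teachers on the 1 (weak) to
# 5 (very strong) scale; illustrative values only, not Academy data.
ratings = [5, 5, 4, 5, 3, 5, 4, 5, 5, 4, 5, 5, 3, 5, 5, 4, 5, 5, 4]

m = mean(ratings)    # sample mean (M)
sd = stdev(ratings)  # sample standard deviation (SD), n - 1 denominator
print(f"M = {m:.2f}, SD = {sd:.2f}")
```

Note that `stdev` uses the n − 1 (sample) denominator, which is the convention behind SD values reported in survey research like this.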
Data and Analysis
Portfolios: Evidence of changes in teachers’ assessment practices. We used quali-
tative methods to analyze evidence of growth over time in the quality of teachers’ assessment
practices documented in their Academy portfolios. Our analytic strategy evolved over three
phases of work.
We first reviewed the portfolios to identify those with sufficient material for analysis.
Few portfolios contained complete work on all tasks in each section, leading us to establish a
modest definition of a Complete portfolio as one containing some material in each section, and a
Partially Complete portfolio as one containing material in the section completed independently
(II) and one of the sections completed collaboratively (I or III). Two researchers rated the
portfolios, and rare disagreements were resolved through discussion. We identified 10 teachers
who submitted a series of two or three portfolios rated as Partially Complete or Complete,
and we adopted this set as our evidence of growth in the cohort’s understandings and uses
of assessment. We consider the portfolios of these 10 teachers to be a reasonable estimate of
growth for two reasons. First, these teachers were distributed across Grades 1 to 9: 1st (1),
2nd (1), 3rd (1), 4th (1), 6th (1), 8th (3), 9th/10th (2); the only Academy grade levels missing
were Grades 5 and 7. Second, background descriptives for the portfolio sample were similar
to descriptives for the remaining teachers in the cohort.
In the second phase of analysis, three researchers reviewed two sample portfolio series representing elementary (Grade 3) and middle school (Grade 8) to develop methods of documenting growth (or stasis) in teachers’ understandings and uses of assessment. Through detailed
readings, we made marginal jottings and wrote memos on patterns (Maxwell, 1996), and then
worked collaboratively to prepare conceptually ordered matrices of evidence (Miles & Huberman, 1994) of change over time in our four targeted strands of analysis: assessment planning,
developing or refining assessment tools, interpreting students’ responses to assessments, and
using evidence to guide instruction and provide students feedback. Our goal was to construct
a scoring guide for rating all 10 teachers’ progress over time.
In Phase 3, we piloted our scoring approach with additional portfolio series and found
that we were faced with weak comparability: Portfolios differed in grade level, curriculum
content, and embedded assessments; teachers differed in their decisions about the portfolio
tasks that were most important for their units or their personal learning goals; portfolio forms
and resources were revised somewhat each semester. We therefore returned to developing
conceptually ordered matrices for each portfolio series to capture each teacher’s growth. We
specified the evidence to be used for each strand of analysis in our matrices, and Gearhart
and three researchers documented patterns. For each series, one researcher prepared an initial
matrix, and a second researcher read the same series as well as the matrix, and confirmed
or challenged the matrix until both researchers agreed that the matrix captured the patterns.
That pair then drafted a summary memo identifying the predominant patterns—both growth
and absence of growth—and key sources of evidence for each. Osmundson then reviewed all
portfolios and matrices, and then together both authors summarized patterns of growth (or
stasis) in two ways. Common patterns were present in at least 5 of the 10 portfolio series, a
conservative criterion appropriate for portfolios quite diverse in content. Range of patterns was
a summary of different types of shifts over time.
Surveys and focus groups: Teachers’ perceptions of learning, supports, and barriers.
We used surveys and focus groups to collect evidence of interim and summative Academy
impact.
Evidence of interim impact was provided by a survey focused on classroom assessment
practices first administered when the Academy was initiated in August 2003 and again in
May 2004. Teachers rated the extent to which they implemented various assessment practices
on a scale from 1 (very limited extent) to 5 (great extent). Nineteen (of 23) teachers from
Grades 1 through 9 completed the survey on both occasions, and we used t tests to compare
responses over time. To help us interpret the quantitative findings, we used HyperResearch©
(Hesse-Biber, Dupuis, & Kinder, 1991) to capture themes in teachers’ comments on the
May ’04 survey; themes were reviewed by the researchers and the professional development
team.
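The pre/post comparison described above used t tests on matched responses from the same teachers. This is not the authors’ actual analysis script; a minimal sketch of a paired t test on invented ratings might look like this:

```python
import math
from statistics import mean, stdev

# Hypothetical ratings from the same 19 teachers on one survey item,
# on the 1 (very limited extent) to 5 (great extent) scale.
# Illustrative values only, not data from the study.
aug_2003 = [4, 3, 4, 5, 3, 4, 4, 3, 4, 5, 4, 3, 4, 4, 3, 5, 4, 4, 3]
may_2004 = [5, 4, 4, 5, 4, 4, 5, 3, 4, 5, 5, 4, 4, 5, 4, 5, 4, 5, 4]

# Paired (repeated-measures) t test: because each teacher responded on
# both occasions, we analyze the per-teacher differences.
diffs = [post - pre for pre, post in zip(aug_2003, may_2004)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # df = n - 1
print(f"t({n - 1}) = {t_stat:.2f}")
```

The same statistic is available as `scipy.stats.ttest_rel`, which also returns a p value; the hand computation above makes the paired structure of the comparison explicit.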
Exit data collected in December 2004 included a survey as well as focus groups. For
the survey (N = 21), teachers rated their understanding of Academy assessment strategies from 1 (none) to 5 (full), and we generated descriptive statistics for the responses;
teachers’ written comments were combined with focus group transcripts when analyzing
exit themes. For the exit focus groups, grade-level and district teams were distributed
across five groups. To structure discussion, we provided figures of the Academy frame-
work, and teachers identified assessment practices or concepts that they had strengthened
and those they needed to strengthen, and then explained their selections. Teachers then
recommended revisions in Academy goals, the portfolio, and strategies for supporting
teacher learning. Recordings were transcribed, and we combined exit survey comments
and focus group transcripts, and used Hyperqual to identify teachers’ perceptions of their
learning and the factors that influenced their learning. Gearhart completed all coding,
and then both authors compared the thematic analysis with the memos that focus group
facilitators submitted immediately following their focus groups as validation for the coded
patterns.
FINDINGS
Patterns of cohort learning are reported in two parts that are aligned with the sections of the
portfolio. In Learning about Assessment Tools, we present findings on teachers’ progress with
planning coherent assessment systems and designing appropriate assessments. In Interpreting
Student Responses and Using Evidence to Guide Instruction, we report the ways that teachers
were learning to use their assessment tools. In each section, our analyses of evidence of
teacher learning in the portfolios are validated and contextualized with teachers’ self-reports of
their learning on surveys and in focus groups.
Learning About Assessment Tools: Planning a Coherent Assessment System and Refining Specific Assessments
Our analyses of the portfolios and teachers’ self-reports revealed that all teachers used the
Academy protocols and resources to plan their assessment systems and refine specific assess-
ment tools. Teachers generally made more progress learning to establish learning goals than
they did with the selection or development of assessment tools.
Planning a coherent assessment system: Establishing goals and selecting assessments. We analyzed shifts over time in the organization and coherence of both the unit
learning goals represented in the “conceptual flows” and the assessment plans. Given the range
of grade levels and units in our portfolio sample, it was not possible to evaluate either the
quality or clarity of each learning goal or the capacity of each assessment to measure students’
progress toward a given goal.
When we examined the conceptual flows, we found that most grade-level teams shifted
toward a greater focus on big ideas by removing, adding, or reorganizing learning goals to focus
on what was most important for students to learn. For example, a middle school team added the
concept of density to their goals for a unit on plate tectonics when they recognized that students’
understanding of how matter in the earth’s crust shifts is based on understanding density.7
Another common shift was toward more coordinated relationships among big ideas and smaller
supporting concepts. Most teams increasingly represented conceptual relationships among unit
goals rather than as a list of sequential lesson topics. For example, in their first conceptual
flow for a unit on homeostasis, a high school team depicted regulatory systems as distinct
systems in the body without a connection to homeostasis, but in their third conceptual flow
for a repeated unit, the team highlighted the interconnected relationships between homeostasis
and regulatory mechanisms in the body. A middle school team reorganized goals for their
unit on heredity by introducing “pre-learning” opportunities for students to learn scientific
terminology, after noticing that their English Language Learner students could often identify
inherited characteristics but their descriptions lacked specificity, clarity, and academic language.
Paralleling organizational shifts in the conceptual flows, all of the teachers’ assessment plans
were more coherently organized in later portfolios. Assessment plans shifted from long lists of
possible assessments toward judicious selection of a few key assessments for tracking student
7Organizational shifts toward a clearer focus on big ideas were more evident when teachers revised an assessment
plan they had constructed for an earlier portfolio.
ASSESSMENT PORTFOLIOS 11
TABLE 2
Planning Goals and Assessments:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                      August 2003     May 2004
Items                                                   M     SD       M     SD
To what extent do you:
  Set specific goals for student progress?            3.84    .69    4.16    .77
  Align your assessments with your learning goals?    4.00    .75    4.37    .76
  Assess students’ prior knowledge?                   3.84    .83    4.16    .83

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
progress—a preassessment, one or more juncture assessments, and a postassessment.8 The
addition of a pretest in most of the assessment plans in the later portfolios was a particularly
noteworthy indicator of teachers’ progress, as very few curriculum units contained pretests.
Most of the later portfolios also showed evidence of teachers’ efforts to strengthen alignment
among their assessments. For example, when a middle school team discovered that students
were challenged by the graphing requirements of their unit on density, they added items to
each of their assessments to allow them to analyze how student understanding of graphing was
developing in relation to student understandings of density. Other examples include reposition-
ing brief, targeted assessments as juncture assessments and moving comprehensive assessments
to the conclusion of a unit for use as postassessments. Finally, as evidence of their growing
attention to alignment, in later portfolios most teachers depicted relationships between learning
goals and assessments in a single, usable document rather than in two separate documents
(conceptual flows and RAIMs).
In surveys and focus groups, teachers reported strengthening their use and understanding
of assessment planning and assessment refinement. After the first 9 months of the Academy
(Table 2), teachers generally reported more frequent efforts to set learning goals, align assess-
ments with goals, and include assessments of prior knowledge, although these trends were not
statistically significant. In their survey comments, most teachers praised the benefits of the
portfolio process that engaged them in planning, implementation, reflection, and revision of
assessments. For example, one teacher commented, “Before I would have believed that I was
very good at evaluating the alignment of assessments with assessment targets; it wasn’t until I
saw my results from the pretest (or lack of results) that I realized I wasn’t as good at this as I
originally thought.”
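Interim trends like these are typically evaluated with a paired comparison of each teacher’s two ratings on an item. A minimal sketch of that arithmetic, with invented 5-point ratings (the study reports only the summary statistics shown in Table 2, so the data and the resulting t value below are purely illustrative):

```python
# Purely illustrative: a paired t-test on invented 5-point Likert ratings
# for one survey item, mirroring an August 2003 vs. May 2004 comparison
# with N = 19. These data are NOT from the study, which reports only
# means and standard deviations.
import math

aug_2003 = [3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4]
may_2004 = [4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4]

diffs = [b - a for a, b in zip(aug_2003, may_2004)]
n = len(diffs)
mean_d = sum(diffs) / n
# Sample standard deviation of the paired differences
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t = mean_d / (sd_d / math.sqrt(n))  # paired t statistic, df = n - 1
print(f"mean gain = {mean_d:.2f}, t({n - 1}) = {t:.2f}")
# With df = 18, |t| must exceed roughly 2.10 for significance at p < .05,
# so a small positive mean gain like this one is a nonsignificant trend.
```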
On the exit survey, teachers indicated generally strong understandings of many aspects of
the Academy portfolio assessment planning process: how to create conceptual flows of learning
goals (M = 4.67, SD = .66), on a scale from 1 (poor) to 3 (moderate) to 5 (excellent understanding),
use the conceptual flow to guide assessment decisions (M = 4.57, SD = .60), and select the
juncture assessments (M = 4.29, SD = .72). Teachers’ lower ratings for preparing the detailed
8In the first portfolio, teachers were asked to list all possible assessments before making selections for their
assessment plan; in later portfolios, that task was revised to focus teachers more directly on selection of targeted
assessments. Thus this pattern of change from comprehensive lists to targeted selection mirrors revision of the portfolio
tasks, but that revision was prompted by teachers’ requests for a more strategic approach to assessment planning guided
by the conceptual flow of learning goals.
RAIM plan (M = 3.79, SD = .96) and using the RAIM to guide specific assessment decisions
(M = 3.95, SD = .86) suggest that teachers felt they had gaps in their understanding of the
process of developing assessment tools (a pattern we examine further below).
In their exit comments on the survey and in focus groups, teachers reported learning many
of the big ideas of assessment planning. One primary theme was the important role of the
conceptual flow as a coherent representation of learning goals, as this quote illustrates: “I
now focus on what students need to know in conjunction with the conceptual flow and not
just what I need to cover in the unit.” Regarding assessments, the importance of planning
formative assessments took on new importance. As one teacher explained, “Before I was doing
backwards design making my summative assessment ahead of time, but I wasn’t planning the
formative assessment ahead of time; in making the RAIM, I’ve already got all the formative
assessments identified.” Academy teachers became more committed to formative assessment,
but as we report next, many teachers felt they had uneven understanding of specific techniques
for selecting and strengthening assessments.
Learning how to refine specific assessments. We analyzed evidence of teachers’ efforts
to improve the quality of their assessment tools from all three portfolio sections—I. Revising
selected assessments and developing Expected Student Responses, II. Developing criteria, and
III. Revising assessments after completing the unit. We found mixed patterns of improvement.
All teams worked to improve the clarity of their assessments—for example, modifying the
size of figures, leaving more space for students to answer, refining directions and response
choices for clarity. Most teams worked on strengthening the alignment of assessments with
learning goals. For example, one elementary teacher revised instructions for a performance
item on pitch and volume, because her students were interpreting the investigation instructions
incorrectly, and therefore their conclusions about sound were not relevant to the targeted
concepts. Another teacher revised the mineral samples for an assessment of the characteristics of
rocks and minerals, when she discovered the students’ “kit misconception” that “all minerals are
white” because all minerals in the instructional kit were indeed white! A middle school teacher
replaced an open-ended essay task assessing students’ understanding of plate tectonics with a
set of short answer items that provided more targeted evidence about student understandings
of the characteristics, causes, and effects of shifting tectonic plates. A high school teacher
added an explanation question to his multiple-choice test on the periodic table to provide
information on student understanding of how the periodic table can be used to predict the
nature of elements.
Teachers also strengthened the quality of assessment criteria in a variety of ways. In about
half of the portfolio series, revisions in criteria were more in form than function; an eighth-grade
team, for example, added a middle level to their two-level holistic rubric for a performance item
on the properties of matter, but the added level was not aligned with the high and low levels.
In the remaining portfolios, teachers revised criteria to differentiate additional dimensions of
performance and levels of understanding. For example, one elementary teacher transformed
the publisher’s three-level holistic scoring guidelines (complete response, partially complete
response, and no responses or a response that doesn’t make sense) into a four-level scoring
guide containing two distinct conceptual dimensions: (a) student can accurately identify and
use tests to distinguish different types of minerals, and (b) student knows that minerals are the
basic elements that make up rocks and have properties that can be described.
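A scoring guide of this kind is, in effect, a small data structure: one list of level descriptors per conceptual dimension. The sketch below is hypothetical; the two dimension names paraphrase the example above, but every level descriptor and score is invented for illustration, not taken from the teacher’s guide.

```python
# Hypothetical sketch of an analytic scoring guide: two conceptual
# dimensions, four levels each. Dimension names paraphrase the example
# above; all level descriptors are invented.
scoring_guide = {
    "uses tests to distinguish minerals": [
        "no relevant response",                        # level 1
        "names a test but cannot apply it",            # level 2
        "applies one test correctly",                  # level 3
        "applies several tests and compares results",  # level 4
    ],
    "knows minerals are the building blocks of rocks": [
        "no relevant response",
        "mentions rocks and minerals without distinguishing them",
        "states that rocks contain minerals",
        "explains that rocks are made of minerals with describable properties",
    ],
}

# Scoring each dimension separately (rather than assigning one holistic
# level) shows that a student can be strong on one concept, weak on another.
student_scores = {"uses tests to distinguish minerals": 3,
                  "knows minerals are the building blocks of rocks": 2}
for dimension, level in student_scores.items():
    print(f"{dimension}: level {level} - {scoring_guide[dimension][level - 1]}")
```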
In surveys and focus groups, teachers reported insights about how to strengthen assessment
tools along with continued uncertainties. Interim findings (Table 3) showed that teachers had
come to view their assessments as lower quality than at the outset of the Academy; decreases
in ratings for two items on validity and accommodation were statistically significant (p <
.05), and the trend was the same for items on reliability and fairness. This unexpected pattern
suggests that what teachers were learning about quality assessment tools was making them more
critical of the tools they were using, and teachers’ survey comments support our interpretation.
On one hand, many teachers reported that they were learning to integrate new assessment
tools: pretests (“I have never given a pretest before”; “assessing students’ prior knowledge has
become a more formal process”), interim juncture assessments (“[Now I am] assessing what
students know at critical junctures”), and parallel pre- and posttests (“I have been using more
pre/post testing”). Many teachers also reported strengthening the quality of assessment tools—
for example, “evaluating the developmental appropriateness of an assessment,” “making sure
that the assessments I give measure what I intend them to,” and “thinking about the criteria by
first drafting expected student responses.” But many teachers also reported concerns about their
assessment tools. The dominant theme was weak alignment. Some teachers were concerned
about alignment of assessments and learning goals: “Developer-created assessments must be
checked and analyzed to determine if their questions are assessing the same objectives you are
looking for,” and “even in reform units with embedded assessments, the assessments did not
always assess the concepts we wanted to assess.” Other teachers were concerned about align-
ment between assessment and instruction—for example, “my pre and post tests didn’t match the
instruction, so next time I will revise them and the instruction,” and “the juncture assessment
asked questions about content the students haven’t learned yet, and we need to revise it.” Less
frequently mentioned were concerns about the quality of item types (“my multiple choice test
told me nothing about student thinking”) and the fairness of items (“student accessibility to
the question—the language, the vocabulary”), two aspects of assessment refinement that were
not supported by the portfolio.
Exit findings indicated that teachers had continued to grow in their understandings of
assessment refinement, but they were more confident with some aspects than others. On the
survey, teachers indicated moderate to high understanding of methods for refining pre- and
TABLE 3
Assessment Tools:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                          August 2003     May 2004
Items                                                       M     SD       M     SD
To what extent are your tools:
  Based on strong science content?                        3.89    .57    3.95   1.03
  Valid for the reason you are using them
    (measures what you thought it would)?                 3.72    .58    3.11*  1.15
  Reliable and accurate?                                  3.68    .58    3.26   1.20
  Designed to accommodate learners with various needs?    3.68    .75    2.79*   .98
  Fair?                                                   3.89    .68    3.42   1.07

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
*p ≤ .05.
postassessments (preassessments: M = 4.35, SD = .74; postassessments: M = 4.35, SD =
.67) from 1 (poor) to 3 (moderate) to 5 (excellent understanding), and moderate understanding
of the detailed work of clarifying concepts (M = 4.10, SD = .83), clarifying expected
student responses (M = 4.14, SD = .66), and developing a juncture assessment (M = 4.19,
SD = .68). Two dominant themes in teachers’ comments were the importance of assessment
refinement (e.g., tools may “need to be tweaked and fixed”) and the important role of student
responses in assessment refinement: “[What the Academy portfolio] added was refining tasks
based on evidence, not on how I feel”; “[if] the question isn’t really written correctly, [I’ve
learned that] you’re not going to get expected answers that you need from the students”; “I
understand that we need to look at student work to refine the criteria so the criteria represent
an accurate assessment of student learning.” Teachers recognized that quality assessment tools
should provide information on “what students think” and “specifically what the students don’t
understand.”
Thus, after completing two or three portfolios, Academy teachers exited the program with
a commitment to formative assessment and greater understanding of strategies for assessment
planning and assessment refinement. The work was challenging, and some teachers expressed
a desire for higher quality assessments and criteria embedded in their instructional materials.
As one teacher explained, “It’s hard to write good assessments—field testing shows unexpected
results; it’s an iterative cycle, and it’s time intensive—if assessment writers were more careful,
our jobs would be easier.” Another teacher commented frankly, “[If I had] set criteria [in the
materials], it would make my life easier.”
Learning to Use Assessment Results: Interpreting Student Responses and Using Evidence to Guide Instruction

Section II of the portfolio focused on interpretation of student responses and use of assessment
results. The portfolio forms guided teachers through a series of steps: sort student work to
provide initial information on levels of performance; compare patterns with the “expected
student responses”; construct scoring criteria through an iterative process of refinement; score,
record scores, and analyze patterns and trends. The portfolio contained models of rubrics
and assessment records as well as suggested ways to analyze patterns and trends. Section
II concluded with space for teachers to document the ways they used assessment results to
provide students feedback and guide instruction. Our analyses revealed that teachers were
learning new methods for interpretation of student work and developing targeted strategies
for instructional improvement and feedback. However, patterns of teacher learning varied, and
growth in teachers’ understandings was complex and nonlinear.
Learning to interpret student work. All portfolio series shifted toward greater sophisti-
cation of interpretive techniques. In the first portfolios, most student work was either graded
or simply collected, and teachers’ inferences about student learning appeared to be based on
unsystematic reviews of student responses or on other sources such as class discussion and
informal observation. In later portfolios, teachers began to score with rubrics9 and explore
9Unfortunately we could not trace teachers’ growth with scoring techniques such as benchmarking or double-
scoring, because the portfolio did not ask teachers to document the scoring process.
ways to chart and analyze scores, and, by the final portfolio, all 10 teachers in our sample
used or adapted Academy models to score, chart results, and analyze patterns and trends. Some
of their assessment records were quantitative (scores), some qualitative (content analyses of
responses), and some a hybrid (when teachers supplemented their scores with qualitative notes
on the content of student responses).10
The quality of teachers’ methods of analysis ranged in sophistication, as the following
snapshots illustrate.
• More sophisticated analytic methods. A few portfolios revealed methods of interpretation
that were specialized for specific purposes. A middle school teacher adapted the Academy
“hybrid” record by focusing her qualitative notes just on the low and medium student
responses to help her identify needs for further instruction. An elementary teacher developed
a four-level rubric to determine what students understood about specific concepts, and she
used three methods to analyze results: pre–post test comparisons based on the number
correct and change in score, item/concept correlation by clustering items related to each
concept, and identification of concepts associated with the most frequently missed items.

• Basic methods of analysis. In about half the portfolio series, teachers used basic methods
of analysis even in their later portfolios. Whole class data were analyzed as class averages
or class distributions of total scores. Patterns were sometimes summarized as restatements
of the criteria; for example, one teacher reported that “some students correctly predicted
how changing the angle of the plane would impact the speed of descent of the water
drop,” a report that was a restatement of his criterion for a “high” score. Student–item
interactions were examined with formats that appeared to limit interpretation of patterns—
for example, just listing the items that each student answered correctly (e.g., “Jenn: 2, 4,
7, 8, 9; Santiago: 1, 2, 4, 7, 8, 9”).

• Problematic methods. Two of the later portfolios contained methods that were unlikely to
yield accurate and valid interpretations. One teacher compared class means of total scores
on pre- and postassessments with no analysis of individual student progress or item–student
interactions, and another compared students’ L-M-H performance on assessments that were
not comparable.
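The analytic moves in these snapshots (pre–post change scores, clustering items by concept, and tallying the most frequently missed items) can be sketched concretely. Everything below is invented for illustration: the item–concept mapping, student names, and response sets are hypothetical, not data from the study.

```python
# Hypothetical sketch of three basic analysis methods on a six-item
# assessment. All items, concepts, students, and responses are invented.
from collections import Counter

# item number -> concept it targets
concepts = {1: "properties", 2: "properties", 3: "tests",
            4: "tests", 5: "rock formation", 6: "rock formation"}

# student -> set of items answered correctly on the pre- and posttest
pre = {"Jenn": {2, 4}, "Santiago": {1, 2, 4}, "Maya": {4}}
post = {"Jenn": {1, 2, 4, 5}, "Santiago": {1, 2, 3, 4, 6}, "Maya": {2, 4, 5}}

# 1. Pre-post comparison: number correct and change in score per student
for student in pre:
    gain = len(post[student]) - len(pre[student])
    print(f"{student}: {len(pre[student])} -> {len(post[student])} (+{gain})")

# 2. Concept-level results: posttest items correct, clustered by concept
by_concept = Counter()
for items in post.values():
    for item in items:
        by_concept[concepts[item]] += 1

# 3. Most frequently missed posttest items
missed = Counter()
for items in post.values():
    for item in concepts:
        if item not in items:
            missed[item] += 1
print("most missed items:", missed.most_common(2))
```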
Although there was a range in the quality of teachers’ methods of interpretation, the
portfolios revealed overall increases in teachers’ expertise with interpretation, and survey and
focus group findings were consistent with this pattern. After the first 9 months (Table 4),
teachers generally reported more frequent use of “sound interpretations,” and a primary theme
in teachers’ comments was their shift from grading student work toward “analyzing individual
student work” and “analyzing test results from the perspective of student understandings” using
Academy portfolio techniques. The lower mean frequency of interpreting work “according to a
developmental framework” suggests that some teachers felt unprepared to interpret conceptual
10We cannot determine whether the hybrid records reflected limitations in teachers’ capacities to construct scoring
guides or their growing insight that mixed methods can be efficient and targeted. Shepard (2001), for example, argued
that qualitative analysis of the responses scored at medium and lower levels is a flexible and feasible strategy for
classroom assessment.
TABLE 4
Interpretation of Student Work:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                            August 2003     May 2004
Items                                                         M     SD       M     SD
  Are you using your assessments to make sound
    interpretations?                                        3.68    .75    4.16**  .83
  Do you analyze individual work and responses for
    specific student understandings?                        4.22   1.06    4.32    .67
  Do you evaluate students’ ideas based on a developmental
    framework of science understanding?                     3.84    .76    3.74   1.24

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
**p < .07.
development, and indeed several teachers commented that their interpretations did not appear
to be consistent with the Academy notion of “developmentally appropriate” assessment.
On the exit survey, teachers expressed moderate to high understanding of analyzing whole
class sets of student work (M = 4.29, SD = .56) and comparing student performance over
time (pre–post: M = 4.48, SD = .81; prejuncture: M = 4.52, SD = .60; juncture–post: M =
4.52, SD = .60), from 1 (poor) to 3 (moderate) to 5 (excellent understanding). Teachers’
investment in interpreting student thinking was a dominant theme in exit comments such as,
“I really care about what each student is saying, about what each group is thinking about an
idea”; “I’m now looking at everything, and I’m not even putting grades on anymore”; “I look
more carefully at what it is that [students] don’t know, and not so much, ‘oh they got it, they
didn’t get it.’ ” Another major theme was appreciation for the Academy portfolio methods of
recording scores and analyzing patterns, for example, “Having to make a chart and analyze that
really helped me”; “One of the things I came away with was seeing the trend in the class—if
a whole bunch of kids missed that, you know, the breakdown”; “Before it would be giving
them a grade and not really looking at the trends across my class and individual concepts they
might be lacking.” Minor themes focused on aspects of interpretation that were unsupported
in the portfolio—for example, fairness (“I have English language learners, and I’m uncertain
how to be fair and unbiased in my questions”) and reliability (“I don’t have that academic or
research background”).
Learning to use evidence to guide instructional improvement and provide students
feedback. In early portfolios, most teachers described generic strategies for follow-up in-
struction based on analysis of student work; strategies included giving students the correct
answers, reteaching, reviewing vocabulary, or modeling test-taking skills. Versions of these
generic methods continued in a few of the later portfolios, even with the addition of new
assessment tools; for example, a few teachers used preassessment results solely as a baseline
measure for pre–post comparisons while neglecting their value for instructional planning. But
in most of the later portfolios, teachers reported lesson-specific follow-up activities that merged
instruction with feedback and challenged students’ understandings of core concepts. Examples
included engaging students in scoring and revising their work, asking students to discuss
contrasting responses in small groups, and implementing a follow-up inquiry activity as an
additional learning opportunity. Portfolios also described instructional strategies matched to
students’ needs—for example, more didactic instruction for students with the least under-
standing. In one exemplary portfolio, an elementary teacher developed a scoring guide that
integrated scoring with her strategies for follow-up and feedback by specifying what additional
information and concepts students needed to learn to move to the next level of conceptual
understanding.
On the interim survey, teachers reported more frequent use of assessments to guide instruc-
tion (Table 5), although no increase in use of feedback. In their comments, they described both
generic approaches to follow-up (e.g., additional practice, reteach, readminister the assessment
after a review) and instructional techniques that targeted students’ understandings of specific
concepts (e.g., more differentiated instruction based on student needs). However, some teachers
reported feeling challenged in their efforts, for example, “I’ve made changes in instruction based
on student evidence but I am not sure they are the best changes—I at least try;” “now that I
know where they are, what do I do?” “[I’d like to] hear more about providing effective feedback
to students.”
At the conclusion of the Academy, teachers reported moderate to high understanding of
the uses of results from particular assessments (preassessment: M = 4.38, SD = .80, and
junctures: M = 4.40, SD = .75) from 1 (poor) to 3 (moderate) to 5 (excellent understanding),
and ways to use whole class information for specific purposes (revising instruction: M = 4.50,
SD = .51; revising instructional material: M = 4.25, SD = .79; providing feedback: M =
4.24, SD = .70; differentiating instruction: M = 3.86, SD = .73). The lower mean rating for
differentiating instruction may indicate teachers’ awareness that differentiation requires very
accurate information on each student’s understanding. A major theme in teachers’ comments
was the recognition that instructional follow-up is the purpose of formative assessment, as this
quote conveys: “I haven’t thought about that before—‘Okay, I teach it, I assess it, we move on;
those kids that didn’t catch it, okay, I’ll catch up with them later’ [but] that’s not how it works.”
Teachers described new ways that they were using formative evidence as a resource for
instructional planning: “Now I look for patterns on the pretest—I get a current idea of the
unit based on student understanding rather than what the teacher thinks the unit should be”;
“The whole class analysis is a true guide for instruction—for differentiation, for grouping, for
a number of classroom issues”; “Looking at trends, really analyzing your students’ work, and,
based on that, ‘what do I do next?’ was really important.” A minor but related theme was
TABLE 5
Use of Information to Guide Instruction:
Means and Standard Deviations for Survey Administered August 2003 and May 2004

                                                            August 2003     May 2004
Items                                                         M     SD       M     SD
  Are you using your assessments to guide instructional
    improvement?                                            4.21    .63    4.58*   .51
  Are you using your assessments to provide communication
    and feedback to students regarding their performance?   3.84    .83    3.44   1.19

Note. 1 (not at all), 3 (moderate extent), 5 (great extent). N = 19.
*p ≤ .05.
informal formative assessment as a new orientation to assessment practice—for example, “I
had a more fine tuned lens, and I would observe my class more [to] check really quickly so
that I could move on to address those misunderstandings right away.” Feedback to students
regarding their work and understandings was mentioned by a few teachers in their survey
comments: “Just last night I was grading the lab reports, and I found out all I’m doing is
writing questions to all the kids!” “I have students evaluating each other’s assessments and
giving feedback to one another.”
In sum, through their work on Academy portfolios, teachers replaced their prior practice of
grading student work with methods for analysis of student understanding; in concert, teachers
developed more targeted strategies for instructional improvement, and devoted some attention
to providing meaningful student feedback. These patterns of broad impact were balanced by
gaps in teachers’ expertise, such as interpretation of student progress along a developmental
continuum, fairness and reliability of scoring, and curriculum-specific ways to use information
to support student learning.
SUMMARY AND DISCUSSION
The Academy assessment portfolio was an innovative professional development tool designed to
guide science teachers toward deeper expertise with classroom assessment. Through a series of
three portfolios, Academy teachers gained experience with a process for designing assessment
plans for curriculum units, gathering and analyzing evidence of student understanding, and
using the information for instructional improvement. Evidence in teachers’ portfolios and in
their self-reports in surveys and focus groups revealed that teachers developed a commitment to
formative assessment and strengthened their expertise. As teachers recognized that curriculum
materials are not inflexible scripts, they adopted a professional stance toward materials as
revisable resources for teaching and assessment. But teachers exited the program with uneven
understanding of the technical aspects of assessment and curriculum-specific methods of using
assessments to improve instruction. Few teachers felt fully competent with all components of
the Academy’s vision for quality classroom assessment.
The first strand of our analysis focused on assessment planning. We found that teachers
learned to construct clearer depictions of relationships among unit learning goals, and identify
and align assessment points to track student progress. Teachers learned to revise tasks and
criteria—tighten alignment, clarify response expectations, and capture student understanding.
But teachers’ assessment revisions were sometimes limited to surface features, and teachers
were aware that they had more to learn about constructing an assessment system to track
student progress along a developmental continuum. At the same time, some teachers expressed
the desire for higher quality assessments embedded in their instructional materials, to reduce
the time and perceived expertise required to strengthen assessment tools.
Our second strand of analysis addressed teachers’ growth with interpretation of student
responses and use of the information to guide instruction and provide students feedback. Our
results showed that teachers gradually replaced their practices of merely collecting or grading
student work with strategies for analysis of student understanding and instructional follow-up.
Teachers refined criteria based on patterns in the student work, scored responses, organized
scores in a variety of records, and analyzed whole class patterns and trends. In concert, teachers’
uses of the evidence shifted from simply reteaching toward targeted strategies for feedback and
instructional improvement. But some teachers exited the program with methods of interpreting
student work that missed informative patterns, or methods of follow-up that were not well
aligned with assessment results. There were gaps in teachers’ understandings of interpretation
(such as scoring reliability and tracking student progress) as well as uses of assessment results
to differentiate instruction and provide feedback.
Discussion
We conclude with discussion of the role of the Academy portfolio in the growth of teachers’
assessment expertise. We argue that a generic portfolio can serve as a valuable resource in the
establishment of a K-12 professional community committed to the improvement of assessment.
There is a need, however, for additional specialized assessment coaching and support tailored to
teachers’ curriculum and their individual goals for learning. We also recommend exploring the
portfolio’s potential as a resource for helping teachers learn techniques for informal formative
assessment. We end with comments on the challenges of developing research-based trajectories
of teacher learning, and the need for further research.
The generic portfolio’s role in building a professional community. At Academy in-
stitutes, teachers were introduced to the assessment principles in the Academy framework,
and then grade-level teams applied those principles to the work of strengthening assessments
for their curriculum unit portfolios. Through these activities, Academy participants developed
shared knowledge of major assessment concepts, a shared technical language, and portable
portfolio examples that could serve as resources to sustain the ideas and the work. The generic
assessment portfolio played a pivotal role in building professional community.
The portfolio proved to be a flexible resource that allowed teaching professionals within
the community to focus on particular aspects of assessment based on their students’ needs and
their personal goals for learning. Although all teams completed the required portfolio work
(unit goal planning, tool development, and interpretation and use of evidence), some teams
took the initiative to work on additional aspects of assessment such as strengthening scoring
reliability, reducing bias in item design and scoring, and designing or interpreting sets of items
in one instrument. The range in teachers’ work was a benefit to the Academy. Within the shared
context of the assessment framework and portfolio, each team’s learning became a resource
for others.
What teachers learned about the portfolio process was itself a valuable outcome with
potential to seed new communities. When teachers completed the Academy program, they had
acquired a strategy for assessment planning and improvement that they could use to strengthen
their assessment practices collaboratively with non-Academy colleagues in their home districts.
The need to balance the generic portfolio with targeted support. The Academy
produced a cadre of professionals who had differing areas of expertise, and the portfolio’s
flexibility was one of several factors that contributed to this variation in teacher outcomes.
Teacher background was another factor. Teachers came to the Academy program with a range
of experience developing assessments and analyzing student work, knowledge of science, and
familiarity with particular curriculum units. Curriculum was a third factor—teachers were
working with curriculum units that contained markedly different assessments. At any given time,
one team might be revising multiple-choice items while another team was revising performance
tasks; even if two teams were revising assessments of the same type, there were differences in
the content and quality of the assessments in their instructional materials.
It is a reasonable conjecture that teachers’ progress with assessment will eventually require
their deep engagement with content and curriculum as well as specialized assistance with
the more technical aspects of assessment. Strengthening the validity of an assessment, for
example, requires teachers to analyze the soundness of the science content as well as the
capacity of the task and the scoring criteria to elicit and capture the range of student
performance and understanding in the domain. Developing a series of assessments to
track student progress requires a solid understanding of comparability of measures. Scoring
reliably requires practice with techniques such as benchmarking, rescoring, and double scoring.
Designing instructional methods for follow-up and feedback requires that teachers understand
content-specific strategies appropriate to students’ conceptual challenges. We recommend that
professional developers supplement the generic assessment portfolio with targeted and individ-
ualized methods of support. Coaches with curriculum and/or technical expertise could consult
with grade level teams in the institutes, and provide on-site and online coaching to help teachers
specialize their uses of assessment for particular curriculum units and their personal goals for
learning.
Support could also be embedded in instructional materials in the form of quality curriculum-
embedded assessments (e.g., Shavelson, Stanford Educational Assessment Laboratory, & Cur-
riculum Research & Development Group, 2005). Research-based assessments and scoring
guides can help teachers identify patterns of student learning and track progress systematically,
and indeed there is emerging evidence of the promise of these systems for teacher learning,
classroom practice, and student learning (Kennedy, Brown, Draney, & Wilson, 2005; M. Wilson
& Sloane, 2000). Provision of these tools could shift teachers’ attention from time-consuming
assessment development and refinement toward content-specific interpretation of students’
responses and design of instructional follow-up. Of course, teachers using these quality systems
still need assessment expertise to support student learning. Teachers must understand the
underlying concepts to refine embedded tools for their students and their assessment purposes
and to make appropriate inferences from assessment information. We believe that the Academy
portfolio has potential as a productive resource for teacher learning for curriculum units with
high quality assessments.
Building expertise with informal classroom assessment. The Academy professional
development team developed a portfolio model for written assessments that could be archived
and transported across contexts, but of course they recognized that classroom assessment
is far more complex. In addition to more formal paper-and-pencil tools, teachers need to
gather ongoing information through informal methods such as questioning and observation,
and students should be deeply engaged in assessing their own and their peers’ progress in a
supportive community of learners. Prior research has produced solid evidence of the importance
of a coordinated system of informal and formal assessment (e.g., Black & Wiliam, 1998).
We believe that assessment portfolios have the potential to support teachers’ attention
to informal assessment. The heart of the Academy portfolio process was the design of an
assessment system, and informal assessments need to be coordinated with the very same
components—clearly specified learning goals, assessments to track students’ progress, strategies
for interpreting evidence, and formative use of evidence. With the support of their written port-
folios, teachers could develop related informal techniques. Work on informal teacher assessment
might focus on questions to ask in a whole class discussion, methods of documenting student
participation during science labs, or journal prompts to provide quick formative information
on student understanding. Work on students’ roles in assessment might address techniques to
ensure that students use criteria to monitor their work and their learning, the best ways to engage
students in constructing rubrics, and methods for scaffolding useful peer feedback. Because
documentation of informal assessment can be challenging, colleagues might observe in one
another’s classrooms and debrief their uses of informal techniques. Research and development
is needed to identify productive and feasible ways to coordinate the portfolio with work on
informal assessment.
The challenges of research on teacher learning. The Academy portfolios were rich
repositories of information, and we intended to use the portfolio series as evidence for trajecto-
ries of teacher learning. Our decision to put aside that effort was reluctant but necessary in the
face of portfolios that were not comparable across teachers or over time. Future research should
focus on the ways that teachers build assessment expertise when they are implementing the
same curriculum units. Within the context of shared curriculum, researchers will be better able
to develop research frameworks and tools to capture changes in teachers’ knowledge and use
of assessment over time and to evaluate the impact of teachers’ assessment practice on student
learning. Multiple investigations of patterns of teacher and student learning across a range of
curricula and grade levels will provide educators more complete understandings of the contexts
in which teachers learn to use assessment to support student learning.
Final Remark
Although further research on assessment portfolios is needed, our findings have clear implica-
tions for policy and practice. Assessment portfolios are valuable resources for teacher learning
when teachers collaborate in professional communities with the support of a comprehensive
assessment framework. Portfolios are tools for teacher learning that enable teachers to design
“assessments for learning” for their students.
ACKNOWLEDGMENTS
We thank the participating teachers, the professional development team, and our research team
for their contributions to the findings reported here. The professional development team was
co-directed by Kathy DiRanna (WestEd) and Craig Strang (Lawrence Hall of Science), and
the team consisted of Diane Carnahan, Karen Cerwin, and Jo Topps of WestEd, and Lynn
Barakos of Lawrence Hall of Science. Researchers (in addition to those listed as authors)
included Shaunna Clark, Joan Herman, Sam Nagashima, and Terry Vendlinski from UCLA, and
Diana Bernbaum, Jennifer Pfotenhauer, and Cheryl Schwab from UC Berkeley. Joan Herman
provided invaluable feedback on this article. This study was supported by the National Science
Foundation under a grant to WestEd for the Center for Assessment and Evaluation of Student
Learning (CAESL). Views expressed do not necessarily represent the views of the Foundation.
REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement
in Education. (1999). Standards for educational and psychological testing. Washington, DC: American
Educational Research Association.
Aschbacher, P. (1999). Helping educators to develop and use alternative assessments: Barriers and facilitators. Educa-
tional Policy, 8, 202–223.
Aschbacher, P., & Alonzo, A. (2006). Examining the utility of elementary science notebooks for formative assessment
purposes. Educational Assessment, 11, 279–303.
Atkin, J. M., Coffey, J. E., Moorthy, S., Sato, M., & Thibeault, M. (2005). Designing everyday assessment in the
science classroom. New York: Teachers College Press.
Ball, D. L., Hill, H. C., & Bass, H. (2005, Fall). Knowing mathematics for teaching: Who knows mathematics well
enough to teach third grade, and how can we decide? American Educator, pp. 14–23.
Bell, B., & Cowie, B. (2001). Formative assessment and science education. Dordrecht, the Netherlands: Kluwer
Academic.
Birman, B. F., Desimone, L., Porter, A. C., & Garet, M. S. (2000). Designing professional development that works.
Educational Leadership, 57(8), 28–33.
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for learning: Putting it into practice.
Buckingham, England: Open University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7–74.
Borko, H., Mayfield, V., Marion, S., Flexer, R., & Cumbo, K. (1997). Teachers’ developing ideas and practices about
mathematics performance assessment: Successes, stumbling blocks, and implications for professional development.
Teaching and Teacher Education, 13, 259–278.
Brookhart, S. M. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational
Measurement: Issues and Practice, 22, 5–12.
DiRanna, K., Osmundson, E., Topps, J., Barakos, L., Gearhart, M., Cerwin, K., et al. (2008). Assessment-centered
teaching. Thousand Oaks, CA: Corwin.
Falk, B., & Ort, S. (1998, September). Sitting down to score: Teacher learning through assessment. Phi Delta Kappan,
80(1), 59–64.
Garet, M., Porter, A. C., Desimone, L., Birman, B. F., & Yoon, K. S. (2001). What makes professional development
effective? Results from a national sample of teachers. American Educational Research Journal, 38, 915–945.
Gearhart, M., Nagashima, S., Pfotenhauer, J., Clark, S., Schwab, C., Vendlinski, T., et al. (2006). Developing expertise
with classroom assessment in K–12 science: Learning to interpret student work. Educational Assessment, 11, 237–
263.
Gearhart, M., & Saxe, G. B. (2004). When teachers know what students know: Integrating assessment in elementary
mathematics. Theory Into Practice, 43, 304–313.
Goldberg, G. L., & Roswell, B. S. (1999–2000). From perception to practice: The impact of teachers’ scoring experience
on performance-based instruction and classroom assessment. Educational Assessment, 6, 257–290.
Guskey, T. R. (2003). Analyzing lists of the characteristics of effective professional development to promote visionary
leadership. NASSP Bulletin, 87(637), 28–54.
Hawley, W. D., & Valli, L. (1999). The essentials of effective professional development. In L. Darling-Hammond & G.
Sykes (Eds.), Teaching as the learning profession: Handbook for policy and practice (pp. 127–150). San Francisco:
Jossey-Bass.
Herman, J. L. (2005, September). Using assessment to improve school and classroom learning: Critical ingredients.
Presentation at the Annual Conference of the Center for Research on Evaluation, Standards, and Student Testing,
University of California, Los Angeles.
Herman, J. L., Osmundson, E., Ayala, C., Schneider, S., & Timms, M. (2006, April). The nature and impact of
teachers’ formative assessment practices. Paper presented as part of the symposium, Building Science Assessment
Systems that Serve Accountability and Student Learning: The CAESL Model, at the annual meeting of the American
Educational Research Association, Montréal, Quebec, Canada.
Hesse-Biber, S., Dupuis, P., & Kinder, T. S. (1991). HyperRESEARCH, a computer program for the analysis of
qualitative data with an emphasis on hypothesis testing and multimedia analysis. Qualitative Sociology, 14, 289–
306.
Hill, H. C., & Ball, D. L. (2004). Learning mathematics for teaching: Results from California’s Mathematics Profes-
sional Development Institutes. Journal for Research in Mathematics Education, 35, 330–351.
Kennedy, C. A., Brown, N. J. S., Draney, K., & Wilson, M. (2005, April). Using progress variables and embedded
assessment to improve teaching and learning. Paper presented as part of the symposium Building Science Assessment
Systems that Serve Accountability and Student Learning: The CAESL Model, at the annual meeting of the American
Educational Research Association, Montréal, Quebec, Canada.
Laguarda, K. G., & Anderson, L. M. (1998). Partnerships for standards-based professional development: Final report
of the evaluation. Washington, DC: Policy Studies Associates.
Loucks-Horsley, S., Love, N., Stiles, K. E., Mundry, S., & Hewson, P. W. (2003). Designing professional development
for teachers of science and mathematics. Thousand Oaks, CA: Sage.
Mansvelder-Longayroux, D., Beijaard, D., Verloop, N., & Vermunt, J. D. (2007). Functions of the learning portfolio in
student teachers’ learning process. Teachers College Record, 109, 126–159.
Maxwell, J. A. (1996). Qualitative research design: An interactive approach. Thousand Oaks, CA: Sage.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand
Oaks, CA: Sage.
Nagashima, S. O., Osmundson, E., & Herman, J. L. (2006, April). Assessment in support of learning: Defining
effective practices and their precursors. Presentation at the annual meeting of the American Educational Research
Association, San Francisco, CA.
National Center for Education Statistics. (2001). Teacher preparation and professional development. Washington, DC:
Author.
National Research Council, National Committee on Science Education Standards. (1996). National science education
standards. Washington, DC: National Academy Press.
National Research Council, & Committee on Classroom Assessment and the National Science Education Standards,
Center for Education. (2001a). Classroom assessment and the National Science Education Standards (J. M. Atkin,
P. Black, & J. E. Coffey, Eds.). Washington, DC: National Academies Press.
National Research Council, Committee on the Foundations of Assessment, Center for Education. (2001b). Knowing
what students know: The science and design of educational assessment (J. Pellegrino, N. Chudowsky, & R. Glaser,
Eds.). Washington, DC: National Academies Press.
Schön, D. A. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books.
Schön, D. A. (1987). Educating the reflective practitioner. San Francisco: Jossey-Bass.
Shavelson, R., Stanford Educational Assessment Laboratory, & Curriculum Research & Development Group. (2005).
Embedding assessments in the FAST curriculum: The romance between curriculum and assessment (Final Report).
Palo Alto, CA: Stanford University.
Sheingold, K., Heller, J. I., & Paulukonis, S. T. (1995). Actively seeking evidence: Teacher change through assess-
ment development (Rep. No. MS #94-04). Princeton, NJ: Educational Testing Service, Center for Performance
Assessment.
Shepard, L. A. (2001). The role of classroom assessment in teaching and learning. In V. Richardson (Ed.), The handbook
of research on teaching (4th ed., pp. 1066–1101). Washington, DC: American Educational Research Association.
Stiggins, R. J. (2005). Student-involved assessment FOR learning (4th ed.). Upper Saddle River, NJ: Pearson Merrill
Prentice Hall.
Taylor, C. S. (1997). Using portfolios to teach teachers about assessment: How to survive. Educational Assessment, 4,
123–147.
Taylor, C. S., & Nolen, S. B. (1996a). A contextualized approach to teaching teachers about classroom-based
assessment. Educational Psychologist, 31, 77–88.
Taylor, C. S., & Nolen, S. B. (1996b). What does the psychometrician’s classroom look like? Reframing assessment
concepts in the context of learning. Education Policy Analysis Archives, 4(17).
Taylor, C. S., & Nolen, S. B. (2004). Classroom assessment. Upper Saddle River, NJ: Prentice Hall.
Watson, A. (2000). Mathematics teachers acting as informal assessors: Practices, problems and recommendations.
Educational Studies in Mathematics, 41, 69–91.
Weiss, I. R., & Miller, B. (2006, October). Deepening teacher content knowledge for teaching: A review of the evidence.
Paper prepared for the Second Mathematics Science Partnership (MSP) Evaluation Summit, Minneapolis, MN.
Wiggins, G., & McTighe, J. (2005). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum
Development.
Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for learning: Impact on student
achievement. Assessment in Education, 11, 49–65.
Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement
in Education, 13, 181–208.
Wilson, S. M. (2004). Student assessment as an opportunity to learn in and from one’s teaching practice. In M. Wilson
(Ed.), Towards coherence between classroom assessment and accountability: 103rd yearbook of the National Society
for the Study of Education, Part 2 (pp. 264–271). Chicago: University of Chicago Press.
Wilson, S. M., & Berne, J. (1999). Teacher learning and the acquisition of professional knowledge: An examination
of research on contemporary professional development. Review of Research in Education, 24, 173–209.
Wolf, K., & Dietz, M. (1998). Teaching portfolios: Purposes and possibilities. Teacher Education Quarterly, 25, 9–22.
Zeichner, K., & Wray, S. (2001). The teaching portfolio in U.S. teacher education programs: What we know and what
we need to know. Teaching and Teacher Education, 17, 613–621.