A Framework for Improvement: Analyzing Performance Assessment Scores for Evidence-Based Teacher Preparation Program Reforms

Kevin C. Bastian, University of North Carolina at Chapel Hill, Education Policy Initiative at Carolina (EPIC)
Diana Lys, University of North Carolina at Chapel Hill, College of Education
Yi Pan, University of North Carolina at Chapel Hill, Frank Porter Graham Institute

February 2017
Contents

Acknowledgements
Abstract
Introduction
Theoretical Framework
    Cultural Historical Activity Theory
    Teacher Education Communities
Teacher Candidate Performance Assessment Data
Empirical Framework: Latent Class Analysis
    Background
    Demonstrating Latent Class Analysis
    Using Latent Class Analysis for Program Improvement
Empirical Framework: Predictive Validity Analyses
    Background
    Demonstrating Predictive Validity Analyses
    Using Predictive Validity Analyses for Program Improvement
Discussion
Appendix
Acknowledgements
We are grateful to the faculty and staff at our Partner University for providing their edTPA data
and being enthusiastic and receptive research partners. We wish to thank the University of North
Carolina General Administration and the UNC Council of Education Deans for their feedback
and financial support as part of the Teacher Quality Research Initiative.
Abstract
Teacher candidate performance assessments represent a promising source of data for
evidence-based program improvement. However, teacher preparation programs (TPPs)
interested in reform face a crucial question: how to identify actionable evidence in performance
assessment data. To address this concern we propose a two-pronged empirical framework that
TPPs can use to analyze performance assessment data. The first approach, latent class analysis,
groups candidates together based on similarities in their performance assessment scores—
creating profiles of instructional practice. This can help TPPs provide targeted supports to
candidates. The second approach, predictive validity analyses, estimates relationships between
candidates’ performance assessment scores and their performance as teachers-of-record. This
can help TPPs identify programmatic elements significantly related to teacher outcomes. We
illustrate this framework with edTPA data from a Partner University and contend that the impact
of performance assessments can be amplified by these common strategies for analyzing
performance assessment data.
Introduction
In recent years the policy context surrounding teacher education has increasingly
emphasized teacher preparation programs (TPPs) using data to engage in evidence-based
program reform. Most notably, the Council for the Accreditation of Educator Preparation
(CAEP), the national TPP accreditation body, explicitly requires programs to demonstrate the
impact of their graduates in K-12 schools and to use multiple forms of data for continuous
program improvement (CAEP, 2013). To fulfill this requirement, TPPs can employ a range of
outcome measures, including teaching placement rates, teacher value-added estimates, ratings of
teachers’ instructional practices, teacher retention, and surveys of program graduates and their
employers.
Beyond outcome measures focused on the performance and persistence of program
graduates, an emerging source of data for evidence-based reforms are teacher candidate
performance assessments. Candidate performance assessments (e.g. edTPA or PPAT) are
portfolios completed by teaching candidates during their student teaching experience that
typically include video clips of instruction, lesson plans, samples of student work and teacher
feedback, and candidates’ reflective commentaries. Teacher preparation programs can use these
data to assess candidates’ readiness to enter the teaching profession and as a source of evidence
for programmatic reforms. In this context of TPP improvement, candidate performance
assessments may be particularly valuable for a number of reasons. First, performance
assessments provide timely feedback to TPPs—rather than waiting more than a year for the
evaluation ratings or value-added estimates of program graduates, performance assessment data
are accessible to TPPs prior to candidate completion. Second, performance assessments provide
TPPs with feedback on candidate performance across a range of teaching practices so that TPPs
can identify candidates’ specific strengths and weaknesses. Third, elements of performance
assessments connect to specific programmatic components, meaning TPPs can improve
outcomes by pinpointing practices to update or change based on performance assessment scores
(Diez, 2010). Lastly, as outlined by Peck and colleagues, candidate performance assessments
supply program faculty and staff with a common language, common expectations, and a forum
for accepting collective responsibility for candidate performance (Peck, Singer-Gabella, Sloan,
& Lin, 2014).
While candidate performance assessments represent a rich and promising source of data for
improvement efforts, TPPs interested in reform face a crucial question: how to best analyze
performance assessment data to identify actionable evidence. Essentially, how can programs
turn their candidate performance assessment scores into a clear direction for improvement? With
over 700 TPPs in 39 states currently using candidate performance assessments (edTPA, 2016),
there is a pressing need to answer this question and to provide a common framework for
analysis. The impact of common teaching candidate performance assessments can be amplified
by common strategies for analyzing performance assessment data. Collectively, TPPs must
improve the assessment literacy among their teacher education faculty (Diez, 2010) and reform
TPP practice by building cultures that engage with evidence and explore transformative program
change (Engestrom, 2001; Peck & McDonald, 2013). Therefore, in this study, we propose a
two-pronged empirical framework that TPPs can use to analyze their candidate performance
assessment scores, detail the evidence generated by these methods, and describe how this
evidence can connect to program improvement efforts.
Building from Halpin and Kieffer (2015), the first empirical strategy in our framework is
latent class analysis (LCA), an approach which groups observations together based on
similarities in their item/variable scores. Teacher preparation programs can perform LCA to
group teaching candidates together based on their performance assessment scores and then use
this classification structure to (1) predict candidates’ assignment to classes with other sources of
program data (e.g., entry characteristics, coursework performance, exposure to programmatic
components); (2) inform targeted remediation before program completion; and (3) assist school
districts in providing targeted beginning teacher supports. Quite simply, TPPs can use LCA both to make informed remediation/intervention decisions for their current candidates and to better
understand why candidates are in certain latent classes—so that targeted intervention can occur
more quickly for future cohorts of teaching candidates.
Given the current policy context connecting teacher education to the outcomes of program
graduates, the second empirical strategy in our framework is a set of predictive validity analyses
to estimate the relationships between candidates’ performance assessment scores and their
performance (e.g. value-added estimates or evaluation ratings) as teachers-of-record. These
analyses can tell TPPs which performance assessment measures significantly predict graduate
performance. With these findings TPPs and their faculty can connect performance assessment
measures to specific elements of the program and use this evidence to better prioritize their
improvement efforts—by focusing on the programmatic components significantly associated
with graduate performance. More broadly, results from these predictive validity analyses can
help TPPs determine whether candidate performance assessments are a valuable measure on
which to base program improvement decisions.
To illustrate this two-pronged empirical approach, we partnered with a large public
university in North Carolina (hereafter referred to as Partner University) to perform LCA and
predictive validity analyses on the edTPA scores of their 2012-13 graduating cohort. Here, we
stress that the purpose of these analyses is to demonstrate a common framework that TPPs can
use to analyze and act on their own candidate performance assessment data. We do not aim to
draw specific conclusions about LCA groups or which edTPA tasks/rubrics significantly predict
teacher performance. We hope that by providing examples of the methods and results TPPs will
better appreciate how this analysis framework can be a valuable tool in their program
improvement efforts.
In the succeeding sections we first outline the theoretical framework motivating this work
and then briefly detail performance assessments and the performance assessment data used in our
empirical framework analyses. Next, we present our LCA and predictive validity analyses by
providing background on the methods, illustrating the approach with edTPA data from our
Partner University, and detailing how TPPs can use the evidence generated by these analyses for
program improvement. Finally, we close with a discussion of how candidate performance
assessments and this empirical framework can help TPPs overcome the challenges of evidence-
based reform.
Theoretical Framework
Cultural Historical Activity Theory
To drive change in teacher preparation, it is critical to acknowledge the complexity in
which teacher education faculty, their candidates, and their partners work. Improving TPP
outcomes through evidence-based reforms is not merely a matter of having teacher performance
data; it is about how those data intersect with other elements of the teacher preparation enterprise
to improve program implementation and effectiveness. To consider this process we adopt a
widely used framework in teacher education—Engestrom’s Cultural Historical Activity Theory
(CHAT) (Ellis, Edwards, & Smagorinsky, 2010; Engestrom, 2001; Peck & McDonald, 2014;
Sloan, 2013). Specifically, we use a version of Engestrom’s model adapted by Peck and
McDonald that better aligns with evidence-based reform in teacher education (Peck &
McDonald, 2014). As shown in Figure 1, the adapted CHAT model is an interconnected
framework in which elements interact and impact the overall outcome, goal, or activity. For
example, the Rules of the system impact how members of the Community collaborate; the
Instruments or tools used by the system impact how members support one another in the
Division of Labor. Essentially, CHAT provides a framework for “understanding how change
happens in an effort to promote purposeful change” (Sloan, 2013).
Figure 1: An Adapted CHAT Framework for Understanding TPP Change
In the CHAT framework the unit of analysis is the activity system, not the individual
actions or components of the system. Understanding the elements of the activity system and how
they interact is critical to understanding the framework’s complexity and its ability to capture
and describe change. For instance, using the CHAT framework to analyze distributed leadership
in TPPs, Sloan demonstrated how targeted leadership actions within the TPP influenced faculty
interactions and curriculum over time (Sloan, 2013). Recognizing that the pace of change in
higher education is often slow and incremental (Schein, 1990; Tagg, 2012), CHAT provides a
powerful lens for viewing the elements of change in action, particularly those hidden actions in
support of community efforts.
For the current study, we highlight three elements in the CHAT framework—Objects,
Instrument, and Community—and consider how the interplay among these elements may create
disturbances in the TPP activity system that lead to substantive improvements in the Outcome.
In the present study the Outcome is the drive to enact evidence-based reform in teacher
preparation that leads to higher quality program graduates. As shown in Figure 2, the Objects
being used to lead to this outcome are two data analysis models—LCA and predictive validity
studies. The Instrument is the candidate performance assessment and the Community is faculty
and administrative leadership in the TPP. The activity system is the TPP itself. Importantly,
Figures 1 and 2 show that performance assessments are only one element of the activity system
and that data from these assessments must interact with multiple elements—elements often at
odds with or critical of one another—to drive learning and change in TPPs.
Figure 2: CHAT Elements Driving Learning in Teacher Preparation Programs
As described by Engestrom (2001), change occurs through disturbances in the activity
system that push and drive the system forward. Alternatively, activity systems can also be
stagnant and resistant to change. At the Partner University there were several key disturbances
in the system that drove change forward. The first disturbance was the availability of common
performance assessment data for all licensure programs. The second disturbance was the ability
to draw together the teacher education community for regular programmatic meetings focused on
data. The third disturbance was the “will for change” within the TPP that emanated from
program leadership. Emerging from these disturbances, we assert that learning within the system
comes from collaborative analyses to derive meaningful evidence from candidate performance
assessment data. Quite simply, TPPs need valid and reliable teacher candidate data presented in
interpretable and informative ways for teacher educator communities to digest, to question, and
to use as a guide for program improvement. In the present study our unique contribution is
highlighting the analysis methods (latent class analyses and predictive validity analyses) that will
help facilitate this change process.
Teacher Education Communities
When focusing on the relationship between TPPs and their candidates, Diez posed a
deceptively simple question: “Have they learned what we taught them?” (Diez, 2010). In PK-12
settings, such a question might be tackled by a group of teachers in a professional learning
community (PLC), with research confirming that such communities drive teacher learning and
development (Grossman, Wineburg, & Woolworth, 2001; Ronfeldt, Farmer, McQueen, & Grissom,
2015). In teacher education, there are many parallels. Teacher education faculty teach students
(pre-service teachers) and are responsible for ensuring that candidates possess the knowledge and
skills to succeed as beginning teachers. Teacher education leadership guides and sustains the drive
for change while supporting faculty development. Together, these efforts create opportunities for
faculty to unite, learn, and innovate their practices. The complexity of this work also presents
challenges for teacher education communities.
One challenge faced by teacher education communities is how to use teacher performance
data to improve candidate learning. Research linking aspects of teacher preparation to the
performance of program graduates is growing rapidly (Boyd, Grossman, Lankford, Loeb, &
Wyckoff, 2009; Henry & Bastian, 2015; Henry, Campbell, Thompson, Patriarca, Luterbach, Lys,
& Covington, 2013; Preston, 2016; Ronfeldt, 2015; Ronfeldt & Reininger, 2012); however, there
has been little focus on how TPPs respond to and use such research evidence. While some
teacher educators document their data-use to drive evidence-based improvements (Cuthrell, Lys,
Fogarty, & Dobson, 2016), many TPPs suffer from either a lack of “actionable data” (Ledwell &
Oyler, 2016) or a lack of assessment literacy among teacher education faculty (Diez, 2010). To
help build this literacy, teacher education communities need common ground upon which to
build a community of practice and learning (Grossman, Wineburg, & Woolworth, 2001). Little
and colleagues (2003) observed four conditions that facilitated the work of a teacher community
engaged in analyzing student work: (1) tools tailored to the local context; (2) the ability to talk
across and within content areas; (3) a scaffolded and supported inquiry approach; and (4) norms
and leadership to drive discussions forward. Candidate performance assessments help meet
these needs by providing common standards and language and opportunities for faculty to
engage with data in discussions that could lead to empowered teacher educator communities.
A second challenge is how to structure opportunities for faculty to engage with
performance assessment data to improve programs and teacher candidate learning. In TPPs,
teacher educator communities often default to department or program-level meetings without
structured opportunities to build assessment literacy and engage with assessment data. We
assert that evidence-based change will not occur without these key ingredients and a will to
change. Multiple TPPs have demonstrated the value of “putting the data on the table” for faculty
examination in order to build a culture of inquiry within the learning community (Cuthrell, Lys,
Fogarty, & Dobson, 2016; Peck & McDonald, 2013; Sloan, 2013). In these TPPs, leadership
played a key role in developing faculty engagement opportunities by institutionalizing
performance data exploration and allowing faculty to lead reforms to curricula and clinical
practice. By restructuring TPP practices, faculty are able to engage with data and prioritize
which activities are most meaningful for their professional practice, program development, and
ultimately, teacher candidate learning.
Teacher Candidate Performance Assessment Data
Over the past decade many teacher educators across the United States have supported the
creation and widespread adoption of teacher candidate performance assessments. These
performance assessments stem from the National Research Council’s call to develop broader and
more authentic assessments of teacher candidates and their performance in the classroom and are
modeled after the National Board for Professional Teaching Standards and its performance-based
framework for assessing and credentialing veteran teachers (Darling-Hammond, 2010; Mitchell,
Robinson, Plake, & Knowles, 2001). Through the completion of a student teaching portfolio that
includes lesson plans, video clips of instruction, samples of student work, and commentaries on
teaching decisions, candidate performance assessments are designed to capture a broad range of
knowledge and skills and assess whether candidates are ready to enter the teaching profession
(Pecheone & Chung, 2006). Currently, edTPA, developed by the Stanford Center for
Assessment, Learning, and Equity (SCALE), is the most widely-adopted teacher candidate
performance assessment, with over 700 TPPs in 39 states in varied stages of implementation
(edTPA, 2016).
Our Partner University began piloting the TPA (the predecessor to the current edTPA) in
selected middle and secondary grades programs in the 2010-11 academic year and expanded the
performance assessment into their elementary, music and special education programs in the
2011-12 academic year. The edTPA data used to illustrate our two-pronged empirical
framework come from the Partner University’s 2012-13 graduating cohort. In total, we have 369
edTPA scores from 13 different edTPA handbook areas (e.g. elementary literacy, secondary
history and social studies). Rather than submitting edTPA portfolios to be officially scored,
faculty and staff at the Partner University locally-evaluated the performance assessment
portfolios after participating in training facilitated by officially-calibrated faculty and following
the local evaluation protocols provided by SCALE. Our Partner University blinded scoring
assignments within content areas and did not assign university supervisors or faculty to score the
portfolios of the candidates they supervised during student teaching. While our illustrative
example uses locally-evaluated performance assessments, we stress that our proposed empirical
framework is equally valid and useful when analyzing officially-scored performance
assessments. Programs can analyze either source of performance assessment data to generate
evidence for reforms; programs with both locally and officially-scored portfolios can analyze
both sets of data to compare results and conclusions.
Table 1 displays summary statistics for the 2012-13 edTPA scores from our Partner
University. Overall, edTPA is comprised of three main tasks—Planning, Instruction, and
Assessment—with five scored rubrics within each task. Evaluators score each rubric from 1 to
5, with a 1 indicating a struggling candidate who is not ready to teach, 2 indicating a candidate
who needs more practice, 3 indicating an acceptable level of performance to begin teaching, 4
indicating a candidate with a solid foundation of knowledge and skills, and 5 indicating a highly
accomplished teacher candidate. On average, teaching candidates at the Partner University
scored between 3.00 and 3.50 across rubrics—the average rubric score was 3.34 and the
average total score was 50.06—with higher scores in the planning and instruction tasks.
Table 1: edTPA Summary Statistics

| edTPA Task | edTPA Rubric | Mean (SD) | % at Level 1 | % at Level 2 | % at Level 3 | % at Level 4 | % at Level 5 |
|---|---|---|---|---|---|---|---|
| Planning | Planning for Content Understanding | 3.54 (0.83) | 1.63 | 5.96 | 39.84 | 41.46 | 11.11 |
| Planning | Planning to Support Varied Student Learning Needs | 3.43 (0.85) | 2.98 | 6.23 | 44.17 | 38.21 | 8.40 |
| Planning | Using Knowledge of Students to Inform Teaching | 3.33 (0.84) | 1.63 | 9.76 | 52.30 | 26.83 | 9.49 |
| Planning | Identifying and Supporting Language Demands | 3.31 (0.82) | 2.71 | 8.13 | 51.22 | 31.44 | 6.50 |
| Planning | Planning Assessment to Monitor and Support Student Learning | 3.44 (0.84) | 2.44 | 5.69 | 46.61 | 35.77 | 9.49 |
| Instruction | Learning Environment | 3.49 (0.76) | 1.08 | 1.63 | 56.10 | 29.54 | 11.65 |
| Instruction | Engaging Students in Learning | 3.34 (0.79) | 2.98 | 4.34 | 54.20 | 32.25 | 6.23 |
| Instruction | Deepening Student Learning | 3.34 (0.78) | 1.63 | 6.50 | 55.83 | 28.18 | 7.86 |
| Instruction | Subject-Specific Pedagogy | 3.47 (0.82) | 3.52 | 3.52 | 42.55 | 43.36 | 7.05 |
| Instruction | Analyzing Teacher Effectiveness | 3.25 (0.84) | 2.17 | 11.92 | 52.03 | 26.56 | 7.32 |
| Assessment | Analysis of Student Learning | 3.33 (0.92) | 5.15 | 6.78 | 46.88 | 32.25 | 8.94 |
| Assessment | Providing Feedback to Guide Further Learning | 3.38 (0.85) | 4.07 | 7.59 | 38.75 | 45.26 | 4.34 |
| Assessment | Student Use of Feedback | 3.06 (0.91) | 5.42 | 16.80 | 49.32 | 23.04 | 5.42 |
| Assessment | Analyzing Students' Language Use | 3.09 (0.86) | 4.61 | 13.01 | 56.64 | 20.05 | 5.69 |
| Assessment | Using Assessment to Inform Instruction | 3.25 (0.95) | 6.50 | 7.59 | 49.32 | 27.64 | 8.94 |
| | Average Rubric Score | 3.34 (0.66) | | | | | |
| | Average Total Score | 50.06 (9.84) | | | | | |

Note: This table displays summary statistics—means, standard deviations, scoring distributions—for the Partner University's 2012-13 edTPA scores.
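Programs replicating the computations behind Table 1 from their own raw scores need only a few lines of analysis code. Below is a minimal Python sketch; the file name and rubric column names (edtpa_scores.csv, rubric_01 through rubric_15) are hypothetical placeholders for a program's own data layout.

```python
import pandas as pd

# Hypothetical input: one row per candidate, one column per edTPA rubric (1-5 scores).
df = pd.read_csv("edtpa_scores.csv")
rubrics = [c for c in df.columns if c.startswith("rubric_")]  # rubric_01 ... rubric_15

# Mean and standard deviation for each rubric (the first data column of Table 1).
summary = df[rubrics].agg(["mean", "std"]).T.round(2)

# Percentage of candidates scoring at each level, 1 through 5.
for level in range(1, 6):
    summary[f"pct_level_{level}"] = (df[rubrics].eq(level).mean() * 100).round(2)

print(summary)
print("Average rubric score:", round(df[rubrics].stack().mean(), 2))
print("Average total score:", round(df[rubrics].sum(axis=1).mean(), 2))
```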
Empirical Framework: Latent Class Analysis
Background
Unlike factor analysis, which identifies latent constructs comprised of items/variables,
LCA is an observation-centered approach that groups observations together based on similarities
in their item/variable scores. Specifically, LCA assigns observations to a category with other
like-scoring observations and provides an estimate of measurement error—how well did the
observation fit into the assigned category versus other categories. Considering TPPs, LCA
provides an empirical and objective basis for grouping teaching candidates together according to
their performance assessment scores. These groupings can be considered profiles of
instructional practice, with each latent class identifying a set of teaching candidates with similar
instructional strengths and shortcomings (Grossman, Loeb, Cohen, & Wyckoff, 2013). For
TPPs, the benefit of LCA is straightforward: it is a way to summarize performance assessment
data to make inferences about individual teaching candidates and diagnose strengths and
deficiencies—in candidates’ knowledge and/or skills—for further feedback and support.
Essentially, LCA offers a rigorous, quantitative approach that can help TPPs design and
implement targeted interventions to their teaching candidates (Halpin & Kieffer, 2015).
Overall, LCA entails three key advantages for TPPs. First, TPPs need very little to perform
LCA—candidates’ performance assessment scores and access to software platforms that support
the analyses (e.g. Stata, Mplus). Compared to predictive validity analyses, which require TPPs
to acquire/access data on graduates’ outcomes as teachers-of-record, this is an important
advantage for LCA. Second, LCA has an advantage of time and timing. Arguably, teacher
education faculty may conduct a similar (yet subjective) sorting exercise following a thorough
review of portfolios and scores; however, the time and labor required to conduct such a review are
considerable. LCA may be conducted with fewer time and personnel resources. Additionally,
rather than waiting for teacher outcome data (e.g. graduates’ value-added estimates), TPPs can
perform LCA as soon as performance assessment scores are available. This timing is crucial for
providing candidates with rapid feedback and support prior to program completion. Lastly, TPPs
can perform LCA on the performance assessment scores of all candidates or for smaller groups
within the preparation program—for certain licensure areas (e.g. elementary, secondary grades
mathematics) or for certain pathways (e.g. traditional undergraduate, graduate degree). This
flexibility may allow TPPs to identify distinct profiles of instructional practice within the broader
program and to better tailor interventions for candidate improvement.
Below, we describe results from LCA on our Partner University’s 2012-13 edTPA scores.
While the number and characteristics of latent classes may vary for other TPPs, our analysis
steps and the types of output they produce will be similar across TPPs undertaking LCA. Thus,
we provide an illustrative example of how TPPs can analyze their performance assessment data
to consider targeted intervention with candidates.
Demonstrating Latent Class Analysis
Our first empirical strategy extends the central thesis of Halpin and Kieffer (2015)—that
LCA of teachers’ observation scores can support targeted professional development
interventions—to teaching candidates and their performance assessment scores. Here, an initial
decision for TPPs to make is whether to perform LCA on all of the performance assessment
rubrics, in the same model, or to perform LCA on the main performance assessment domains
(e.g. planning, instruction, and assessment), separately. For this illustrative case, we estimate a
single LCA including all 15 edTPA rubrics; TPPs focused on certain domains of a candidate
performance assessment may choose to perform domain-specific LCA.
The first step in LCA is identifying how many latent classes exist in the data. This process
begins by estimating models with different numbers of latent classes. Next, the results of these
models are compared and the final number of latent classes is determined by goodness of fit
statistics (Lubke & Neale, 2006). Essentially, this process identifies which scenario—which
number of latent classes—best fits the data. For our Partner University’s edTPA scores, multiple
goodness of fit criteria—AIC, BIC, -2 log likelihood—identify four latent classes (four profiles
of candidate instructional practice). This result is consistent with work showing that an
instrument with three factors—see Appendix Table 1 for factor analysis results of Partner
University’s edTPA data—typically equates to four latent classes (Halpin, Dolan, Grasman, &
De Boeck, 2011).
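As an illustration of this class-enumeration step, the sketch below fits one- through six-class solutions and compares fit criteria. LCA with ordinal indicators is typically estimated in packages such as Mplus or Stata; as a stand-in, this sketch uses scikit-learn's Gaussian mixture model (strictly a latent profile analysis that treats the 15 rubric scores as continuous), and the data-loading details are assumptions.

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

# Hypothetical data: one row per candidate, columns rubric_01 ... rubric_15.
X = pd.read_csv("edtpa_scores.csv").filter(like="rubric_").to_numpy()

# Fit solutions with 1-6 latent classes; lower AIC/BIC indicates better fit.
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=10, random_state=0).fit(X)
    print(f"{k} classes: AIC = {gm.aic(X):.1f}, BIC = {gm.bic(X):.1f}, "
          f"log-likelihood = {gm.score(X) * len(X):.1f}")
```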
After determining the number of latent classes, the next analysis step is assessing how well
observations fit in their assigned class versus another class. For TPPs, this is key to the
interpretation and use of LCA results: if teaching candidates do not fit well into their assigned
class then targeted interventions, to address deficiencies in candidates’ practice, may not be
appropriate for those candidates. Latent class analysis estimates the probability of observations
being in each class and from this, the average profile membership score—observations’
probability of being in their assigned latent class—can be calculated. The average profile
membership score for our Partner University’s teaching candidates was 0.963, indicating that
nearly all teaching candidates had a high probability of being assigned to only one latent class.
Overall, 52 percent of the teaching candidates had a profile membership score of one; 88 percent
of the teaching candidates had a profile membership score of 0.90 or higher.
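The membership-score calculation in this step is a direct read-off of the model's posterior probabilities. Continuing the hypothetical sketch above, each candidate's profile membership score is the posterior probability of the class to which that candidate was assigned:

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

# Same hypothetical data as in the previous sketch.
X = pd.read_csv("edtpa_scores.csv").filter(like="rubric_").to_numpy()

# Refit the selected four-class solution and inspect posterior probabilities.
gm4 = GaussianMixture(n_components=4, covariance_type="diag",
                      n_init=10, random_state=0).fit(X)
posteriors = gm4.predict_proba(X)     # one row per candidate, one column per class
assigned = posteriors.argmax(axis=1)  # each candidate's assigned latent class
membership = posteriors.max(axis=1)   # probability of belonging to the assigned class

print("Average profile membership score:", round(membership.mean(), 3))
print("Share with membership of 0.90 or higher:",
      round((membership >= 0.90).mean(), 2))
```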
Figure 3: Average edTPA Rubric Scores for each Profile of Instructional Practice
Note: This figure displays the average edTPA rubric scores for Partner University candidates in Profiles A, B, C,
and D.
To push towards actionable evidence from LCA, a final step is characterizing and assessing
differences in the identified classes. Figure 3 depicts the average edTPA rubric scores for the
four profiles of instructional practice at Partner University. Here, the four latent classes can be
characterized as follows: (1) Profile A, a high scoring group comprised of 54 candidates whose
average rubric score is 4.38 and whose average total score is 65.63; (2) Profile B, a middle-high
scoring group comprised of 143 candidates whose average rubric score is 3.56 and whose
average total score is 53.45; (3) Profile C, a middle-low scoring group comprised of 128
candidates whose average rubric score is 3.05 and whose average total score is 45.70; and (4)
Profile D, a low scoring group comprised of 44 candidates whose average rubric score is 2.18
and whose average total score is 32.64. To provide Partner University with further evidence on
these profiles, we tested whether there were statistically significant differences in the edTPA
rubric scores for each adjacent pair of groups (a sketch of such pairwise tests follows this paragraph). For all 15 edTPA rubrics, there were significant scoring differences between Profile A vs. Profile B, Profile B vs. Profile C, and Profile C vs. Profile D. Within profiles, edTPA scores are relatively stable. Rather than scoring high on some
elements of edTPA and low on others, teaching candidates in each profile perform fairly
consistently across rubrics (except for Profile D’s drop in scores for the edTPA Assessment
task). While this suggests that TPPs may be able to classify candidates without formal analysis
methods, we contend that (1) LCA is a more empirical and objective classification approach and
(2) profiles of instructional practice from other TPPs may not parallel those of our Partner University. Below, we describe how TPPs can leverage LCA results to
improve candidate and program outcomes.
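The pairwise profile comparisons referenced above can be scripted directly. In the sketch below, Welch's t-test is an illustrative choice (the text does not specify the exact test used), and the profile labels are assumed to come from an LCA step like the one sketched earlier.

```python
import pandas as pd
from scipy import stats

# Hypothetical input: rubric scores plus each candidate's assigned profile label.
df = pd.read_csv("edtpa_scores_with_profiles.csv")  # includes a "profile" column, A-D
rubrics = [c for c in df.columns if c.startswith("rubric_")]

# Test each rubric between adjacent profile pairs (A vs. B, B vs. C, C vs. D).
for lo, hi in [("A", "B"), ("B", "C"), ("C", "D")]:
    for r in rubrics:
        t, p = stats.ttest_ind(df.loc[df["profile"] == lo, r],
                               df.loc[df["profile"] == hi, r],
                               equal_var=False)  # Welch's t-test
        flag = "significant" if p < 0.05 else "n.s."
        print(f"{r}: Profile {lo} vs. {hi}: t = {t:.2f}, p = {p:.3f} ({flag})")
```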
Using Latent Class Analysis for Program Improvement
An advantage of LCA is the opportunity to use it at multiple intervention points for
candidate and program improvement. At present, we have identified three opportunities for
TPPs to use LCA results; new usages may emerge as more TPPs adopt the analysis approach.
The first opportunity involves the development of profiles of instructional practice by joining
LCA results with other TPP data (e.g. selection criteria, course-taking, dispositional and
observation ratings). This can help TPPs intervene with successive cohorts of teaching
candidates at earlier stages of the program. The next opportunity is for teacher candidate
remediation and reinforcement prior to program completion. The final opportunity follows
program graduates into the field, using LCA results to support successful teacher induction.
The primary use of LCA is the opportunity to develop profiles of instructional practice
from which TPPs can design and implement targeted supports for teacher candidates (Halpin &
Kieffer, 2015). Specifically, LCA results identify groups of candidates who excel and struggle
with particular teaching tasks; by combining these LCA results with other demographic and
program data, TPPs can create more robust profiles and predict the types/characteristics of
candidates in each profile (Ledwell & Oyler, 2016). With these data “on the table”, faculty
leadership can target teaching candidates that are comparable to those in established profiles for
intervention at earlier stages of the program. For example, future cohorts of teaching candidates
who fit the Profile D pattern (see Figure 3) could be targeted for early intervention, particularly for program elements related to the edTPA Assessment domain. Conversely, future cohorts of teacher candidates who fit the Profile A pattern might be engaged in additional teacher leadership
development opportunities. Essentially, such profiles of instructional practice offer faculty the
opportunity to target their efforts, to focus on the specific needs of candidates, and to ground
their actions in evidence (Sloan, 2013).
The next opportunity for LCA use is remediation and reinforcement of current teacher
candidates prior to entering the classroom. As noted previously, a potential advantage of LCA is
timing. Although the timing is tight, TPPs can conduct LCA on candidate performance
assessment scores prior to the end of the semester and provide focused remediation and support
before graduation or as part of a "summer bridge" activity prior to entering the classroom.[1] For example, Partner University's candidates in Profile D, who struggle most with the Assessment task, would benefit from additional opportunities to analyze assessment data and use those data to inform instruction. Using LCA in this way allows TPPs to model the provision of data-driven feedback to candidates and gives candidates the opportunity to use feedback to guide further learning.

[1] For these remediation and reinforcement purposes, a potential benefit of local scoring, in comparison to official scoring, is the written, formative feedback on candidate performance that can aid remediation efforts (Ledwell & Oyler, 2016).
A final opportunity is the use of LCA for collaboratively developed induction plans that
follow candidates into their first teaching position to support their professional growth.
Nationally, TPPs are encouraged and expected to work with partner school districts to support
beginning teachers in the initial stages of their careers (CAEP, 2013). The profiles of
instructional practice yielded by LCA can help TPP faculty and hiring districts/schools to
collaboratively customize induction and professional growth plans based on individual teachers’
needs—rather than a one-size-fits-all program. In this way, LCA may present new opportunities
for TPPs to strengthen partnerships with school districts, with a common focus on beginning
teacher support, retention, and success. Such activities may prove valuable for TPP-district
partnerships, teacher development, and TPP accreditation.
Empirical Framework: Predictive Validity Analyses
Background
While LCA can be a valuable tool that informs TPPs’ intervention with candidates and
facilitates research to better understand candidates’ performance assessment scores, it is silent
regarding the relationships between candidate performance assessments and outcomes for
teachers-of-record. Therefore, the second approach in our empirical framework is a set of
predictive validity analyses to estimate the associations between performance assessment
measures and indicators of beginning teacher performance. This approach is in line with the
current outcomes-driven policy environment and can help TPPs prioritize improvement efforts
focused on performance assessment measures (teaching competencies) that are significantly
associated with teacher effectiveness. More broadly, results from these predictive validity
analyses can help TPPs determine whether performance assessments are a valuable measure on
which to base program improvement decisions.
To perform predictive validity analyses, TPPs need access to individual-level indicators of
beginning/early-career teacher performance. Examples of such indicators include teacher value-
added estimates, ratings of teachers’ instructional practices, or student surveys. When possible,
it is beneficial for TPPs to assess the predictive validity of candidate performance assessment
scores with multiple teacher outcomes, as this provides a more comprehensive accounting of
teacher performance, allows for a larger analysis sample,[2] and lets TPPs examine relationships with specific teaching competencies. Unlike LCA, which can be performed by any TPP, we acknowledge that predictive validity analyses entail more challenges for TPPs. Primarily, TPPs need access to indicators of teacher performance, and in certain states these may not exist or may not be available to TPPs. Even when accessible, the small size of many TPPs may make it difficult to estimate relationships between performance assessment scores and teacher effectiveness. Addressing these challenges may require TPPs to form partnerships—with state (local) education agencies or outside researchers to access data or perform predictive validity analyses and with other TPPs to pool data and increase sample size. These teacher performance data may soon be more accessible to TPPs as evidence-based program accountability and improvement become a higher policy and program priority.

[2] In North Carolina, for example, approximately 35-40 percent of teachers have a value-added estimate while nearly 90 percent of teachers have evaluation ratings from their school principal.
Below, we describe the measures and methods we use to demonstrate our predictive
validity analyses. While specific analysis details will vary for other TPPs, depending upon the
data available to them, this provides a framework for an approach that TPPs can take to
determine what performance assessment measures predict graduate performance.
Demonstrating Predictive Validity Analyses
To examine the predictive validity of our Partner University’s 2012-13 edTPA scores, we
use two teacher outcomes: value-added estimates and evaluation ratings. For value-added, we
use teachers’ Education Value-Added Assessment System (EVAAS) scores, the official measure
of value-added used for teacher evaluation in North Carolina public schools (NCPS) (Wright,
White, Sanders, & Rivers, 2010). We focus on teacher EVAAS estimates in elementary and
middle grades mathematics, reading, and science; middle grades social studies; and across a
range of high school math, science, English, history, and civics courses. To ease interpretability
of the EVAAS estimates, which are expressed in either normal curve equivalency units or scale
score points, we standardized the EVAAS scores, within test (e.g. 4th grade mathematics, 7th
grade reading, U.S. history), across all NCPS teachers with EVAAS estimates. This allows us to
interpret coefficients as the relationship between a particular edTPA measure and a percentage of
a standard deviation in teacher effectiveness. Since the outcome variable in these analyses is a
normally distributed measure of teacher value-added, we estimate ordinary least squares (OLS)
regression models and control for a limited set of covariates to better isolate the associations
between edTPA measures and teacher value-added.[3] Overall, the sample for these analyses includes 209 value-added estimates for 152 first-year teachers in the 2013-14 school year.

[3] In both our value-added and evaluation rating analyses we control for the school-level percentage of students qualifying for subsidized meals and the school-level percentage of racial/ethnic minority students. In our value-added models we also include a set of test-area indicators to net out value-added differences across tests.
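A minimal sketch of this value-added model follows, using Python's statsmodels. All file and column names (graduate_outcomes.csv, evaas, test, the covariates, and the standardized edTPA total score) are hypothetical placeholders, and in practice the EVAAS scores would be standardized against the full statewide distribution rather than the analysis sample alone.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical merged file: one row per teacher-by-test EVAAS estimate, joined
# to the teacher's edTPA scores and school-level covariates.
df = pd.read_csv("graduate_outcomes.csv")

# Standardize EVAAS within test (e.g., 4th grade mathematics, 7th grade reading)
# so coefficients read as fractions of a standard deviation of effectiveness.
df["evaas_std"] = df.groupby("test")["evaas"].transform(
    lambda s: (s - s.mean()) / s.std())

# OLS of standardized value-added on an edTPA measure, controlling for the
# school-level covariates and test-area indicators described in the text.
model = smf.ols("evaas_std ~ edtpa_total_std + pct_subsidized_meals"
                " + pct_minority + C(test)", data=df).fit()
print(model.summary())
```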
For evaluation ratings, we use teachers’ ratings on the North Carolina Educator Evaluation
System (NCEES), a statewide evaluation rubric in which school administrators rate teachers
across five standards: teachers demonstrate leadership (Standard 1); teachers establish a
respectful environment for a diverse group of students (Standard 2); teachers know the content
they teach (Standard 3); teachers facilitate learning for their students (Standard 4); and teachers
reflect on their practice (Standard 5). To evaluate teachers, school administrators use formal
classroom observations and paper-based evidence to document key indicators of practice and rate teachers as either not demonstrated (level 1), developing (level 2), proficient (level 3), accomplished (level 4), or distinguished (level 5) on each of the five NCEES standards. Since the outcome variable in these analyses is an ordinal (1-5) evaluation rating, we estimate ordered logistic regression models and control for a limited set of covariates to better isolate the relationships between edTPA measures and teacher evaluation ratings. Overall, the sample for these analyses includes 235 first-year teachers who were evaluated by a school administrator in the 2013-14 school year.
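For the evaluation-rating models, an ordered logistic regression can be sketched as follows; statsmodels' OrderedModel is one accessible option, and again the file and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical file: one row per first-year teacher, with a 1-5 NCEES rating
# per standard, edTPA measures, and school-level covariates.
df = pd.read_csv("graduate_evaluations.csv")

# Ordered logit of one NCEES standard (here, Standard 4: facilitating student
# learning) on an edTPA measure plus the covariates described in the text.
mod = OrderedModel(df["ncees_standard4"],
                   df[["edtpa_total_std", "pct_subsidized_meals", "pct_minority"]],
                   distr="logit")
res = mod.fit(method="bfgs", disp=False)

# Exponentiating a coefficient yields the odds ratios reported in Table 3.
print("Odds ratio for edTPA total score:",
      round(float(np.exp(res.params["edtpa_total_std"])), 3))
```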
To increase the utility of these predictive validity analyses, we estimate three types of
models. Our first approach estimates the relationships between teacher outcomes and constructs
derived from factor analysis of the edTPA scores (please see Appendix Table 1 for the factor
loadings).[4] These results let TPPs know whether constructs of broad teaching tasks impact
teacher performance. Our second approach estimates relationships between teacher outcomes
and each edTPA rubric (entered individually into models). These results may be particularly
valuable, as they help TPPs connect specific edTPA rubrics and the knowledge and skills
underlying those rubrics to components of the program. Finally, our third approach estimates
relationships between teacher outcomes and a standardized total score across all 15 edTPA
rubrics. These results provide a holistic assessment of whether higher edTPA scores predict
teacher performance.[5]
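The second, rubric-by-rubric approach amounts to looping the same model over each rubric entered individually. A short sketch under the same hypothetical data layout as above:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("graduate_outcomes.csv")  # hypothetical, as in the OLS sketch
df["evaas_std"] = df.groupby("test")["evaas"].transform(
    lambda s: (s - s.mean()) / s.std())

# Enter each edTPA rubric individually, holding the covariate set fixed,
# and collect its coefficient and p-value.
results = []
for rubric in [c for c in df.columns if c.startswith("rubric_")]:
    fit = smf.ols(f"evaas_std ~ {rubric} + pct_subsidized_meals"
                  " + pct_minority + C(test)", data=df).fit()
    results.append({"rubric": rubric,
                    "coef": fit.params[rubric],
                    "p_value": fit.pvalues[rubric]})

print(pd.DataFrame(results).round(3))
```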
Tables 2 and 3 display predictive validity results from our value-added and evaluation
rating models. As previously stated, we do not intend to make generalizable conclusions about
the predictive validity of edTPA scores based on these results; rather, we hope to discuss the
findings in such a way that TPPs can gain insight into interpreting results for their own program
improvement purposes. Overall, Table 2 shows that the Instruction and Assessment factors are
significantly associated with teacher value-added estimates—a one standard deviation increase in
these constructs led to a 21 and 18 percent of a standard deviation increase in teacher value-
added. To put these results into perspective, the average difference between first-year and
second-year teachers in NCPS is 23 percent of a standard deviation. At the edTPA rubric level,
four instruction rubrics (rubrics 6-9) and four assessment rubrics (rubrics 12-15) significantly
predict higher value-added estimates. For example, a one-point increase in a candidate’s score
on the “Engaging Students in Learning” rubric is associated with a 27 percent of a standard
deviation increase in teacher value-added. Lastly, the standardized total score significantly
predicts higher value-added—a one standard deviation increase in the total score, equivalent to
approximately 10 edTPA points in our sample, is associated with an 18 percent of a standard
deviation increase in teacher value-added.
[4] These factor analysis results closely resemble the three main edTPA tasks of Planning, Instruction, and Assessment and fully reproduce the factor analysis results from the 2013 edTPA field test report (SCALE, 2013).

[5] Teacher preparation programs can also estimate predictive validity models to determine whether outcomes differ for graduates in different latent classes (Halpin & Kieffer, 2015). For example, what are the performance differences for graduates in Profile A versus Profile B?
Table 2: Predictive Validity with Teacher Value-Added Estimates

| edTPA Measure | Standardized EVAAS Estimate |
|---|---|
| Planning factor | 0.063 (0.067) |
| Instruction factor | 0.213** (0.060) |
| Assessment factor | 0.178* (0.068) |
| Planning for Content Understanding | 0.121 (0.083) |
| Planning to Support Varied Student Learning Needs | 0.050 (0.095) |
| Using Knowledge of Students to Inform Teaching | 0.029 (0.098) |
| Identifying and Supporting Language Demands | 0.092 (0.076) |
| Planning Assessment to Monitor and Support Student Learning | 0.102 (0.082) |
| Learning Environment | 0.156+ (0.090) |
| Engaging Students in Learning | 0.271** (0.081) |
| Deepening Student Learning | 0.210** (0.079) |
| Subject-Specific Pedagogy | 0.229* (0.088) |
| Analyzing Teacher Effectiveness | 0.134 (0.086) |
| Analysis of Student Learning | 0.124 (0.079) |
| Providing Feedback to Guide Further Learning | 0.194** (0.074) |
| Student Use of Feedback | 0.195* (0.087) |
| Analyzing Students' Language Use | 0.215* (0.082) |
| Using Assessment to Inform Instruction | 0.154+ (0.080) |
| Standardized Total Score | 0.184** (0.068) |
| Cases | 209 |

Note: This table displays coefficients for the relationship between edTPA data and teacher value-added estimates. Standard errors are in parentheses. +, *, and ** indicate statistical significance at the 0.10, 0.05, and 0.01 levels.
Table 3: Predictive Validity with Teacher Evaluation Ratings

| edTPA Measure | Leadership | Classroom Environment | Content Knowledge | Facilitating Student Learning | Reflecting on Practice |
|---|---|---|---|---|---|
| Planning factor | 1.031 (0.876) | 0.968 (0.870) | 0.994 (0.975) | 0.856 (0.341) | 1.072 (0.681) |
| Instruction factor | 1.161 (0.372) | 1.213 (0.241) | 1.093 (0.618) | 0.987 (0.928) | 1.195 (0.244) |
| Assessment factor | 1.070 (0.742) | 1.062 (0.745) | 1.021 (0.901) | 0.983 (0.912) | 1.034 (0.838) |
| Planning for Content Understanding | 0.968 (0.887) | 0.906 (0.610) | 0.938 (0.767) | 0.870 (0.485) | 1.113 (0.589) |
| Planning to Support Varied Student Learning Needs | 1.239 (0.360) | 1.079 (0.749) | 1.017 (0.935) | 0.889 (0.522) | 1.149 (0.499) |
| Using Knowledge of Students to Inform Teaching | 0.945 (0.809) | 1.039 (0.870) | 0.975 (0.908) | 0.780 (0.211) | 0.981 (0.936) |
| Identifying and Supporting Language Demands | 0.986 (0.955) | 0.937 (0.787) | 0.920 (0.718) | 0.780 (0.185) | 0.918 (0.700) |
| Planning Assessment to Monitor and Support Student Learning | 1.123 (0.622) | 1.058 (0.819) | 1.159 (0.545) | 0.977 (0.913) | 1.317 (0.199) |
| Learning Environment | 0.998 (0.993) | 1.214 (0.419) | 1.263 (0.306) | 0.926 (0.693) | 1.161 (0.496) |
| Engaging Students in Learning | 1.184 (0.452) | 1.187 (0.482) | 1.058 (0.815) | 0.922 (0.709) | 1.334 (0.180) |
| Deepening Student Learning | 1.364 (0.175) | 1.378 (0.102) | 1.094 (0.683) | 0.987 (0.940) | 1.148 (0.492) |
| Subject-Specific Pedagogy | 1.159 (0.487) | 1.229 (0.323) | 1.073 (0.757) | 1.022 (0.914) | 1.176 (0.404) |
| Analyzing Teacher Effectiveness | 1.028 (0.898) | 0.973 (0.896) | 1.001 (0.996) | 1.096 (0.625) | 1.019 (0.921) |
| Analysis of Student Learning | 1.059 (0.816) | 1.010 (0.963) | 0.956 (0.834) | 0.951 (0.793) | 0.988 (0.958) |
| Providing Feedback to Guide Further Learning | 1.193 (0.431) | 1.211 (0.354) | 1.152 (0.465) | 1.197 (0.353) | 1.055 (0.793) |
| Student Use of Feedback | 1.093 (0.663) | 1.128 (0.526) | 1.039 (0.819) | 0.976 (0.870) | 1.184 (0.352) |
| Analyzing Students' Language Use | 1.040 (0.861) | 1.019 (0.922) | 1.143 (0.480) | 0.955 (0.793) | 1.079 (0.680) |
| Using Assessment to Inform Instruction | 1.021 (0.929) | 1.019 (0.922) | 0.948 (0.774) | 0.888 (0.496) | 0.992 (0.967) |
| Standardized Total Score | 1.103 (0.656) | 1.098 (0.667) | 1.049 (0.816) | 0.932 (0.682) | 1.114 (0.544) |
| Cases | 235 | 235 | 235 | 235 | 235 |

Note: This table displays odds ratios for the relationship between edTPA data and teacher evaluation ratings. P-values are in parentheses. +, *, and ** indicate statistical significance at the 0.10, 0.05, and 0.01 levels.
Ideally, these edTPA measures would significantly predict multiple measures of teacher
performance—as this consistency provides clearer direction for program improvement efforts—
however, Table 3 indicates that there are no significant relationships between edTPA scores and
teacher evaluation ratings for Partner University's 2012-13 graduates. Given the robust value-added findings, there are two reasonable explanations for these null results: (1) the lack of variation in evaluation ratings (approximately 75 percent of the sample earned ratings of proficient, level 3) and/or (2) the use of local edTPA scores, which may be less reliable than officially scored edTPA portfolios.[6] In predictive validity analyses, other measures of teacher performance—student surveys or rubric-based observation protocols (e.g. CLASS, Framework for Teaching)—may better complement teacher value-added estimates. Below, we discuss how TPPs can use the evidence generated by predictive validity analyses for program improvement.

[6] In prior analyses examining the predictive validity of locally-evaluated TPA data (from Partner University's 2011-12 graduating cohort), we found that performance assessment measures significantly predicted higher evaluation ratings (Bastian, Henry, Pan, & Lys, 2016). Likewise, in work with officially-scored edTPA data (from Partner University's 2013-14 graduating cohort), we found that performance assessment measures significantly predicted higher evaluation ratings (Bastian & Lys, 2016).
Using Predictive Validity Analyses for Program Improvement
As previously noted, access to the data needed to complete predictive validity analyses is a
significant limitation for many TPPs and their state partners. However, as data access
collaborations increase across P-20, the field should anticipate increased use of such analyses in
the future. Therefore, the second prong of our empirical framework is more than an approach for
analyzing performance assessment data to aid program improvement; it also shows TPPs the
need to advocate and collaborate for increased data access and use.
The primary way that TPPs can use predictive validity analyses for improvement is to link
predictive validity results back to programmatic elements. Here, predictive validity analyses
using the total score or domain level data may be beneficial, however, rubric level analyses
provide TPP faculty with the most nuanced data to link to programmatic features. Specifically,
rubric level analyses may help TPPs link edTPA rubrics that significantly predict graduate
performance to course objectives and objectives developed across course sequences. By
identifying these linkages—essentially building maps between pre-service curricula and
components, candidate performance assessments, and in-service teacher outcomes—TPPs can
identify programmatic elements to emphasize and those to potentially reform. For example, if
higher scores on the Engaging Students in Learning rubric predict significantly higher value-
added estimates, TPPs can (1) strengthen the course objectives and programmatic components
tied to this rubric and (2) identify which candidates scored high (low) on this rubric. By
identifying these candidates, TPPs can better determine why candidates score well or poorly
(using other program data) and implement evidence-based reforms to improve practice.
Predictive validity analyses not only help faculty to target certain domains and rubrics
found to be significantly linked to teacher performance but also to target changes in their own teaching practice. Essentially, predictive validity analyses can create disturbances in the system
that push the teacher education community and individual/groups of faculty members forward in
program improvement efforts (Engestrom, 2001). Since most TPPs lack the time and resources
to respond to predictive validity analyses with widespread reforms, faculty must consider the
analyses (and other program data) in conjunction with their ability to enact meaningful change
within their program and the courses they teach. For example, at Partner University, a pair of
faculty took particular note when predictive validity value-added results confirmed their concerns over a trend of low scores on rubrics 12 and 13 in the edTPA Assessment domain. The
faculty engaged in deep reflection on the edTPA scores, value-added results, and their own
teaching practices, leading to learning and instructional changes within their community.
Specifically, the faculty developed a new instructional model that included structured
opportunities for candidates to receive feedback from faculty, engage in peer feedback
exchanges, and then act upon the collective feedback. This feedback modeling activity increased
the focus on instructional practice and analysis of teaching, enhanced collegiality among the
teacher candidates, and ultimately, led to higher scores on edTPA rubrics 12 and 13.
Beyond assessing which programmatic elements and teaching practices predict graduate
performance, TPPs can use predictive validity analyses to help evaluate the success of TPP
reforms. For example, prior to undertaking predictive validity analyses, Partner University
adopted a focus on instructional strategy development as part of its Teacher Quality Partnership
grant. This led Partner University to create course-embedded instructional strategy modules that
required candidates to demonstrate their declarative, procedural, and conditional knowledge of
each strategy (Carson, Cuthrell, Smith, & Stapleton, 2010). As a summative program
assessment, edTPA scores provided valuable data about the impact of this reform on candidates;
predictive validity results provided evidence about the relationships between graduates’
instruction and student learning. Specifically, significant findings in the predictive validity
analyses between teacher value-added and both the edTPA Instruction task and edTPA rubrics 6-
9 suggested that the TPP focus on instructional strategies was having a positive impact on
teacher outcomes.
Finally, results from predictive validity analyses can help TPPs determine whether
candidate performance assessments are a valuable measure on which to base program
improvement decisions. If performance assessment measures do not predict beginning teacher
outcomes—e.g. value-added, evaluation ratings, student surveys—then the evidence provided by
performance assessments may not guide TPPs to adopt more effective preparation practices.
Therefore, a lack of predictive validity between performance assessment scores and teacher
outcomes should encourage TPPs to assess the conceptual alignment between performance
assessment measures (tasks, rubrics) and teacher outcomes, examine the reliability of
performance assessment scoring (if local evaluation is used), and consider additional
measures/instruments that may provide better evidence for program improvement.
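As one concrete form of the scoring reliability check mentioned above, the sketch below computes exact agreement, adjacent agreement, and a quadratic-weighted kappa between two local raters. It assumes a hypothetical file of double-scored portfolios with one row per portfolio-rubric pair; the file and column names are placeholders rather than instruments from this study.

```python
# A minimal local-scoring reliability check, assuming a hypothetical file of
# double-scored portfolios with columns rater_1 and rater_2 holding the two
# raters' scores for the same portfolio-rubric pair.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

scores = pd.read_csv("double_scored.csv")            # hypothetical file name

exact = (scores["rater_1"] == scores["rater_2"]).mean()
adjacent = ((scores["rater_1"] - scores["rater_2"]).abs() <= 1).mean()
kappa = cohen_kappa_score(scores["rater_1"], scores["rater_2"],
                          weights="quadratic")

print(f"Exact agreement:    {exact:.2%}")
print(f"Adjacent agreement: {adjacent:.2%}")
print(f"Weighted kappa:     {kappa:.2f}")
```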
Discussion
Teacher education currently resides within a broader policy context focused on data and
evidence-based reform (Haskins & Margolis, 2015; National Research Council, 2002). The goal
of this initiative is straightforward: to improve processes and outcomes through data-driven
decision making. Central to this initiative are three key assumptions: (1) the presence of timely,
valid, and reliable data; (2) an ability to identify actionable evidence within this body of data;
and (3) the capacity and will to act on evidence for improvement (Peck & McDonald, 2014).
From the perspective of teacher education, candidate performance assessments help address
the first assumption. Performance assessment scores are readily available to TPPs, and research
shows that these scores are reliable (SCALE, 2013) and predictive of
beginning teacher outcomes (Bastian, Henry, Pan, & Lys, 2016; Bastian & Lys, 2016; Goldhaber,
Cowan, & Theobald, 2016). Building from these results, the main contribution of the present
study addresses the second assumption: proposing a two-pronged empirical framework that
TPPs can use to derive actionable evidence from their performance assessment data. Here, we
recognize the potential advantages of collective action—the impact of common candidate
performance assessments can be amplified by common strategies for analyzing performance
assessment data. As more TPPs adopt this framework, communities of programs and teacher
educators can develop to collaboratively improve practice and push for greater access to teacher
outcomes data. Although there are limitations—tight time windows to receive performance
assessment scores and perform LCA, and limited access to teacher outcomes data—we believe that this
empirical framework has the potential to cause disturbances in the TPP activity system and
promote learning and change in teacher education communities (Engeström, 2001). Furthermore,
we note the widespread applicability of this framework—while we promote its use with
candidate performance assessments, TPPs can also use it with other performance instruments
(e.g. dispositional ratings, observation scores).
Latent class analysis of performance assessment scores can help TPPs organize, prioritize,
and target interventions to benefit their current teaching candidates and future cohorts of
candidates. Getting the most out of LCA will push TPPs to improve their data management
systems, develop new learning modules for candidate remediation and practice, and strengthen
relationships with surrounding schools/districts to better connect teacher education to beginning
teacher support. Predictive validity analyses can provide a compass for the LCA results, helping
TPPs identify which programmatic elements predict beginning teacher performance and
prioritize reforms that are connected to teacher outcomes. To best utilize predictive validity
results, TPPs must build maps linking pre-service curricula and components to performance
assessments and in-service teacher outcomes. Overall, these empirical frameworks offer TPPs
strategies to turn performance assessment data into evidence and may help TPPs develop the
collective capacity and will to turn evidence into improvement.
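For programs looking for a computational starting point for the first strategy, the sketch below groups candidates by their rubric-score profiles and selects the number of groups by BIC. It approximates LCA with a Gaussian mixture over rubric scores treated as continuous indicators; dedicated latent class software would model them as ordered categories. All file and column names are hypothetical.

```python
# A minimal latent-profile sketch, assuming a hypothetical file with one row
# per candidate and columns rubric_1 ... rubric_15 holding edTPA rubric
# scores. Treating the scores as continuous indicators is an approximation
# of LCA, which would model them as ordered categories.
import pandas as pd
from sklearn.mixture import GaussianMixture

rubrics = pd.read_csv("edtpa_rubrics.csv")           # hypothetical file name
cols = [f"rubric_{i}" for i in range(1, 16)]
X = rubrics[cols].to_numpy()

# Fit 1- through 6-class solutions and keep the one preferred by BIC.
fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
        for k in range(1, 7)]
best = min(fits, key=lambda m: m.bic(X))
print(f"BIC-preferred number of classes: {best.n_components}")

# Assign each candidate to a class and inspect mean rubric scores by class;
# these class profiles are what a program would use to target interventions.
rubrics["latent_class"] = best.predict(X)
print(rubrics.groupby("latent_class")[cols].mean().round(2))
```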
References
Bastian, K.C., Henry, G.T., Pan, Y., & Lys, D. (2016). Teacher candidate performance
assessments: Local scoring and implications for teacher preparation program improvement.
Teaching and Teacher Education, 59, 1-12.
Boyd, D. J., Grossman, P. L., Lankford, H., Loeb, S., & Wyckoff, J. (2009). Teacher preparation
and student achievement. Educational Evaluation and Policy Analysis, 31(4), 416-440.
Carson, J., Cuthrell, K., Smith, J., & Stapleton, J. (2010, February). Helping preservice teachers
choose research-based instructional strategies. Paper session presented at AACTE Annual
Meeting and Exhibits, San Diego, CA.
Council for the Accreditation of Educator Preparation. (2013). CAEP Accreditation Standards.
Available from: http://caepnet.files.wordpress.com/2013/09/final_board_approved1.pdf
Cuthrell, K.C., Lys, D.B., Fogarty, E.A., & Dobson, E.E. (2016). Using edTPA data to improve
programs. Evaluating Teacher Education Programs Through Performance-Based
Assessments, 67.
Diez, M. E. (2010). It is complicated: Unpacking the flow of teacher education’s impact on
student learning. Journal of Teacher Education, 61(5), 441-450.
edTPA. (2015). Educative assessment and meaningful support: 2014 edTPA administrative
report. Available from: https://secure.aacte.org/apps/rl/resource.php?resid=558&ref=edtpa
Ellis, V., Edwards, A., & Smagorinsky, P. (Eds.). (2010). Cultural-historical perspectives on
teacher education and development: Learning teaching. Routledge.
Engeström, Y. (2001). Expansive learning at work: Toward an activity theoretical
reconceptualization. Journal of Education and Work, 14(1), 133-156.
Goldhaber, D., Cowan, J., & Theobald, R. (2016). Evaluating prospective teachers: Testing the
predictive validity of the edTPA. Center for Analysis of Longitudinal Data in Education
Research Working Paper, 157.
Grossman, P., Wineburg, S.S., & Woolworth, S. (2001). Toward a theory of teacher community.
Teachers College Record, 103(6), 942-1012.
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The
relationship between measures of instructional practice in middle school English language
arts and teachers’ value-added scores. American Journal of Education, 119, 445-470.
Halpin, P., Dolan, C., Grasman, R., & De Boeck, P. (2011). On the relation between the linear
factor model and the latent profile model. Psychometrika, 76(4), 564-583.
Halpin, P.F. & Kieffer, M.J. (2015). Describing profiles of instructional practice: A new
approach to analyzing classroom observation data. Educational Researcher, 44(5), 263-277.
Haskins, R., & Margolis, G. (2015). Show me the evidence: Obama’s fight for rigor and results
in social policy. Washington, D.C.: Brookings Institution Press.
Henry, G. T., Bastian, K. C., & Fortner, C. K. (2011). Stayers and leavers: Early-career teacher
effectiveness and attrition. Educational Researcher, 40(6), 271-280.
Henry, G. T., Campbell, S. L., Thompson, C. L., Patriarca, L. A., Luterbach, K. J., Lys, D. B., &
Covington, V. M. (2013). The predictive validity of measures of teacher candidate programs and
performance: Toward an evidence-based approach to teacher preparation. Journal of Teacher
Education, 64(5), 439-453.
Henry, G.T., & Bastian, K.C. (2015). Measuring up: The National Council on Teacher
Quality’s ratings of teacher preparation programs and measures of teacher performance.
Available from: https://publicpolicy.unc.edu/files/2015/07/Measuring-Up-The-National-
Council-on-Teacher-Qualitys-Ratings-of-Teacher-Preparation-Programs-and-Measures-of-
Teacher-Performance.pdf
Ladd, H. F., & Sorensen, L. C. (2014). Returns to teacher experience: Student achievement and
motivation in middle school. Center for Analysis of Longitudinal Data in Education
Research Working Paper, 112.
Ledwell, K., & Oyler, C. (2016). Unstandardized responses to a “standardized” test: The
edTPA as gatekeeper and curriculum change agent. Journal of Teacher Education, 67(2),
120-134.
Little, J.W., Gearhart, M., Curry, M., & Kafka, J. (2003). Looking at student work for teacher
learning, teacher community, and school reform. Phi Delta Kappan, 85(3), 185-192.
Lubke, G., & Neale, M. (2006). Distinguishing between latent classes and continuous factors:
Resolution by maximum likelihood. Multivariate Behavioral Research, 41, 499-532.
Mitchell, K.J., Robinson, D.Z., Plake, B.S., & Knowles, K.T. (2001). Testing teacher
candidates: The role of licensure tests in improving teacher quality. Washington, DC:
National Academy Press.
National Council on Teacher Quality. (2014). Teacher Prep Review. Available from:
http://www.nctq.org/dmsView/Teacher_Prep_Review_2014_Report
National Research Council. (2002). Scientific research in education. Washington, DC:
National Academies Press.
Pecheone, R.L. & Chung, R.R. (2006). Evidence in teacher education: The Performance
Assessment for California Teachers (PACT). Journal of Teacher Education, 57(1), 22-36.
Peck, C. A., & McDonald, M. (2013). Creating “cultures of evidence” in teacher education:
Context, policy, and practice in three high-data-use programs. The New Educator, 9(1), 12-
28.
Peck, C.A. & McDonald, M.A. (2014). What is a culture of evidence? How do you get one?
And…should you want one? Teachers College Record 116, 1-27.
Peck, C.A., Singer-Gabella, M., Sloan, T., & Lin, S. (2014). Driving blind: Why we need
standardized performance assessment in teacher education. Journal of Curriculum and
Instruction, 8(1), 8-30.
Preston, C. (2016). University-based teacher preparation and middle grades teacher
effectiveness. Journal of Teacher Education (in press).
Ronfeldt, M., Farmer, S. O., McQueen, K., & Grissom, J. A. (2015). Teacher collaboration in
instructional teams and student achievement. American Educational Research Journal, 52(3),
475-514.
SCALE. (2013). edTPA field test: Summary report. Available from:
https://secure.aacte.org/apps/rl/res_get.php?fid=827&ref=
Schein, E. H. (1990). Organizational culture. American Psychologist, 45(2), 109-119.
Sloan, T. (2013). Distributed leadership and organizational change: Implementation of a teaching
performance measure. The New Educator, 9(1), 29-53.
Tagg, J. (2012). Why does faculty resist change? Change: The Magazine of Higher Learning,
44(4), 6-15.
Wright, S.P., White, J.T., Sanders, W.L., & Rivers, J.C. (2010). SAS EVAAS Statistical
Models. Available from:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.437.6615&rep=rep1&type=pdf
Appendix
Table A.1: Factor Loadings for 2012-13 edTPA Scores
edTPA Task     edTPA Rubric                                                  Factor 1   Factor 2   Factor 3
Planning       Planning for Content Understanding                              0.73*      0.22      -0.11
Planning       Planning to Support Varied Student Learning Needs               0.72*     -0.04       0.18
Planning       Using Knowledge of Students to Inform Teaching                  0.45*      0.01       0.35
Planning       Identifying and Supporting Language Demands                     0.77*     -0.06       0.12
Planning       Planning Assessment to Monitor and Support Student Learning     0.69*      0.16      -0.02
Instruction    Learning Environment                                            0.20       0.61*     -0.03
Instruction    Engaging Students in Learning                                  -0.01       0.82*      0.04
Instruction    Deepening Student Learning                                      0.03       0.76*      0.07
Instruction    Subject-Specific Pedagogy                                       0.01       0.61*      0.19
Instruction    Analyzing Teacher Effectiveness                                 0.03       0.18       0.62*
Assessment     Analysis of Student Learning                                    0.03       0.11       0.73*
Assessment     Providing Feedback to Guide Further Learning                    0.04       0.03       0.71*
Assessment     Student Use of Feedback                                        -0.09       0.02       0.89*
Assessment     Analyzing Students’ Language Use                                0.14      -0.02       0.72*
Assessment     Using Assessment to Inform Instruction                          0.13       0.03       0.73*

Note: This table presents factor loadings for the 2012-13 edTPA portfolios from the Partner University. All
factor loadings greater than 0.40 are marked with an asterisk.
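For readers who want to produce a comparable loadings table from their own portfolio data, the sketch below fits a three-factor model with varimax rotation to rubric-level scores. The file and column names are hypothetical, and the rotation and estimator behind Table A.1 are not restated here, so this is an illustrative reproduction rather than the study's exact procedure.

```python
# A minimal sketch of estimating a loadings table like Table A.1, assuming a
# hypothetical file with one row per portfolio and columns rubric_1 ...
# rubric_15. A three-factor model with varimax rotation is assumed; the
# estimation choices behind the published table may differ.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rubrics = pd.read_csv("edtpa_rubrics.csv")           # hypothetical file name
cols = [f"rubric_{i}" for i in range(1, 16)]

fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
fa.fit(rubrics[cols])

# components_ has shape (factors, rubrics); transpose so rows are rubrics,
# matching the layout of Table A.1.
loadings = pd.DataFrame(fa.components_.T, index=cols,
                        columns=["Factor 1", "Factor 2", "Factor 3"])
print(loadings.round(2))
```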
EPIC is an interdisciplinary team that conducts rigorous research and evaluation to inform education policy and practice. We produce evidence to guide data-driven decision-making, using qualitative and quantitative methodologies tailored to the target audience. By serving multiple stakeholders, including policy-makers, administrators in districts and institutions of higher education, and program implementers, we strengthen the growing body of research on what works and in what context. Our work is ultimately driven by a vision of high-quality and equitable education experiences for all students, and particularly students in North Carolina.
http://publicpolicy.unc.edu/epic-home/