A Framework for Improvement: Analyzing Performance Assessment Scores for Evidence-Based Teacher Preparation Program Reforms

Kevin C. Bastian, University of North Carolina at Chapel Hill, Education Policy Initiative at Carolina (EPIC)
Diana Lys, University of North Carolina at Chapel Hill, College of Education
Yi Pan, University of North Carolina at Chapel Hill, Frank Porter Graham Institute

February 2017
Contents

Acknowledgements
Abstract
Introduction
Theoretical Framework
    Cultural Historical Activity Theory
    Teacher Education Communities
Teacher Candidate Performance Assessment Data
Empirical Framework: Latent Class Analysis
    Background
    Demonstrating Latent Class Analysis
    Using Latent Class Analysis for Program Improvement
Empirical Framework: Predictive Validity Analyses
    Background
    Demonstrating Predictive Validity Analyses
    Using Predictive Validity Analyses for Program Improvement
Discussion
Appendix
Acknowledgements
We are grateful to the faculty and staff at our Partner University for providing their edTPA data
and being enthusiastic and receptive research partners. We wish to thank the University of North
Carolina General Administration and the UNC Council of Education Deans for their feedback
and financial support as part of the Teacher Quality Research Initiative.
Abstract
Teacher candidate performance assessments represent a promising source of data for
evidence-based program improvement. However, teacher preparation programs (TPPs)
interested in reform face a crucial question: how to identify actionable evidence in performance
assessment data. To address this concern we propose a two-pronged empirical framework that
TPPs can use to analyze performance assessment data. The first approach, latent class analysis,
groups candidates together based on similarities in their performance assessment scores—
creating profiles of instructional practice. This can help TPPs provide targeted supports to
candidates. The second approach, predictive validity analyses, estimates relationships between
candidates’ performance assessment scores and their performance as teachers-of-record. This
can help TPPs identify programmatic elements significantly related to teacher outcomes. We
illustrate this framework with edTPA data from a Partner University and contend that the impact
of performance assessments can be amplified by these common strategies for analyzing
performance assessment data.
Introduction
In recent years the policy context surrounding teacher education has increasingly
emphasized teacher preparation programs (TPPs) using data to engage in evidence-based
program reform. Most notably, the Council for the Accreditation of Educator Preparation
(CAEP), the national TPP accreditation body, explicitly requires programs to demonstrate the
impact of their graduates in K-12 schools and to use multiple forms of data for continuous
program improvement (CAEP, 2013). To fulfill this requirement, TPPs can employ a range of
outcome measures, including teaching placement rates, teacher value-added estimates, ratings of
teachers’ instructional practices, teacher retention, and surveys of program graduates and their
employers.
Beyond outcome measures focused on the performance and persistence of program
graduates, an emerging source of data for evidence-based reforms are teacher candidate
performance assessments. Candidate performance assessments (e.g. edTPA or PPAT) are
portfolios completed by teaching candidates during their student teaching experience that
typically include video clips of instruction, lesson plans, samples of student work and teacher
feedback, and candidates’ reflective commentaries. Teacher preparation programs can use these
data to assess candidates’ readiness to enter the teaching profession and as a source of evidence
for programmatic reforms. In this context of TPP improvement, candidate performance
assessments may be particularly valuable for a number of reasons. First, performance
assessments provide timely feedback to TPPs—rather than waiting more than a year for the
evaluation ratings or value-added estimates of program graduates, performance assessment data
are accessible to TPPs prior to candidate completion. Second, performance assessments provide
TPPs with feedback on candidate performance across a range of teaching practices so that TPPs
can identify candidates’ specific strengths and weaknesses. Third, elements of performance
assessments connect to specific programmatic components, meaning TPPs can improve
outcomes by pinpointing practices to update or change based on performance assessment scores
(Diez, 2010). Lastly, as outlined by Peck and colleagues, candidate performance assessments
supply program faculty and staff with a common language, common expectations, and a forum
for accepting collective responsibility for candidate performance (Peck, Singer-Gabella, Sloan,
& Lin, 2014).
While candidate performance assessments represent a rich and promising source of data for
improvement efforts, TPPs interested in reform face a crucial question: how to best analyze
performance assessment data to identify actionable evidence. Essentially, how can programs
turn their candidate performance assessment scores into a clear direction for improvement? With
over 700 TPPs in 39 states currently using candidate performance assessments (edTPA, 2016),
there is a pressing need to answer this question and to provide a common framework for
analysis. The impact of common teaching candidate performance assessments can be amplified
by common strategies for analyzing performance assessment data. Collectively, TPPs must
improve the assessment literacy among their teacher education faculty (Diez, 2010) and reform
TPP practice by building cultures that engage with evidence and explore transformative program
change (Engestrom, 2001; Peck & McDonald, 2013). Therefore, in this study, we propose a
two-pronged empirical framework that TPPs can use to analyze their candidate performance
assessment scores, detail the evidence generated by these methods, and describe how this
evidence can connect to program improvement efforts.
Building from Halpin and Kieffer (2015), the first empirical strategy in our framework is
latent class analysis (LCA), an approach which groups observations together based on
similarities in their item/variable scores. Teacher preparation programs can perform LCA to
group teaching candidates together based on their performance assessment scores and then use
this classification structure to (1) predict candidates’ assignment to classes with other sources of
program data (e.g., entry characteristics, coursework performance, exposure to programmatic
components); (2) inform targeted remediation before program completion; and (3) assist school
districts in providing targeted beginning teacher supports. Quite simply, TPPs can use LCA both to make informed remediation/intervention decisions for their current candidates and to better
understand why candidates are in certain latent classes—so that targeted intervention can occur
more quickly for future cohorts of teaching candidates.
Given the current policy context connecting teacher education to the outcomes of program
graduates, the second empirical strategy in our framework is a set of predictive validity analyses
to estimate the relationships between candidates’ performance assessment scores and their
performance (e.g. value-added estimates or evaluation ratings) as teachers-of-record. These
analyses can tell TPPs which performance assessment measures significantly predict graduate
performance. With these findings TPPs and their faculty can connect performance assessment
measures to specific elements of the program and use this evidence to better prioritize their
improvement efforts—by focusing on the programmatic components significantly associated
with graduate performance. More broadly, results from these predictive validity analyses can
help TPPs determine whether candidate performance assessments are a valuable measure on
which to base program improvement decisions.
To illustrate this two-pronged empirical approach, we partnered with a large public
university in North Carolina (hereafter referred to as Partner University) to perform LCA and
predictive validity analyses on the edTPA scores of their 2012-13 graduating cohort. Here, we
stress that the purpose of these analyses is to demonstrate a common framework that TPPs can
use to analyze and act on their own candidate performance assessment data. We do not aim to
draw specific conclusions about LCA groups or which edTPA tasks/rubrics significantly predict
teacher performance. We hope that by providing examples of the methods and results TPPs will
better appreciate how this analysis framework can be a valuable tool in their program
improvement efforts.
In the succeeding sections we first outline the theoretical framework motivating this work
and then briefly detail performance assessments and the performance assessment data used in our
empirical framework analyses. Next, we present our LCA and predictive validity analyses by
providing background on the methods, illustrating the approach with edTPA data from our
Partner University, and detailing how TPPs can use the evidence generated by these analyses for
program improvement. Finally, we close with a discussion of how candidate performance
assessments and this empirical framework can help TPPs overcome the challenges of evidence-
based reform.
Theoretical Framework
Cultural Historical Activity Theory
To drive change in teacher preparation, it is critical to acknowledge the complexity in
which teacher education faculty, their candidates, and their partners work. Improving TPP
outcomes through evidence-based reforms is not merely a matter of having teacher performance
data; it is about how those data intersect with other elements of the teacher preparation enterprise
to improve program implementation and effectiveness. To consider this process we adopt a
widely used framework in teacher education—Engestrom’s Cultural Historical Activity Theory
(CHAT) (Ellis, Edwards, & Smagorinsky, 2010; Engestrom, 2001; Peck & McDonald, 2014;
Sloan, 2013). Specifically, we use a version of Engestrom’s model adapted by Peck and
McDonald that better aligns with evidence-based reform in teacher education (Peck &
McDonald, 2014). As shown in Figure 1, the adapted CHAT model is an interconnected
framework in which elements interact and impact the overall outcome, goal, or activity. For
example, the Rules of the system impact how members of the Community collaborate; the
Instruments or tools used by the system impact how members support one another in the
Division of Labor. Essentially, CHAT provides a framework for “understanding how change
happens in an effort to promote purposeful change” (Sloan, 2013).
Figure 1: An Adapted CHAT Framework for Understanding TPP Change
In the CHAT framework the unit of analysis is the activity system, not the individual
actions or components of the system. Understanding the elements of the activity system and how
they interact is critical to understanding the framework’s complexity and its ability to capture
and describe change. For instance, using the CHAT framework to analyze distributed leadership
in TPPs, Sloan demonstrated how targeted leadership actions within the TPP influenced faculty
interactions and curriculum over time (Sloan, 2013). Recognizing that the pace of change in
higher education is often slow and incremental (Schein, 1990; Tagg, 2012), CHAT provides a
powerful lens for viewing the elements of change in action, particularly those hidden actions in
support of community efforts.
For the current study, we highlight three elements in the CHAT framework—Objects,
Instrument, and Community—and consider how the interplay among these elements may create
disturbances in the TPP activity system that lead to substantive improvements in the Outcome.
In the present study the Outcome is the drive to enact evidence-based reform in teacher
preparation that leads to higher quality program graduates. As shown in Figure 2, the Objects
being used to lead to this outcome are two data analysis models—LCA and predictive validity
studies. The Instrument is the candidate performance assessment and the Community is faculty
and administrative leadership in the TPP. The activity system is the TPP itself. Importantly,
Figures 1 and 2 show that performance assessments are only one element of the activity system
and that data from these assessments must interact with multiple elements—elements often at
odds with or critical of one another—to drive learning and change in TPPs.
Figure 2: CHAT Elements Driving Learning in Teacher Preparation Programs
As described by Engestrom (2001), change occurs through disturbances in the activity
system that push and drive the system forward. Alternatively, activity systems can also be
stagnant and resistant to change. At the Partner University there were several key disturbances
in the system that drove change forward. The first disturbance was the availability of common
performance assessment data for all licensure programs. The second disturbance was the ability
to draw together the teacher education community for regular programmatic meetings focused on
data. The third disturbance was the “will for change” within the TPP that emanated from
program leadership. Emerging from these disturbances, we assert that learning within the system
comes from collaborative analyses to derive meaningful evidence from candidate performance
assessment data. Quite simply, TPPs need valid and reliable teacher candidate data presented in
interpretable and informative ways for teacher educator communities to digest, to question, and
to use as a guide for program improvement. In the present study our unique contribution is
highlighting the analysis methods (latent class analyses and predictive validity analyses) that will
help facilitate this change process.
Teacher Education Communities
When focusing on the relationship between TPPs and their candidates, Diez posed a
deceptively simple question: “Have they learned what we taught them?” (Diez, 2010). In PK-12
settings, such a question might be tackled by a group of teachers in a professional learning
community (PLC), with research confirming that such communities drive teacher learning and
development (Grossman, Wineburg, & Woolworth, 2001; Ronfeldt, Farmer, McQueen, & Grissom,
2015). In teacher education, there are many parallels. Teacher education faculty teach students
(pre-service teachers) and are responsible for ensuring that candidates possess the knowledge and
skills to succeed as beginning teachers. Teacher education leadership guides and sustains the drive
for change while supporting faculty development. Together, these efforts create opportunities for
faculty to unite, learn, and innovate their practices. The complexity of this work also presents
challenges for teacher education communities.
One challenge faced by teacher education communities is how to use teacher performance
data to improve candidate learning. Research linking aspects of teacher preparation to the
performance of program graduates is growing rapidly (Boyd, Grossman, Lankford, Loeb, &
Wyckoff, 2009; Henry & Bastian, 2015; Henry, Campbell, Thompson, Patriarca, Luterbach, Lys,
& Covington, 2013; Preston, 2016; Ronfeldt, 2015; Ronfeldt & Reininger, 2012); however, there
has been little focus on how TPPs respond to and use such research evidence. While some
teacher educators document their data-use to drive evidence-based improvements (Cuthrell, Lys,
Fogarty, & Dobson, 2016), many TPPs suffer from either a lack of “actionable data” (Ledwell &
Oyler, 2016) or a lack of assessment literacy among teacher education faculty (Diez, 2010). To
help build this literacy, teacher education communities need common ground upon which to
build a community of practice and learning (Grossman, Wineburg, & Woolworth, 2001). Little
and colleagues (2003) observed four conditions that facilitated the work of a teacher community
engaged in analyzing student work: (1) tools tailored to the local context; (2) the ability to talk
across and within content areas; (3) a scaffolded and supported inquiry approach; and (4) norms
and leadership to drive discussions forward. Candidate performance assessments help meet
these needs by providing common standards and language and opportunities for faculty to
engage with data in discussions that could lead to empowered teacher educator communities.
A second challenge is how to structure opportunities for faculty to engage with
performance assessment data to improve programs and teacher candidate learning. In TPPs,
teacher educator communities often default to department or program-level meetings without
structured opportunities to build assessment literacy and engage with assessment data. We
assert that evidence-based change will not occur without these key ingredients and a will to
change. Multiple TPPs have demonstrated the value of “putting the data on the table” for faculty
examination in order to build a culture of inquiry within the learning community (Cuthrell, Lys,
Fogarty, & Dobson, 2016; Peck & McDonald, 2013; Sloan, 2013). In these TPPs, leadership
played a key role in developing faculty engagement opportunities by institutionalizing
performance data exploration and allowing faculty to lead reforms to curricula and clinical
practice. By restructuring TPP practices, faculty are able to engage with data and prioritize
which activities are most meaningful for their professional practice, program development, and
ultimately, teacher candidate learning.
Teacher Candidate Performance Assessment Data
Over the past decade many teacher educators across the United States have supported the
creation and widespread adoption of teacher candidate performance assessments. These
performance assessments stem from the National Research Council’s call to develop broader and
more authentic assessments of teacher candidates and their performance in the classroom and are
modeled after the National Board for Professional Teaching Standards and its performance-based
framework for assessing and credentialing veteran teachers (Darling-Hammond, 2010; Mitchell,
Robinson, Plake, & Knowles, 2001). Through the completion of a student teaching portfolio that
includes lesson plans, video clips of instruction, samples of student work, and commentaries on
teaching decisions, candidate performance assessments are designed to capture a broad range of
knowledge and skills and assess whether candidates are ready to enter the teaching profession
(Pecheone & Chung, 2006). Currently, edTPA, developed by the Stanford Center for
Assessment, Learning, and Equity (SCALE), is the most widely-adopted teacher candidate
performance assessment, with over 700 TPPs in 39 states in varied stages of implementation
(edTPA, 2016).
Our Partner University began piloting the TPA (the predecessor to the current edTPA) in
selected middle and secondary grades programs in the 2010-11 academic year and expanded the
performance assessment into their elementary, music and special education programs in the
2011-12 academic year. The edTPA data used to illustrate our two-pronged empirical
framework come from the Partner University’s 2012-13 graduating cohort. In total, we have 369
edTPA scores from 13 different edTPA handbook areas (e.g. elementary literacy, secondary
history and social studies). Rather than submitting edTPA portfolios to be officially scored,
faculty and staff at the Partner University locally-evaluated the performance assessment
portfolios after participating in training facilitated by officially-calibrated faculty and following
the local evaluation protocols provided by SCALE. Our Partner University blinded scoring
assignments within content areas and did not assign university supervisors or faculty to score the
portfolios of the candidates they supervised during student teaching. While our illustrative
example uses locally-evaluated performance assessments, we stress that our proposed empirical
framework is equally valid and useful when analyzing officially-scored performance
assessments. Programs can analyze either source of performance assessment data to generate
evidence for reforms; programs with both locally and officially-scored portfolios can analyze
both sets of data to compare results and conclusions.
Table 1 displays summary statistics for the 2012-13 edTPA scores from our Partner
University. Overall, edTPA is comprised of three main tasks—Planning, Instruction, and
Assessment—with five scored rubrics within each task. Evaluators score each rubric from 1 to
5, with a 1 indicating a struggling candidate who is not ready to teach, 2 indicating a candidate
who needs more practice, 3 indicating an acceptable level of performance to begin teaching, 4
indicating a candidate with a solid foundation of knowledge and skills, and 5 indicating a highly
accomplished teacher candidate. On average, teaching candidates at the Partner University
scored between 3.00 and 3.50 across rubrics—the average rubric score was 3.34 and the
average total score was 50.06—with higher scores in the planning and instruction tasks.
Table 1: edTPA Summary Statistics

| edTPA Task | edTPA Rubric | Mean (SD) | % at Level 1 | % at Level 2 | % at Level 3 | % at Level 4 | % at Level 5 |
|---|---|---|---|---|---|---|---|
| Planning | Planning for Content Understanding | 3.54 (0.83) | 1.63 | 5.96 | 39.84 | 41.46 | 11.11 |
| Planning | Planning to Support Varied Student Learning Needs | 3.43 (0.85) | 2.98 | 6.23 | 44.17 | 38.21 | 8.40 |
| Planning | Using Knowledge of Students to Inform Teaching | 3.33 (0.84) | 1.63 | 9.76 | 52.30 | 26.83 | 9.49 |
| Planning | Identifying and Supporting Language Demands | 3.31 (0.82) | 2.71 | 8.13 | 51.22 | 31.44 | 6.50 |
| Planning | Planning Assessment to Monitor and Support Student Learning | 3.44 (0.84) | 2.44 | 5.69 | 46.61 | 35.77 | 9.49 |
| Instruction | Learning Environment | 3.49 (0.76) | 1.08 | 1.63 | 56.10 | 29.54 | 11.65 |
| Instruction | Engaging Students in Learning | 3.34 (0.79) | 2.98 | 4.34 | 54.20 | 32.25 | 6.23 |
| Instruction | Deepening Student Learning | 3.34 (0.78) | 1.63 | 6.50 | 55.83 | 28.18 | 7.86 |
| Instruction | Subject-Specific Pedagogy | 3.47 (0.82) | 3.52 | 3.52 | 42.55 | 43.36 | 7.05 |
| Instruction | Analyzing Teacher Effectiveness | 3.25 (0.84) | 2.17 | 11.92 | 52.03 | 26.56 | 7.32 |
| Assessment | Analysis of Student Learning | 3.33 (0.92) | 5.15 | 6.78 | 46.88 | 32.25 | 8.94 |
| Assessment | Providing Feedback to Guide Further Learning | 3.38 (0.85) | 4.07 | 7.59 | 38.75 | 45.26 | 4.34 |
| Assessment | Student Use of Feedback | 3.06 (0.91) | 5.42 | 16.80 | 49.32 | 23.04 | 5.42 |
| Assessment | Analyzing Students' Language Use | 3.09 (0.86) | 4.61 | 13.01 | 56.64 | 20.05 | 5.69 |
| Assessment | Using Assessment to Inform Instruction | 3.25 (0.95) | 6.50 | 7.59 | 49.32 | 27.64 | 8.94 |
| | Average Rubric Score | 3.34 (0.66) | | | | | |
| | Average Total Score | 50.06 (9.84) | | | | | |

Note: This table displays summary statistics—means, standard deviations, scoring distributions—for the Partner University's 2012-13 edTPA scores.
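Programs replicating the computations behind Table 1 from their own raw scores need only a few lines of analysis code. Below is a minimal Python sketch; the file name and rubric column names (edtpa_scores.csv, rubric_01 through rubric_15) are hypothetical placeholders for a program's own data layout.

```python
import pandas as pd

# Hypothetical input: one row per candidate, one column per edTPA rubric (1-5 scores).
df = pd.read_csv("edtpa_scores.csv")
rubrics = [c for c in df.columns if c.startswith("rubric_")]  # rubric_01 ... rubric_15

# Mean and standard deviation for each rubric (the first data column of Table 1).
summary = df[rubrics].agg(["mean", "std"]).T.round(2)

# Percentage of candidates scoring at each level, 1 through 5.
for level in range(1, 6):
    summary[f"pct_level_{level}"] = (df[rubrics].eq(level).mean() * 100).round(2)

print(summary)
print("Average rubric score:", round(df[rubrics].stack().mean(), 2))
print("Average total score:", round(df[rubrics].sum(axis=1).mean(), 2))
```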
Empirical Framework: Latent Class Analysis
Background
Unlike factor analysis, which identifies latent constructs comprised of items/variables,
LCA is an observation-centered approach that groups observations together based on similarities
in their item/variable scores. Specifically, LCA assigns observations to a category with other
like-scoring observations and provides an estimate of measurement error—how well did the
observation fit into the assigned category versus other categories. Considering TPPs, LCA
provides an empirical and objective basis for grouping teaching candidates together according to
their performance assessment scores. These groupings can be considered profiles of
instructional practice, with each latent class identifying a set of teaching candidates with similar
instructional strengths and shortcomings (Grossman, Loeb, Cohen, & Wyckoff, 2013). For
TPPs, the benefit of LCA is straightforward: it is a way to summarize performance assessment
data to make inferences about individual teaching candidates and diagnose strengths and
deficiencies—in candidates’ knowledge and/or skills—for further feedback and support.
Essentially, LCA offers a rigorous, quantitative approach that can help TPPs design and
implement targeted interventions to their teaching candidates (Halpin & Kieffer, 2015).
Overall, LCA entails three key advantages for TPPs. First, TPPs need very little to perform
LCA—candidates’ performance assessment scores and access to software platforms that support
the analyses (e.g. Stata, Mplus). Compared to predictive validity analyses, which require TPPs
to acquire/access data on graduates’ outcomes as teachers-of-record, this is an important
advantage for LCA. Second, LCA has an advantage of time and timing. Arguably, teacher
education faculty may conduct a similar (yet subjective) sorting exercise following a thorough
review of portfolios and scores; however, the time and labor required to conduct such a review are
considerable. LCA may be conducted with fewer time and personnel resources. Additionally,
rather than waiting for teacher outcome data (e.g. graduates’ value-added estimates), TPPs can
perform LCA as soon as performance assessment scores are available. This timing is crucial for
providing candidates with rapid feedback and support prior to program completion. Lastly, TPPs
can perform LCA on the performance assessment scores of all candidates or for smaller groups
within the preparation program—for certain licensure areas (e.g. elementary, secondary grades
mathematics) or for certain pathways (e.g. traditional undergraduate, graduate degree). This
flexibility may allow TPPs to identify distinct profiles of instructional practice within the broader
program and to better tailor interventions for candidate improvement.
Below, we describe results from LCA on our Partner University’s 2012-13 edTPA scores.
While the number and characteristics of latent classes may vary for other TPPs, our analysis
steps and the types of output they produce will be similar across TPPs undertaking LCA. Thus,
we provide an illustrative example of how TPPs can analyze their performance assessment data
to consider targeted intervention with candidates.
Demonstrating Latent Class Analysis
Our first empirical strategy extends the central thesis of Halpin and Kieffer (2015)—that
LCA of teachers’ observation scores can support targeted professional development
interventions—to teaching candidates and their performance assessment scores. Here, an initial
decision for TPPs to make is whether to perform LCA on all of the performance assessment
rubrics, in the same model, or to perform LCA on the main performance assessment domains
(e.g. planning, instruction, and assessment), separately. For this illustrative case, we estimate a
single LCA including all 15 edTPA rubrics; TPPs focused on certain domains of a candidate
performance assessment may choose to perform domain-specific LCA.
The first step in LCA is identifying how many latent classes exist in the data. This process
begins by estimating models with different numbers of latent classes. Next, the results of these
models are compared and the final number of latent classes is determined by goodness of fit
statistics (Lubke & Neale, 2006). Essentially, this process identifies which scenario—which
number of latent classes—best fits the data. For our Partner University’s edTPA scores, multiple
goodness of fit criteria—AIC, BIC, -2 log likelihood—identify four latent classes (four profiles
of candidate instructional practice). This result is consistent with work showing that an
instrument with three factors—see Appendix Table 1 for factor analysis results of Partner
University’s edTPA data—typically equates to four latent classes (Halpin, Dolan, Grasman, &
De Boeck, 2011).
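As an illustration of this class-enumeration step, the sketch below fits one- through six-class solutions and compares fit criteria. LCA with ordinal indicators is typically estimated in packages such as Mplus or Stata; as a stand-in, this sketch uses scikit-learn's Gaussian mixture model (strictly a latent profile analysis that treats the 15 rubric scores as continuous), and the data-loading details are assumptions.

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

# Hypothetical data: one row per candidate, columns rubric_01 ... rubric_15.
X = pd.read_csv("edtpa_scores.csv").filter(like="rubric_").to_numpy()

# Fit solutions with 1-6 latent classes; lower AIC/BIC indicates better fit.
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=10, random_state=0).fit(X)
    print(f"{k} classes: AIC = {gm.aic(X):.1f}, BIC = {gm.bic(X):.1f}, "
          f"log-likelihood = {gm.score(X) * len(X):.1f}")
```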
After determining the number of latent classes, the next analysis step is assessing how well
observations fit in their assigned class versus another class. For TPPs, this is key to the
interpretation and use of LCA results: if teaching candidates do not fit well into their assigned
class then targeted interventions, to address deficiencies in candidates’ practice, may not be
appropriate for those candidates. Latent class analysis estimates the probability of observations
being in each class and from this, the average profile membership score—observations’
probability of being in their assigned latent class—can be calculated. The average profile
membership score for our Partner University’s teaching candidates was 0.963, indicating that
nearly all teaching candidates had a high probability of being assigned to only one latent class.
Overall, 52 percent of the teaching candidates had a profile membership score of one; 88 percent
of the teaching candidates had a profile membership score of 0.90 or higher.
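The membership-score calculation in this step is a direct read-off of the model's posterior probabilities. Continuing the hypothetical sketch above, each candidate's profile membership score is the posterior probability of the class to which that candidate was assigned:

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

# Same hypothetical data as in the previous sketch.
X = pd.read_csv("edtpa_scores.csv").filter(like="rubric_").to_numpy()

# Refit the selected four-class solution and inspect posterior probabilities.
gm4 = GaussianMixture(n_components=4, covariance_type="diag",
                      n_init=10, random_state=0).fit(X)
posteriors = gm4.predict_proba(X)     # one row per candidate, one column per class
assigned = posteriors.argmax(axis=1)  # each candidate's assigned latent class
membership = posteriors.max(axis=1)   # probability of belonging to the assigned class

print("Average profile membership score:", round(membership.mean(), 3))
print("Share with membership of 0.90 or higher:",
      round((membership >= 0.90).mean(), 2))
```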
Figure 3: Average edTPA Rubric Scores for each Profile of Instructional Practice
Note: This figure displays the average edTPA rubric scores for Partner University candidates in Profiles A, B, C,
and D.
To push towards actionable evidence from LCA, a final step is characterizing and assessing
differences in the identified classes. Figure 3 depicts the average edTPA rubric scores for the
four profiles of instructional practice at Partner University. Here, the four latent classes can be
characterized as follows: (1) Profile A, a high scoring group comprised of 54 candidates whose
average rubric score is 4.38 and whose average total score is 65.63; (2) Profile B, a middle-high
scoring group comprised of 143 candidates whose average rubric score is 3.56 and whose
average total score is 53.45; (3) Profile C, a middle-low scoring group comprised of 128
candidates whose average rubric score is 3.05 and whose average total score is 45.70; and (4)
Profile D, a low scoring group comprised of 44 candidates whose average rubric score is 2.18
and whose average total score is 32.64. To provide Partner University with further evidence on
these profiles, we tested whether there were statistically significant differences in the edTPA
rubric scores for each adjacent pair of groups (a sketch of such pairwise tests follows this paragraph). For all 15 edTPA rubrics, there were significant scoring differences between Profile A vs. Profile B, Profile B vs. Profile C, and Profile C vs. Profile D. Within profiles, edTPA scores are relatively stable. Rather than scoring high on some
elements of edTPA and low on others, teaching candidates in each profile perform fairly
consistently across rubrics (except for Profile D’s drop in scores for the edTPA Assessment
task). While this suggests that TPPs may be able to classify candidates without formal analysis
methods, we contend that (1) LCA is a more empirical and objective classification approach and
(2) profiles of instructional practice from other TPPs may not parallel those of our Partner University. Below, we describe how TPPs can leverage LCA results to
improve candidate and program outcomes.
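The pairwise profile comparisons referenced above can be scripted directly. In the sketch below, Welch's t-test is an illustrative choice (the text does not specify the exact test used), and the profile labels are assumed to come from an LCA step like the one sketched earlier.

```python
import pandas as pd
from scipy import stats

# Hypothetical input: rubric scores plus each candidate's assigned profile label.
df = pd.read_csv("edtpa_scores_with_profiles.csv")  # includes a "profile" column, A-D
rubrics = [c for c in df.columns if c.startswith("rubric_")]

# Test each rubric between adjacent profile pairs (A vs. B, B vs. C, C vs. D).
for lo, hi in [("A", "B"), ("B", "C"), ("C", "D")]:
    for r in rubrics:
        t, p = stats.ttest_ind(df.loc[df["profile"] == lo, r],
                               df.loc[df["profile"] == hi, r],
                               equal_var=False)  # Welch's t-test
        flag = "significant" if p < 0.05 else "n.s."
        print(f"{r}: Profile {lo} vs. {hi}: t = {t:.2f}, p = {p:.3f} ({flag})")
```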
Using Latent Class Analysis for Program Improvement
An advantage of LCA is the opportunity to use it at multiple intervention points for
candidate and program improvement. At present, we have identified three opportunities for
TPPs to use LCA results; new usages may emerge as more TPPs adopt the analysis approach.
The first opportunity involves the development of profiles of instructional practice by joining
LCA results with other TPP data (e.g. selection criteria, course-taking, dispositional and
observation ratings). This can help TPPs intervene with successive cohorts of teaching
candidates at earlier stages of the program. The next opportunity is for teacher candidate
remediation and reinforcement prior to program completion. The final opportunity follows
program graduates into the field, using LCA results to support successful teacher induction.
The primary use of LCA is the opportunity to develop profiles of instructional practice
from which TPPs can design and implement targeted supports for teacher candidates (Halpin &
Kieffer, 2015). Specifically, LCA results identify groups of candidates who excel and struggle
with particular teaching tasks; by combining these LCA results with other demographic and
program data, TPPs can create more robust profiles and predict the types/characteristics of
candidates in each profile (Ledwell & Oyler, 2016). With these data “on the table”, faculty
leadership can target teaching candidates that are comparable to those in established profiles for
intervention at earlier stages of the program. For example, future cohorts of teaching candidates
who fit the Profile D pattern (see Figure 3) could be targeted for early intervention, particularly for program elements related to the edTPA Assessment domain. Conversely, future cohorts of teacher candidates who fit the Profile A pattern might be engaged in additional teacher leadership
development opportunities. Essentially, such profiles of instructional practice offer faculty the
opportunity to target their efforts, to focus on the specific needs of candidates, and to ground
their actions in evidence (Sloan, 2013).
The next opportunity for LCA use is remediation and reinforcement of current teacher
candidates prior to entering the classroom. As noted previously, a potential advantage of LCA is
timing. Although the timing is tight, TPPs can conduct LCA on candidate performance
assessment scores prior to the end of the semester and provide focused remediation and support
before graduation or as part of a "summer bridge" activity prior to entering the classroom.[1] For example, Partner University's candidates in Profile D, who struggle most with the Assessment task, would benefit from additional opportunities to analyze assessment data and use those data to inform instruction. Using LCA in this way allows TPPs to model the provision of data-driven feedback to candidates and gives candidates the opportunity to use feedback to guide further learning.

[1] For these remediation and reinforcement purposes, a potential benefit of local scoring, in comparison to official scoring, is the written, formative feedback on candidate performance that can aid remediation efforts (Ledwell & Oyler, 2016).
A final opportunity is the use of LCA for collaboratively developed induction plans that
follow candidates into their first teaching position to support their professional growth.
Nationally, TPPs are encouraged and expected to work with partner school districts to support
beginning teachers in the initial stages of their careers (CAEP, 2013). The profiles of
instructional practice yielded by LCA can help TPP faculty and hiring districts/schools to
collaboratively customize induction and professional growth plans based on individual teachers’
needs—rather than a one-size-fits-all program. In this way, LCA may present new opportunities
for TPPs to strengthen partnerships with school districts, with a common focus on beginning
teacher support, retention, and success. Such activities may prove valuable for TPP-district
partnerships, teacher development, and TPP accreditation.
Empirical Framework: Predictive Validity Analyses
Background
While LCA can be a valuable tool that informs TPPs’ intervention with candidates and
facilitates research to better understand candidates’ performance assessment scores, it is silent
regarding the relationships between candidate performance assessments and outcomes for
teachers-of-record. Therefore, the second approach in our empirical framework is a set of
predictive validity analyses to estimate the associations between performance assessment
measures and indicators of beginning teacher performance. This approach is in line with the
current outcomes-driven policy environment and can help TPPs prioritize improvement efforts
focused on performance assessment measures (teaching competencies) that are significantly
associated with teacher effectiveness. More broadly, results from these predictive validity
analyses can help TPPs determine whether performance assessments are a valuable measure on
which to base program improvement decisions.
To perform predictive validity analyses, TPPs need access to individual-level indicators of
beginning/early-career teacher performance. Examples of such indicators include teacher value-
added estimates, ratings of teachers’ instructional practices, or student surveys. When possible,
it is beneficial for TPPs to assess the predictive validity of candidate performance assessment
scores with multiple teacher outcomes, as this provides a more comprehensive accounting of
teacher performance, allows for a larger analysis sample,[2] and lets TPPs examine relationships with specific teaching competencies. Unlike LCA, which can be performed by any TPP, we acknowledge that predictive validity analyses entail more challenges for TPPs. Primarily, TPPs need access to indicators of teacher performance, and in certain states these may not exist or may not be available to TPPs. Even when accessible, the small size of many TPPs may make it difficult to estimate relationships between performance assessment scores and teacher effectiveness. Addressing these challenges may require TPPs to form partnerships—with state (local) education agencies or outside researchers to access data or perform predictive validity analyses and with other TPPs to pool data and increase sample size. These teacher performance data may soon be more accessible to TPPs as evidence-based program accountability and improvement become a higher policy and program priority.

[2] In North Carolina, for example, approximately 35-40 percent of teachers have a value-added estimate while nearly 90 percent of teachers have evaluation ratings from their school principal.
Below, we describe the measures and methods we use to demonstrate our predictive
validity analyses. While specific analysis details will vary for other TPPs, depending upon the
data available to them, this provides a framework for an approach that TPPs can take to
determine what performance assessment measures predict graduate performance.
Demonstrating Predictive Validity Analyses
To examine the predictive validity of our Partner University’s 2012-13 edTPA scores, we
use two teacher outcomes: value-added estimates and evaluation ratings. For value-added, we
use teachers’ Education Value-Added Assessment System (EVAAS) scores, the official measure
of value-added used for teacher evaluation in North Carolina public schools (NCPS) (Wright,
White, Sanders, & Rivers, 2010). We focus on teacher EVAAS estimates in elementary and
middle grades mathematics, reading, and science; middle grades social studies; and across a
range of high school math, science, English, history, and civics courses. To ease interpretability
of the EVAAS estimates, which are expressed in either normal curve equivalency units or scale
score points, we standardized the EVAAS scores, within test (e.g. 4th grade mathematics, 7th
grade reading, U.S. history), across all NCPS teachers with EVAAS estimates. This allows us to
interpret coefficients as the relationship between a particular edTPA measure and a percentage of
a standard deviation in teacher effectiveness. Since the outcome variable in these analyses is a
normally distributed measure of teacher value-added, we estimate ordinary least squares (OLS)
regression models and control for a limited set of covariates to better isolate the associations
between edTPA measures and teacher value-added.[3] Overall, the sample for these analyses includes 209 value-added estimates for 152 first-year teachers in the 2013-14 school year.

[3] In both our value-added and evaluation rating analyses we control for the school-level percentage of students qualifying for subsidized meals and the school-level percentage of racial/ethnic minority students. In our value-added models we also include a set of test-area indicators to net out value-added differences across tests.
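A minimal sketch of this value-added model follows, using Python's statsmodels. All file and column names (graduate_outcomes.csv, evaas, test, the covariates, and the standardized edTPA total score) are hypothetical placeholders, and in practice the EVAAS scores would be standardized against the full statewide distribution rather than the analysis sample alone.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical merged file: one row per teacher-by-test EVAAS estimate, joined
# to the teacher's edTPA scores and school-level covariates.
df = pd.read_csv("graduate_outcomes.csv")

# Standardize EVAAS within test (e.g., 4th grade mathematics, 7th grade reading)
# so coefficients read as fractions of a standard deviation of effectiveness.
df["evaas_std"] = df.groupby("test")["evaas"].transform(
    lambda s: (s - s.mean()) / s.std())

# OLS of standardized value-added on an edTPA measure, controlling for the
# school-level covariates and test-area indicators described in the text.
model = smf.ols("evaas_std ~ edtpa_total_std + pct_subsidized_meals"
                " + pct_minority + C(test)", data=df).fit()
print(model.summary())
```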
For evaluation ratings, we use teachers’ ratings on the North Carolina Educator Evaluation
System (NCEES), a statewide evaluation rubric in which school administrators rate teachers
across five standards: teachers demonstrate leadership (Standard 1); teachers establish a
respectful environment for a diverse group of students (Standard 2); teachers know the content
they teach (Standard 3); teachers facilitate learning for their students (Standard 4); and teachers
reflect on their practice (Standard 5). To evaluate teachers, school administrators use formal
classroom observations and paper-based evidence to document key indicators of practice and rate teachers as either not demonstrated (level 1), developing (level 2), proficient (level 3), accomplished (level 4), or distinguished (level 5) on each of the five NCEES standards. Since the outcome variable in these analyses is an ordinal (1-5) evaluation rating, we estimate ordered logistic regression models and control for a limited set of covariates to better isolate the relationships between edTPA measures and teacher evaluation ratings. Overall, the sample for these analyses includes 235 first-year teachers who were evaluated by a school administrator in the 2013-14 school year.
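For the evaluation-rating models, an ordered logistic regression can be sketched as follows; statsmodels' OrderedModel is one accessible option, and again the file and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical file: one row per first-year teacher, with a 1-5 NCEES rating
# per standard, edTPA measures, and school-level covariates.
df = pd.read_csv("graduate_evaluations.csv")

# Ordered logit of one NCEES standard (here, Standard 4: facilitating student
# learning) on an edTPA measure plus the covariates described in the text.
mod = OrderedModel(df["ncees_standard4"],
                   df[["edtpa_total_std", "pct_subsidized_meals", "pct_minority"]],
                   distr="logit")
res = mod.fit(method="bfgs", disp=False)

# Exponentiating a coefficient yields the odds ratios reported in Table 3.
print("Odds ratio for edTPA total score:",
      round(float(np.exp(res.params["edtpa_total_std"])), 3))
```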
To increase the utility of these predictive validity analyses, we estimate three types of
models. Our first approach estimates the relationships between teacher outcomes and constructs
derived from factor analysis of the edTPA scores (please see Appendix Table 1 for the factor
loadings).[4] These results let TPPs know whether constructs of broad teaching tasks impact
teacher performance. Our second approach estimates relationships between teacher outcomes
and each edTPA rubric (entered individually into models). These results may be particularly
valuable, as they help TPPs connect specific edTPA rubrics and the knowledge and skills
underlying those rubrics to components of the program. Finally, our third approach estimates
relationships between teacher outcomes and a standardized total score across all 15 edTPA
rubrics. These results provide a holistic assessment of whether higher edTPA scores predict
teacher performance.[5]
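The second, rubric-by-rubric approach amounts to looping the same model over each rubric entered individually. A short sketch under the same hypothetical data layout as above:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("graduate_outcomes.csv")  # hypothetical, as in the OLS sketch
df["evaas_std"] = df.groupby("test")["evaas"].transform(
    lambda s: (s - s.mean()) / s.std())

# Enter each edTPA rubric individually, holding the covariate set fixed,
# and collect its coefficient and p-value.
results = []
for rubric in [c for c in df.columns if c.startswith("rubric_")]:
    fit = smf.ols(f"evaas_std ~ {rubric} + pct_subsidized_meals"
                  " + pct_minority + C(test)", data=df).fit()
    results.append({"rubric": rubric,
                    "coef": fit.params[rubric],
                    "p_value": fit.pvalues[rubric]})

print(pd.DataFrame(results).round(3))
```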
Tables 2 and 3 display predictive validity results from our value-added and evaluation
rating models. As previously stated, we do not intend to make generalizable conclusions about
the predictive validity of edTPA scores based on these results; rather, we hope to discuss the
findings in such a way that TPPs can gain insight into interpreting results for their own program
improvement purposes. Overall, Table 2 shows that the Instruction and Assessment factors are
significantly associated with teacher value-added estimates—a one standard deviation increase in
these constructs led to a 21 and 18 percent of a standard deviation increase in teacher value-
added. To put these results into perspective, the average difference between first-year and
second-year teachers in NCPS is 23 percent of a standard deviation. At the edTPA rubric level,
four instruction rubrics (rubrics 6-9) and four assessment rubrics (rubrics 12-15) significantly
predict higher value-added estimates. For example, a one-point increase in a candidate’s score
on the “Engaging Students in Learning” rubric is associated with a 27 percent of a standard
deviation increase in teacher value-added. Lastly, the standardized total score significantly
predicts higher value-added—a one standard deviation increase in the total score, equivalent to
approximately 10 edTPA points in our sample, is associated with an 18 percent of a standard
deviation increase in teacher value-added.
[4] These factor analysis results closely resemble the three main edTPA tasks of Planning, Instruction, and Assessment and fully reproduce the factor analysis results from the 2013 edTPA field test report (SCALE, 2013).

[5] Teacher preparation programs can also estimate predictive validity models to determine whether outcomes differ for graduates in different latent classes (Halpin & Kieffer, 2015). For example, what are the performance differences for graduates in Profile A versus Profile B?
Table 2: Predictive Validity with Teacher Value-Added Estimates

| edTPA Measure | Standardized EVAAS Estimate |
|---|---|
| Planning factor | 0.063 (0.067) |
| Instruction factor | 0.213** (0.060) |
| Assessment factor | 0.178* (0.068) |
| Planning for Content Understanding | 0.121 (0.083) |
| Planning to Support Varied Student Learning Needs | 0.050 (0.095) |
| Using Knowledge of Students to Inform Teaching | 0.029 (0.098) |
| Identifying and Supporting Language Demands | 0.092 (0.076) |
| Planning Assessment to Monitor and Support Student Learning | 0.102 (0.082) |
| Learning Environment | 0.156+ (0.090) |
| Engaging Students in Learning | 0.271** (0.081) |
| Deepening Student Learning | 0.210** (0.079) |
| Subject-Specific Pedagogy | 0.229* (0.088) |
| Analyzing Teacher Effectiveness | 0.134 (0.086) |
| Analysis of Student Learning | 0.124 (0.079) |
| Providing Feedback to Guide Further Learning | 0.194** (0.074) |
| Student Use of Feedback | 0.195* (0.087) |
| Analyzing Students' Language Use | 0.215* (0.082) |
| Using Assessment to Inform Instruction | 0.154+ (0.080) |
| Standardized Total Score | 0.184** (0.068) |
| Cases | 209 |

Note: This table displays coefficients for the relationship between edTPA data and teacher value-added estimates. Standard errors are in parentheses. +, *, and ** indicate statistical significance at the 0.10, 0.05, and 0.01 levels.
Table 3: Predictive Validity with Teacher Evaluation Ratings

| edTPA Measure | Leadership | Classroom Environment | Content Knowledge | Facilitating Student Learning | Reflecting on Practice |
|---|---|---|---|---|---|
| Planning factor | 1.031 (0.876) | 0.968 (0.870) | 0.994 (0.975) | 0.856 (0.341) | 1.072 (0.681) |
| Instruction factor | 1.161 (0.372) | 1.213 (0.241) | 1.093 (0.618) | 0.987 (0.928) | 1.195 (0.244) |
| Assessment factor | 1.070 (0.742) | 1.062 (0.745) | 1.021 (0.901) | 0.983 (0.912) | 1.034 (0.838) |
| Planning for Content Understanding | 0.968 (0.887) | 0.906 (0.610) | 0.938 (0.767) | 0.870 (0.485) | 1.113 (0.589) |
| Planning to Support Varied Student Learning Needs | 1.239 (0.360) | 1.079 (0.749) | 1.017 (0.935) | 0.889 (0.522) | 1.149 (0.499) |
| Using Knowledge of Students to Inform Teaching | 0.945 (0.809) | 1.039 (0.870) | 0.975 (0.908) | 0.780 (0.211) | 0.981 (0.936) |
| Identifying and Supporting Language Demands | 0.986 (0.955) | 0.937 (0.787) | 0.920 (0.718) | 0.780 (0.185) | 0.918 (0.700) |
| Planning Assessment to Monitor and Support Student Learning | 1.123 (0.622) | 1.058 (0.819) | 1.159 (0.545) | 0.977 (0.913) | 1.317 (0.199) |
| Learning Environment | 0.998 (0.993) | 1.214 (0.419) | 1.263 (0.306) | 0.926 (0.693) | 1.161 (0.496) |
| Engaging Students in Learning | 1.184 (0.452) | 1.187 (0.482) | 1.058 (0.815) | 0.922 (0.709) | 1.334 (0.180) |
| Deepening Student Learning | 1.364 (0.175) | 1.378 (0.102) | 1.094 (0.683) | 0.987 (0.940) | 1.148 (0.492) |
| Subject-Specific Pedagogy | 1.159 (0.487) | 1.229 (0.323) | 1.073 (0.757) | 1.022 (0.914) | 1.176 (0.404) |
| Analyzing Teacher Effectiveness | 1.028 (0.898) | 0.973 (0.896) | 1.001 (0.996) | 1.096 (0.625) | 1.019 (0.921) |
| Analysis of Student Learning | 1.059 (0.816) | 1.010 (0.963) | 0.956 (0.834) | 0.951 (0.793) | 0.988 (0.958) |
| Providing Feedback to Guide Further Learning | 1.193 (0.431) | 1.211 (0.354) | 1.152 (0.465) | 1.197 (0.353) | 1.055 (0.793) |
| Student Use of Feedback | 1.093 (0.663) | 1.128 (0.526) | 1.039 (0.819) | 0.976 (0.870) | 1.184 (0.352) |
| Analyzing Students' Language Use | 1.040 (0.861) | 1.019 (0.922) | 1.143 (0.480) | 0.955 (0.793) | 1.079 (0.680) |
| Using Assessment to Inform Instruction | 1.021 (0.929) | 1.019 (0.922) | 0.948 (0.774) | 0.888 (0.496) | 0.992 (0.967) |
| Standardized Total Score | 1.103 (0.656) | 1.098 (0.667) | 1.049 (0.816) | 0.932 (0.682) | 1.114 (0.544) |
| Cases | 235 | 235 | 235 | 235 | 235 |

Note: This table displays odds ratios for the relationship between edTPA data and teacher evaluation ratings. P-values are in parentheses. +, *, and ** indicate statistical significance at the 0.10, 0.05, and 0.01 levels.
Ideally, these edTPA measures would significantly predict multiple measures of teacher
performance—as this consistency provides clearer direction for program improvement efforts—
however, Table 3 indicates that there are no significant relationships between edTPA scores and
teacher evaluation ratings for Partner University's 2012-13 graduates. Given the robust value-added findings, there are two reasonable explanations for these null results: (1) the lack of variation in evaluation ratings (approximately 75 percent of the sample earned ratings of proficient, level 3) and/or (2) the use of local edTPA scores, which may be less reliable than officially scored edTPA portfolios.[6] In predictive validity analyses, other measures of teacher performance—student surveys or rubric-based observation protocols (e.g. CLASS, Framework for Teaching)—may better complement teacher value-added estimates. Below, we discuss how TPPs can use the evidence generated by predictive validity analyses for program improvement.

[6] In prior analyses examining the predictive validity of locally-evaluated TPA data (from Partner University's 2011-12 graduating cohort), we found that performance assessment measures significantly predicted higher evaluation ratings (Bastian, Henry, Pan, & Lys, 2016). Likewise, in work with officially-scored edTPA data (from Partner University's 2013-14 graduating cohort), we found that performance assessment measures significantly predicted higher evaluation ratings (Bastian & Lys, 2016).
Using Predictive Validity Analyses for Program Improvement
As previously noted, access to the data needed to complete predictive validity analyses is a
significant limitation for many TPPs and their state partners. However, as data access
collaborations increase across P-20, the field should anticipate increased use of such analyses in
the future. Therefore, the second prong of our empirical framework is more than an approach for
analyzing performance assessment data to aid program improvement; it also shows TPPs the
need to advocate and collaborate for increased data access and use.
The primary way that TPPs can use predictive validity analyses for improvement is to link
predictive validity results back to programmatic elements. Here, predictive validity analyses
using the total score or domain level data may be beneficial, however, rubric level analyses
provide TPP faculty with the most nuanced data to link to programmatic features. Specifically,
rubric level analyses may help TPPs link edTPA rubrics that significantly predict graduate
performance to course objectives and objectives developed across course sequences. By
identifying these linkages—essentially building maps between pre-service curricula and
components, candidate performance assessments, and in-service teacher outcomes—TPPs can
identify programmatic elements to emphasize and those to potentially reform. For example, if
higher scores on the Engaging Students in Learning rubric predict significantly higher value-
added estimates, TPPs can (1) strengthen the course objectives and programmatic components
tied to this rubric and (2) identify which candidates scored high (low) on this rubric. By
identifying these candidates, TPPs can better determine why candidates score well or poorly
(using other program data) and implement evidence-based reforms to improve practice.
Predictive validity analyses not only help faculty to target certain domains and rubrics
found to be significantly linked to teacher performance but also to target changes in their own teaching practice. Essentially, predictive validity analyses can create disturbances in the system
that push the teacher education community and individual/groups of faculty members forward in
program improvement efforts (Engestrom, 2001). Since most TPPs lack the time and resources
to respond to predictive validity analyses with widespread reforms, faculty must consider the
analyses (and other program data) in conjunction with their ability to enact meaningful change
within their program and the courses they teach. For example, at Partner University, a pair of
faculty took particular note when predictive validity value-added results confirmed their concerns over a trend of low scores on rubrics 12 and 13 in the edTPA Assessment domain. The
faculty engaged in deep reflection on the edTPA scores, value-added results, and their own
teaching practices, leading to learning and instructional changes within their community.
Specifically, the faculty developed a new instructional model that included structured
opportunities for candidates to receive feedback from faculty, engage in peer feedback
exchanges, and then act upon the collective feedback. This feedback modeling activity increased
the focus on instructional practice and analysis of teaching, enhanced collegiality among the
teacher candidates, and ultimately, led to higher scores on edTPA rubrics 12 and 13.
Beyond assessing which programmatic elements and teaching practices predict graduate
performance, TPPs can use predictive validity analyses to help evaluate the success of TPP
reforms. For example, prior to undertaking predictive validity analyses, Partner University
adopted a focus on instructional strategy development as part of its Teacher Quality Partnership
grant. This led Partner University to create course-embedded instructional strategy modules that
required candidates to demonstrate their declarative, procedural, and conditional knowledge of
each strategy (Carson, Cuthrell, Smith, & Stapleton, 2010). As a summative program
assessment, edTPA scores provided valuable data about the impact of this reform on candidates;
predictive validity results provided evidence about the relationships between graduates’
instruction and student learning. Specifically, significant findings in the predictive validity
analyses between teacher value-added and both the edTPA Instruction task and edTPA rubrics 6-
9 suggested that the TPP focus on instructional strategies was having a positive impact on
teacher outcomes.
Finally, results from predictive validity analyses can help TPPs determine whether
candidate performance assessments are a valuable measure on which to base program
improvement decisions. If performance assessment measures do not predict beginning teacher
outcomes—e.g. value-added, evaluation ratings, student surveys—then the evidence provided by
performance assessments may not guide TPPs to adopt more effective preparation practices.
Therefore, a lack of predictive validity between performance assessment scores and teacher
outcomes should encourage TPPs to assess the conceptual alignment between performance
assessment measures (tasks, rubrics) and teacher outcomes, examine the reliability of
performance assessment scoring (if local evaluation is used), and consider additional
measures/instruments that may provide better evidence for program improvement.
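As one concrete form of the scoring reliability check mentioned above, the sketch below computes exact agreement, adjacent agreement, and a quadratic-weighted kappa between two local raters. It assumes a hypothetical file of double-scored portfolios with one row per portfolio-rubric pair; the file and column names are placeholders rather than instruments from this study.

```python
# A minimal local-scoring reliability check, assuming a hypothetical file of
# double-scored portfolios with columns rater_1 and rater_2 holding the two
# raters' scores for the same portfolio-rubric pair.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

scores = pd.read_csv("double_scored.csv")            # hypothetical file name

exact = (scores["rater_1"] == scores["rater_2"]).mean()
adjacent = ((scores["rater_1"] - scores["rater_2"]).abs() <= 1).mean()
kappa = cohen_kappa_score(scores["rater_1"], scores["rater_2"],
                          weights="quadratic")

print(f"Exact agreement:    {exact:.2%}")
print(f"Adjacent agreement: {adjacent:.2%}")
print(f"Weighted kappa:     {kappa:.2f}")
```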
Discussion
Teacher education currently resides within a broader policy context focused on data and
evidence-based reform (Haskins & Margolis, 2015; National Research Council, 2002). The goal
of this initiative is straightforward: to improve processes and outcomes through data-driven
decision making. Central to this initiative are three key assumptions: (1) the presence of timely,
valid, and reliable data; (2) an ability to identify actionable evidence within this body of data;
and (3) the capacity and will to act on evidence for improvement (Peck & McDonald, 2014).
From the perspective of teacher education, candidate performance assessments help address
the first assumption. Performance assessment scores are readily available to TPPs, and research
shows that these scores are reliable (SCALE, 2013) and predictive of
beginning teacher outcomes (Bastian, Henry, Pan, & Lys, 2016; Bastian & Lys, 2016; Goldhaber,
Cowan, & Theobald, 2016). Building from these results, the main contribution of the present
study addresses the second assumption: proposing a two-pronged empirical framework that
TPPs can use to derive actionable evidence from their performance assessment data. Here, we
recognize the potential advantages of collective action—the impact of common candidate
performance assessments can be amplified by common strategies for analyzing performance
assessment data. As more TPPs adopt this framework, communities of programs and teacher
educators can develop to collaboratively improve practice and push for greater access to teacher
outcomes data. Although there are limitations—tight time windows to receive performance
assessment scores and perform LCA, and limited access to teacher outcomes data—we believe that this
empirical framework has the potential to cause disturbances in the TPP activity system and
promote learning and change in teacher education communities (Engeström, 2001). Furthermore,
we note the widespread applicability of this framework—while we promote its use with
candidate performance assessments, TPPs can also use it with other performance instruments
(e.g. dispositional ratings, observation scores).
Latent class analysis of performance assessment scores can help TPPs organize, prioritize,
and target interventions to benefit their current teaching candidates and future cohorts of
candidates. Getting the most out of LCA will push TPPs to improve their data management
systems, develop new learning modules for candidate remediation and practice, and strengthen
relationships with surrounding schools/districts to better connect teacher education to beginning
teacher support. Predictive validity analyses can provide a compass for the LCA results, helping
TPPs identify which programmatic elements predict beginning teacher performance and
prioritize reforms that are connected to teacher outcomes. To best utilize predictive validity
results, TPPs must build maps linking pre-service curricula and components to performance
assessments and in-service teacher outcomes. Overall, these empirical frameworks offer TPPs
strategies to turn performance assessment data into evidence and may help TPPs develop the
collective capacity and will to turn evidence into improvement.
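For programs looking for a computational starting point for the first strategy, the sketch below groups candidates by their rubric-score profiles and selects the number of groups by BIC. It approximates LCA with a Gaussian mixture over rubric scores treated as continuous indicators; dedicated latent class software would model them as ordered categories. All file and column names are hypothetical.

```python
# A minimal latent-profile sketch, assuming a hypothetical file with one row
# per candidate and columns rubric_1 ... rubric_15 holding edTPA rubric
# scores. Treating the scores as continuous indicators is an approximation
# of LCA, which would model them as ordered categories.
import pandas as pd
from sklearn.mixture import GaussianMixture

rubrics = pd.read_csv("edtpa_rubrics.csv")           # hypothetical file name
cols = [f"rubric_{i}" for i in range(1, 16)]
X = rubrics[cols].to_numpy()

# Fit 1- through 6-class solutions and keep the one preferred by BIC.
fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
        for k in range(1, 7)]
best = min(fits, key=lambda m: m.bic(X))
print(f"BIC-preferred number of classes: {best.n_components}")

# Assign each candidate to a class and inspect mean rubric scores by class;
# these class profiles are what a program would use to target interventions.
rubrics["latent_class"] = best.predict(X)
print(rubrics.groupby("latent_class")[cols].mean().round(2))
```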
References
Bastian, K.C., Henry, G.T., Pan, Y., & Lys, D. (2016). Teacher candidate performance
assessments: Local scoring and implications for teacher preparation program improvement.
Teaching and Teacher Education, 59, 1-12.
Boyd, D. J., Grossman, P. L., Lankford, H., Loeb, S., & Wyckoff, J. (2009). Teacher preparation
and student achievement. Educational Evaluation and Policy Analysis, 31(4), 416-440.
Carson, J., Cuthrell, K., Smith, J., & Stapleton, J. (2010, February). Helping preservice teachers
choose research-based instructional strategies. Paper session presented at AACTE Annual
Meeting and Exhibits, San Diego, CA.
Council for the Accreditation of Educator Preparation. (2013). CAEP Accreditation Standards.
Available from: http://caepnet.files.wordpress.com/2013/09/final_board_approved1.pdf
Cuthrell, K.C., Lys, D.B., Fogarty, E.A., & Dobson, E.E. (2016). Using edTPA data to improve
programs. Evaluating Teacher Education Programs Through Performance-Based
Assessments, 67.
Diez, M. E. (2010). It is complicated: Unpacking the flow of teacher education’s impact on
student learning. Journal of Teacher Education, 61(5), 441-450.
edTPA. (2015). Educative assessment and meaningful support: 2014 edTPA administrative
report. Available from: https://secure.aacte.org/apps/rl/resource.php?resid=558&ref=edtpa
Ellis, V., Edwards, A., & Smagorinsky, P. (Eds.). (2010). Cultural-historical perspectives on
teacher education and development: Learning teaching. Routledge.
Engeström, Y. (2001). Expansive learning at work: Toward an activity theoretical
reconceptualization. Journal of Education and Work, 14(1), 133-156.
Goldhaber, D., Cowan, J., & Theobald, R. (2016). Evaluating prospective teachers: Testing the
predictive validity of the edTPA. Center for Analysis of Longitudinal Data in Education
Research Working Paper, 157.
Grossman, P., Wineburg, S.S., & Woolworth, S. (2001). Toward a theory of teacher community.
Teachers College Record, 103(6), 942-1012.
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The
relationship between measures of instructional practice in middle school English language
arts and teachers’ value-added scores. American Journal of Education, 119, 445-470.
Halpin, P., Dolan, C., Grasman, R., & De Boeck, P. (2011). On the relation between the linear
factor model and the latent profile model. Psychometrika, 76(4), 564-583.
Halpin, P.F. & Kieffer, M.J. (2015). Describing profiles of instructional practice: A new
approach to analyzing classroom observation data. Educational Researcher, 44(5), 263-277.
Haskins, R., & Margolis, G. (2015). Show me the evidence: Obama’s fight for rigor and results
in social policy. Washington, D.C.: Brookings Institution Press.
Henry, G. T., Bastian, K. C., & Fortner, C. K. (2011). Stayers and leavers: Early-career teacher
effectiveness and attrition. Educational Researcher, 40(6), 271-280.
Henry, G. T., Campbell, S. L., Thompson, C. L., Patriarca, L. A., Luterbach, K. J., Lys, D. B., &
Covington, V. M. (2013). The predictive validity of measures of teacher candidate programs and
performance: Toward an evidence-based approach to teacher preparation. Journal of Teacher
Education, 64(5), 439-453.
Henry, G.T., & Bastian, K.C. (2015). Measuring up: The National Council on Teacher
Quality’s ratings of teacher preparation programs and measures of teacher performance.
Available from: https://publicpolicy.unc.edu/files/2015/07/Measuring-Up-The-National-
Council-on-Teacher-Qualitys-Ratings-of-Teacher-Preparation-Programs-and-Measures-of-
Teacher-Performance.pdf
Ladd, H. F., & Sorensen, L. C. (2014). Returns to teacher experience: Student achievement and
motivation in middle school. Center for Analysis of Longitudinal Data in Education
Research Working Paper, 112.
Ledwell, K., & Oyler, C. (2016). Unstandardized responses to a “standardized” test: The
edTPA as gatekeeper and curriculum change agent. Journal of Teacher Education, 67(2),
120-134.
Little, J.W., Gearhart, M., Curry, M., & Kafka, J. (2003). Looking at student work for teacher
learning, teacher community, and school reform. Phi Delta Kappan, 85(3), 185-192.
Lubke, G., & Neale, M. (2006). Distinguishing between latent classes and continuous factors:
Resolution by maximum likelihood. Multivariate Behavioral Research, 41, 499-532.
Mitchell, K.J., Robinson, D.Z., Plake, B.S., & Knowles, K.T. (2001). Testing teacher
candidates: The role of licensure tests in improving teacher quality. Washington, DC:
National Academy Press.
National Council on Teacher Quality. (2014). Teacher Prep Review. Available from:
http://www.nctq.org/dmsView/Teacher_Prep_Review_2014_Report
National Research Council. (2002). Scientific research in education. Washington, DC:
National Academies Press.
Pecheone, R.L. & Chung, R.R. (2006). Evidence in teacher education: The Performance
Assessment for California Teachers (PACT). Journal of Teacher Education, 57(1), 22-36.
Peck, C. A., & McDonald, M. (2013). Creating “cultures of evidence” in teacher education:
Context, policy, and practice in three high-data-use programs. The New Educator, 9(1), 12-
28.
Peck, C.A. & McDonald, M.A. (2014). What is a culture of evidence? How do you get one?
And…should you want one? Teachers College Record 116, 1-27.
Peck, C.A., Singer-Gabella, M., Sloan, T., & Lin, S. (2014). Driving blind: Why we need
standardized performance assessment in teacher education. Journal of Curriculum and
Instruction, 8(1), 8-30.
Preston, C. (2016). University-based teacher preparation and middle grades teacher
effectiveness. Journal of Teacher Education (in press).
Ronfeldt, M., Farmer, S. O., McQueen, K., & Grissom, J. A. (2015). Teacher collaboration in
instructional teams and student achievement. American Educational Research Journal, 52(3),
475-514.
SCALE. (2013). edTPA field test: Summary report. Available from:
https://secure.aacte.org/apps/rl/res_get.php?fid=827&ref=
Schein, E. H. (1990). Organizational culture. American Psychologist, 45(2), 109-119.
Sloan, T. (2013). Distributed leadership and organizational change: Implementation of a teaching
performance measure. The New Educator, 9(1), 29-53.
Tagg, J. (2012). Why does faculty resist change? Change: The Magazine of Higher Learning,
44(4), 6-15.
Wright, S.P., White, J.T., Sanders, W.L., & Rivers, J.C. (2010). SAS EVAAS Statistical
Models. Available from:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.437.6615&rep=rep1&type=pdf
Appendix
Table A.1: Factor Loadings for 2012-13 edTPA Scores
edTPA Task     edTPA Rubric                                                  Factor 1   Factor 2   Factor 3
Planning       Planning for Content Understanding                              0.73*      0.22      -0.11
Planning       Planning to Support Varied Student Learning Needs               0.72*     -0.04       0.18
Planning       Using Knowledge of Students to Inform Teaching                  0.45*      0.01       0.35
Planning       Identifying and Supporting Language Demands                     0.77*     -0.06       0.12
Planning       Planning Assessment to Monitor and Support Student Learning     0.69*      0.16      -0.02
Instruction    Learning Environment                                            0.20       0.61*     -0.03
Instruction    Engaging Students in Learning                                  -0.01       0.82*      0.04
Instruction    Deepening Student Learning                                      0.03       0.76*      0.07
Instruction    Subject-Specific Pedagogy                                       0.01       0.61*      0.19
Instruction    Analyzing Teacher Effectiveness                                 0.03       0.18       0.62*
Assessment     Analysis of Student Learning                                    0.03       0.11       0.73*
Assessment     Providing Feedback to Guide Further Learning                    0.04       0.03       0.71*
Assessment     Student Use of Feedback                                        -0.09       0.02       0.89*
Assessment     Analyzing Students’ Language Use                                0.14      -0.02       0.72*
Assessment     Using Assessment to Inform Instruction                          0.13       0.03       0.73*

Note: This table presents factor loadings for the 2012-13 edTPA portfolios from the Partner University. All
factor loadings greater than 0.40 are marked with an asterisk.
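For readers who want to produce a comparable loadings table from their own portfolio data, the sketch below fits a three-factor model with varimax rotation to rubric-level scores. The file and column names are hypothetical, and the rotation and estimator behind Table A.1 are not restated here, so this is an illustrative reproduction rather than the study's exact procedure.

```python
# A minimal sketch of estimating a loadings table like Table A.1, assuming a
# hypothetical file with one row per portfolio and columns rubric_1 ...
# rubric_15. A three-factor model with varimax rotation is assumed; the
# estimation choices behind the published table may differ.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rubrics = pd.read_csv("edtpa_rubrics.csv")           # hypothetical file name
cols = [f"rubric_{i}" for i in range(1, 16)]

fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
fa.fit(rubrics[cols])

# components_ has shape (factors, rubrics); transpose so rows are rubrics,
# matching the layout of Table A.1.
loadings = pd.DataFrame(fa.components_.T, index=cols,
                        columns=["Factor 1", "Factor 2", "Factor 3"])
print(loadings.round(2))
```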
EPIC is an interdisciplinary team that conducts rigorous research and evaluation to inform education policy and practice. We produce evidence to guide data-driven decision-making, using qualitative and quantitative methodologies tailored to the target audience. By serving multiple stakeholders, including policy-makers, administrators in districts and institutions of higher education, and program implementers, we strengthen the growing body of research on what works and in what context. Our work is ultimately driven by a vision of high-quality and equitable education experiences for all students, and particularly students in North Carolina.
http://publicpolicy.unc.edu/epic-home/