www.ioe.ac.uk
What kinds of assessment support learning of key competences?
Dylan Wiliam
EC seminar on the assessment of key competences
Brussels, Belgium, 15 October 2009
www.dylanwiliam.net
Overview of presentation
- Functions of assessment: evaluative, summative, formative
- Validity and the consequences of assessment
- Formative assessment
- Designing systems for assessing key competences
Functions of assessment
- Evaluative (E): for evaluating institutions, curricula and organizations
- Summative (S): for describing individuals
- Formative (F): for supporting learning
Examples of assessment systems
- E: NAEP, "No Child Left Behind"
- S: Baccalauréat, Abitur, Matura
- E+S: GCSE (England)
- E+S+F: National Curriculum Assessment (England)
Validity
- Validity is a property of inferences, not of assessments.
- "One validates, not a test, but an interpretation of data arising from a specified procedure" (Cronbach, 1971; emphasis in original).
- The phrase "a valid test" is therefore a category error (like "a happy rock"): there is no such thing as a valid (or indeed invalid) assessment, and no such thing as a biased assessment.
- Reliability is a prerequisite for validity. Talking about "reliability and validity" is like talking about "swallows and birds": validity includes reliability.
Modern conceptions of validity
- Validity subsumes all aspects of assessment quality: reliability, representativeness (content coverage), relevance, predictiveness.
- But not impact (Popham: right concern, wrong concept).
- "Validity is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (Messick, 1989, p. 13).
Threats to validity
- Inadequate reliability.
- Construct-irrelevant variance: differences in scores are caused, in part, by differences not relevant to the construct of interest. The assessment assesses things it shouldn't; it is "too big".
- Construct under-representation: differences in the construct are not reflected in scores. The assessment doesn't assess things it should; it is "too small".
- With clear construct definition, all of these are technical issues, not value issues. (A toy simulation of the two threats follows this list.)
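To make the two threats concrete, here is a small illustrative simulation (not from the talk; the two-facet construct, the "reading speed" factor, and all variable names are assumptions). A test that is "too big" picks up irrelevant variance; one that is "too small" samples only part of the construct; either way, scores track the construct less faithfully.

```python
import random

random.seed(1)

def corr(xs, ys):
    """Pearson correlation, computed from scratch to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

N = 5000
# Hypothetical construct with two facets; the "true" level is their sum.
facet_a = [random.gauss(0, 1) for _ in range(N)]
facet_b = [random.gauss(0, 1) for _ in range(N)]
construct = [a + b for a, b in zip(facet_a, facet_b)]

# "Too big": scores also reflect an irrelevant factor (say, reading speed).
irrelevant = [random.gauss(0, 1) for _ in range(N)]
too_big = [c + e for c, e in zip(construct, irrelevant)]

# "Too small": scores reflect only one facet of the construct.
too_small = facet_a

print("construct-irrelevant variance:  r =", round(corr(too_big, construct), 2))    # ~0.82
print("construct under-representation: r =", round(corr(too_small, construct), 2))  # ~0.71
```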
Be careful what you wish for…
- Campbell's law (US) / Goodhart's law (UK): "All performance indicators lose their usefulness when used as objects of policy."
- The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything.
- Where the evaluative function is paramount, the challenge is to find "tests worth teaching to".
The Lake Wobegon effect revisited
"All the women are strong, all the men are good-looking, and all the children are above average." (Garrison Keillor)
Achievement of English 16-year-olds
[Chart: percentage of 16-year-olds achieving 5 GCSE grades A*-C, and 5 grades A*-C including English and Mathematics (EM), from 1995/96 to 2006/07]
Consequential validity? No such thing!
"As has been stressed several times already, it is not that adverse social consequences of test use render the use invalid, but, rather, that adverse social consequences should not be attributable to any source of test invalidity such as construct-irrelevant variance. If the adverse social consequences are empirically traceable to sources of test invalidity, then the validity of the test use is jeopardized. If the social consequences cannot be so traced—or if the validation process can discount sources of test invalidity as the likely determinants, or at least render them less plausible—then the validity of the test use is not overturned. Adverse social consequences associated with valid test interpretation and use may implicate the attributes validly assessed, to be sure, as they function under the existing social conditions of the applied setting, but they are not in themselves indicative of invalidity." (Messick, 1989, pp. 88-89)
Centrality of construct definition
- Construct definition is essential to effective assessment.
- It allows a clear distinction between adverse impact and bias (and anyway, bias is a property of inferences, not of instruments).
- Examples of how construct definition distinguishes impact from bias:
  - mental rotation of three-dimensional solids
  - testing for admission to higher education
  - testing of English language learners
A brief history of formative assessment
"Formative assessment" has been used to describe:
- the time at which the assessment is scheduled: any assessment taken before the last one
- a purpose for assessing: "assessment for learning"
- a function that the assessment outcomes serve: assessments that change teaching; the formative use of assessments
Feedback metaphor
Feedback in engineering:
- Positive feedback leads to explosive increase or collapse (bad!).
- Negative feedback leads to asymptotic convergence to, or damped oscillation about, a stable equilibrium.
Components of a feedback system:
- data on the actual level of some measurable attribute;
- data on the reference level of that attribute;
- a mechanism for comparing the two levels and generating information about the 'gap' between the two levels;
- a mechanism by which the information can be used to alter the gap.
To an engineer, information is therefore feedback only if the information fed back is used in reducing the gap between actual and desired states. (A minimal sketch of such a loop follows.)
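As a minimal sketch of this definition (not from the talk; the function name, gain value, and reference level are illustrative assumptions), the loop below implements all four components, and the gain determines whether the feedback converges or explodes:

```python
def feedback_loop(actual, reference, gain=0.5, steps=10):
    """Negative feedback: use the measured gap to move `actual` toward `reference`."""
    for step in range(steps):
        gap = reference - actual  # compare the actual level with the reference level
        actual += gain * gap      # mechanism that uses the gap information to reduce it
        print(f"step {step}: actual={actual:.3f} gap={gap:+.3f}")
    return actual

# With 0 < gain <= 1 the gap shrinks geometrically (asymptotic convergence);
# with 1 < gain < 2 it overshoots but damps; with gain > 2 the loop diverges,
# behaving like the "explosive" positive feedback described above.
feedback_loop(actual=0.0, reference=1.0)
```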
Relevant studies
- Fuchs & Fuchs (1986)
- Natriello (1987)
- Crooks (1988)
- Bangert-Drowns et al. (1991)
- Kluger & DeNisi (1996)
- Black & Wiliam (1998)
- Nyquist (2003)
- Dempster (1991, 1992)
- Elshout-Mohr (1994)
- Brookhart (2004)
- Allal & Lopez (2005)
- Köller (2005)
- Brookhart (2007)
- Wiliam (2007)
- Hattie & Timperley (2007)
- Shute (2008)
Feedback
Kinds of feedback in Higher Education (Nyquist, 2003):
- Weaker feedback only: knowledge of results (KoR)
- Feedback only: KoR + clear goals, or knowledge of correct results (KCR)
- Weak formative assessment: KCR + explanation (KCR+e)
- Moderate formative assessment: (KCR+e) + specific actions for gap reduction
- Strong formative assessment: (KCR+e) + activity
Effect of formative assessment (HE)

| Intervention                  | N  | Effect* |
|-------------------------------|----|---------|
| Weaker feedback only          | 31 | 0.14    |
| Feedback only                 | 48 | 0.36    |
| Weak formative assessment     | 49 | 0.29    |
| Moderate formative assessment | 41 | 0.39    |
| Strong formative assessment   | 16 | 0.56    |

*corrected values
The formative assessment hijack…

| Cycle  | Span                              | Length                                                             | Impact                                                                   |
|--------|-----------------------------------|--------------------------------------------------------------------|--------------------------------------------------------------------------|
| Long   | across units, terms               | four weeks to one year                                             | student monitoring; curriculum alignment                                 |
| Medium | within and between teaching units | one to four weeks                                                  | improved, student-involved assessment; teacher cognition about learning  |
| Short  | within and between lessons        | day-by-day: 24 to 48 hours; minute-by-minute: 5 seconds to 2 hours | classroom practice; student engagement                                   |
Functions of assessment
- For evaluating institutions, organizations and curricula
- For describing individuals
- For supporting learning:
  - monitoring learning: whether learning is taking place
  - diagnosing (informing) learning: what is not being learnt
  - instructionally tractable: what to do about it
Formative assessment: a new definition
"An assessment functions formatively to the extent that evidence about student achievement elicited by the assessment is interpreted and used to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions that would have been taken in the absence of that evidence. Formative assessment therefore involves the creation of, and capitalization upon, moments of contingency (short, medium and long cycle) in instruction with a view to regulating learning (proactive, interactive, and retroactive)." (Wiliam, 2009)
Some principles
A commitment to formative assessment:
- does not entail any view of what is to be learned;
- does not entail any view of what happens when learning takes place.
The learning milieu
- Feedback must cause cognitive engagement in learning.
- Mastery orientation vs. performance orientation (Dweck).
- Growth pathway vs. well-being pathway (Boekaerts).
Defining formative assessment
Key processes:
- establishing where the learners are in their learning
- establishing where they are going
- working out how to get there
Participants: teachers, peers, learners
Aspects of formative assessment

|         | Where the learner is going               | Where the learner is                                                                      | How to get there                               |
|---------|------------------------------------------|-------------------------------------------------------------------------------------------|------------------------------------------------|
| Teacher | Clarify and share learning intentions    | Engineering effective discussions, tasks and activities that elicit evidence of learning | Providing feedback that moves learners forward |
| Peer    | Understand and share learning intentions | Activating students as learning resources for one another                                |                                                |
| Learner | Understand learning intentions           | Activating students as owners of their own learning                                      |                                                |
Five "key strategies"…
1. Clarifying, understanding, and sharing learning intentions (curriculum philosophy)
2. Engineering effective classroom discussions, tasks and activities that elicit evidence of learning (classroom discourse; interactive whole-class teaching)
3. Providing feedback that moves learners forward (feedback)
4. Activating students as learning resources for one another (collaborative learning; reciprocal teaching; peer-assessment)
5. Activating students as owners of their own learning (metacognition; motivation; interest; attribution; self-assessment)

(Wiliam & Thompson, 2007)
…and one big idea
Use evidence about learning to adapt instruction to meet student needs.
Examples of techniques
- Learning intentions: "sharing exemplars"
- Eliciting evidence: "mini white-boards"
- Providing feedback: "match the comments to the essays"
- Students as owners of their learning: "coloured cups"
- Students as learning resources: "pre-flight checklist"
So how do we design assessments?
- Reliability requires random sampling from the domain of interest.
- Increasing reliability requires increasing the size of the sample (see the worked formula below).
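As a worked illustration (not on the original slide), the standard Spearman-Brown prophecy formula makes the sample-size point quantitative: if a test with reliability $\rho_1$ is lengthened by a factor $k$ with comparable items, its predicted reliability is

$$\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}$$

For example, doubling ($k = 2$) a test with reliability $\rho_1 = 0.70$ gives $\rho_2 = 1.40 / 1.70 \approx 0.82$.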
Using teacher assessment in certification is attractive:
- it increases reliability (increased test time);
- it increases validity (it addresses aspects of construct under-representation).
But it is problematic:
- lack of trust (the "fox guarding the hen house");
- problems of biased inferences (construct-irrelevant variance);
- it can introduce new kinds of construct under-representation.
Progression in understanding light
1 Know that light comes from different sources
2 Know that light passes through some materials and not others, and that when it does not, shadows may be formed
3 Know that light can be made to change direction, and that shiny surfaces can form images
4 Know that light travels in straight lines, and this can be used to explain the formation of shadows
5 Understand how light is reflected
6 Understand how prisms and lenses refract and disperse light
7 Be able to describe how simple optical devices work
8 Understand refraction as an effect of differences of velocities in different media
9 [nothing new at this level]
10 Understand the processes of dispersion, interference, diffraction and polarisation of light.
The challenge
To design an assessment system that is:
- Distributed, so that evidence collection is not undertaken entirely at the end
- Synoptic, so that learning has to accumulate
- Extensive, so that all important aspects are covered (breadth and depth)
- Progressive, so that assessment outcomes relate to learning progressions
- Manageable, so that costs are proportionate to benefits
- Trusted, so that stakeholders have faith in the outcomes
The effects of context
Beliefs:
- about what constitutes learning;
- in the value of competition between students;
- in the value of competition between schools;
- that test results measure school effectiveness;
- about the trustworthiness of numerical data, with a bias towards a single number;
- that the key to schools' effectiveness is strong top-down management;
- that teachers need to be told what to do, or conversely that they have all the answers.
Conclusion
- There is no "perfect" assessment system anywhere: each nation's assessment system is exquisitely tuned to local constraints and affordances.
- Every country's assessment system works in practice but not in theory.
- Assessment practices have impacts on teaching and learning, which may be strongly amplified or attenuated by the national context.
- The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as by educational evidence and values.
Conclusion (2)
- It is probably idle to draw up maps for the ideal assessment policy for a country, even though the principles and the evidence to support such an ideal might be clearly agreed within the 'expert' community.
- Instead, it is likely to be more productive to focus on those arguments and initiatives that are least offensive to existing assumptions and beliefs, and that will nevertheless serve to catalyze a shift in those assumptions and beliefs while at the same time improving some aspects of present practice.