Assessment: The “Hidden Variable” of Achievement

A Field-tested Learning Assessment Guide (FLAG) for STEM Instructors

Michael Zeilik, University of New Mexico

(www.flaguide.org)

Pop Quiz!

In terms of learning gains, which instructor attribute enhances achievement the most?

A. Teaching experience
B. Clarity of presentations
C. Energy and enthusiasm
D. Deep knowledge of subject
E. None of the above

What Works? 20th Century Gain Results

Active Learning, Mastery Learning (SD ≥ 0.5; PSI, discussion, debates, games, role playing, controversy)

Cooperative Learning (SD ≥ 0.5, cognitive and affective; century of research, all disciplines)

One-on-One Tutoring (SD = 2, with trained tutors)

What Does Not Work? (Small gains, SD < 0.3)

EVERYTHING ELSE!

Lectures (“standard model”), reinforce memorization!

Unstructured discussion, supervised independent study, autonomous small groups, self study

Audio-tutorials, programmed instruction, computer-based instruction, instructional television, Web-based instruction

What Works in Context: Physics & Astronomy

Disciplinary education research uses “tools of the trade” to conduct experiments

Create a robust, mostly empirical knowledge base about learning in physics & astronomy

Probe initial state (prior knowledge), final state (learning outcomes), and student thinking

Measurement: Assessment!

Galileo Galilei:

“Measure what is measurable, and make measurable what is not so.”

But: “Measure what you value, and value what you measure.” (M. Zeilik)

FLAG Features

Assessment Basics “What is this assessment business all about?”

Making Goals “What do you want to measure?”

Classroom Assessment Techniques (CATs) “How do you measure it?”

Searchable Database—Toolbox “What are good tools for measurement?”

All peer reviewed and evidence based

FLAG CATs

Attitudinal Surveys: E. Seymour, E. Lewis
Concept Tests: A. Ellis
Concept Maps: M. Zeilik
Conceptual Diagnostic Tests: M. Zeilik
Interviews: M. Smith, S. A. Southerland
Performance Assessments: T. Slater
Portfolios: T. Slater
Scoring Rubrics: D. Ebert-May
Student Assessment of Learning Gains: E. Seymour
Weekly Reports: E. Etkina
Mathematical Thinking: M. Swan, J. Ridgway
Multiple-Choice Tests: J. Parkes
Minute Papers: M. Zeilik

Minute Paper

Take a few minutes at the end of class and ask for a written response to:

“What was the most important concept you learned in class?”
“What important question remains unanswered?”
“What was the muddiest point of this class?”

Few Minute Paper: Teams reach consensus, submit written report

Analysis: Sort into themes (cards)

Weekly Report: Extended minute paper

Concept Tests

Quick feedback on conceptual (not factual) understanding

Instructor gives conceptual question with choices (common “misconceptions”)

After a minute, whole class responds (hands, flash cards, class polling system)

Instructor assesses responses: If most incorrect, pair up to discuss (peer teaching)

Class responds again to gauge mastery; instructor adapts in real time
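The concept-test polling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assess_concept_test`, the 0.5 threshold (one reading of "most incorrect"), and the sample responses are all hypothetical and do not come from the FLAG materials.

```python
from collections import Counter


def assess_concept_test(responses, correct_choice, threshold=0.5):
    """Tally one round of class responses and suggest the next step.

    threshold is an assumed cutoff: if fewer than this fraction of the
    class answers correctly, students pair up to discuss (peer teaching).
    """
    if not responses:
        raise ValueError("no responses to assess")
    counts = Counter(responses)
    fraction_correct = counts[correct_choice] / len(responses)
    if fraction_correct < threshold:
        action = "pair up to discuss, then poll again"
    else:
        action = "move on"
    return fraction_correct, action


# Hypothetical first poll: ten students, correct answer is "b".
first_poll = ["c"] * 6 + ["b"] * 3 + ["d"]
fraction, action = assess_concept_test(first_poll, correct_choice="b")
print(f"{fraction:.0%} correct: {action}")
# prints "30% correct: pair up to discuss, then poll again"
```

Calling the same function on the second poll gives the instructor the before/after comparison needed to adapt in real time.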

[Figure: Earth and an asteroid]

The magnitude of the force exerted by the asteroid on the Earth is

a) larger than the magnitude of the force exerted by the Earth on the asteroid
b) the same as the magnitude of the force exerted by the Earth on the asteroid
c) smaller than the magnitude of the force exerted by the Earth on the asteroid
d) zero (the asteroid exerts no force on the Earth).

Results: Midwestern Universities (Dostal) (P111, conceptual; P221, calculus; P112, algebra)

                 P111    P221    P112
Larger than:       2%      2%      5%
Same as:          13%      9%     24%
Smaller than:     62%     84%     67%
Zero:             23%      5%      5%
N                 101      91     101

Student Assessments

Attitude Surveys: Perceptions about course, discipline; seem easy (they are not!); best based on a robust, field-tested model

Minute Paper: “Given limited resources, what one change would you make to improve this course?”; sort by themes

Student Assessment of Learning Gains (SALG): Probes learning gains that students perceive; avoids performance critiques; easily customized (15 min); available on-line

[Figure: UNM P102 SALG, Spring 2003 (Mean = 3.38). Bar chart of ratings (scale 1 - 5) for Instructor talk, Discussions, Team work, Activities, Coop quizzes, CT/CRS, and Videos; error bars are standard deviations. Bar labels: 3.68, 3.52, 3.64, 3.71, 4.15, 3.75, 3.99.]

[Figure: P102 UNM Spring 2003, "Gender Matters, Sometimes!" Bar chart of scores (1 = very poor, 5 = very good) for Female, Male, and All on Math Self-Aptitude, Science Self-Aptitude, Last math, Pretest confidence, and Posttest confidence. Significance labels: NS!, NS!, p < 0.002, p < 0.0001, p < 0.002.]

[Figure: Cooperative Quiz Gains by Gender, P102 UNM Spring 2003 Quizzes. Chart (axis 0.00 to 1.00) over Practice, Quiz #1, Quiz #2, Quiz #3, and Mean; series: <g> Female, Effect size (F), <g> Male, Effect size (M).]

[Figure: Does it stick? (Coop quizzes/Test), A101 UNM Fall 2000, Identical Items (p = 0.96). Chart of percent correct by item (Q1-12, Q2-4, Q2-9, Q2-8, Q2-7, Q3-6, Q3-7); series: Pre %, Post %, Test #1.]

Attitude Results: Intro Astro & Physics

Measure (50% = neutral)   Pretest (%)   Posttest (%)   Gain and Effect Size
UNM A101 (n = 311)        62 ± 10       63             Not significant
CMU A111 (n = 224)        57            55             Not significant
UNM A271 (n = 28)
UNM P102 (n = 31)

Conceptual Diagnostic Tests

Ideally based on research into “misconceptions” revealed by student “think aloud” interviews

Measures pre/post conceptual gains as a summative assessment

Force Concept Inventory (FCI; 1985); Astronomy Diagnostic Test (ADT; 1999) version 2; national baselines; large data sample (about 5,000)

Follow protocol!

[Figure: ADT 2, UNM Fall 2000 vs. National. Bar chart of gain (axis 0 to 30) for Females, Males, and All.]

                  Females   Males   All
Gain (National)        20      25    22
Gain (UNM)             30      26    28

ES (F) = 0.84 => 80% of postscores above mean of prescores

ES (M) = 0.53 => 70% of postscores above mean of prescores

(Pre = 5346; Post = 3842)
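The step from an effect size to "percent of postscores above the mean of prescores" can be checked with the standard normal cumulative distribution. The sketch below assumes normally distributed scores with equal pre and post standard deviations; that assumption, and the function name `percent_above_pre_mean`, are illustrative and are not stated on the slide.

```python
from statistics import NormalDist


def percent_above_pre_mean(effect_size: float) -> float:
    """Percent of post scores above the pre-score mean.

    Assumes normal score distributions with the same standard deviation,
    so the answer is the standard normal CDF evaluated at the effect size.
    """
    return 100.0 * NormalDist().cdf(effect_size)


for label, es in [("Females", 0.84), ("Males", 0.53)]:
    print(f"{label}: ES = {es:.2f} -> {percent_above_pre_mean(es):.0f}%")
# prints:
# Females: ES = 0.84 -> 80%
# Males: ES = 0.53 -> 70%
```

Under the same assumption, an effect size of 0.5 corresponds to roughly the 69th percentile, which is where a "70% rather than 50%" rule of thumb for a half-standard-deviation gain comes from.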

Pre/post: ADT National Project/UNM

<g> = (post% - pre%)/(100% - pre%)

[Figure: Bar chart of score (%), axis 0 to 60, pre-course (N = 5346) vs. post-course (N = 3842), with UNM bars marked; standard errors plotted. Bar labels: 34, 56, 32.4, 47.3, 41.5, 27, 53.7, 38.]

Overall: <g> = 0.22; Women: <g> = 0.20; Men: <g> = 0.25; UNM: <g> = 0.31
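The normalized gain <g> = (post% - pre%)/(100% - pre%) is simple to compute. A minimal Python sketch follows; the function name `normalized_gain` is illustrative, and pairing 32.4% (pre) with 47.3% (post) as the overall national means is an assumption inferred from the chart labels, chosen because it reproduces the quoted overall <g> = 0.22.

```python
def normalized_gain(pre_percent: float, post_percent: float) -> float:
    """Return <g> = (post% - pre%) / (100% - pre%).

    <g> is the fraction of the possible improvement that was achieved.
    """
    if not 0.0 <= pre_percent < 100.0:
        raise ValueError("pre_percent must be in [0, 100)")
    if not 0.0 <= post_percent <= 100.0:
        raise ValueError("post_percent must be in [0, 100]")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


# Assumed pairing of chart labels: overall pre = 32.4%, post = 47.3%.
overall_gain = normalized_gain(32.4, 47.3)
print(round(overall_gain, 2))  # prints 0.22
```

Dividing by (100% - pre%) lets classes with different starting scores be compared on the same scale, since each gain is measured against the room left to improve.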

[Figure: P102 UNM Spring 2003 Pre/Post. Bar chart of percentage (axis 0% to 70%) for Female and Male: Pre% FCI, Post% FCI, <g> FCI, Pre (FCI + GRC), Post (FCI + GRC), <g> FCI+GRC, <g> GRC.]

Classroom Assessment: Good News!

Well-done formative assessment results in a pre/post gain of about 0.5 standard deviation (70% rather than 50% on a “standardized” test)
