1

The New Adaptive Version of the Basic English Skills Test

Oral Interview

Dorry M. Kenyon

Funded by OVAE Contract: ED-00-CO-0130

The BEST Plus

2

Overview

1. Why the BEST Plus?

2. What does the BEST Plus look like?

3. What is its research base?

4. How can the BEST Plus be used?

3

Overview

Why the BEST Plus?

What does the BEST Plus look like?

What is its research base?

How can the BEST Plus be used?

4

The original BEST Oral Interview

Developed early 1980s

Assessed basic functional oral English language skills for adult immigrants and refugees

Designed for program use

Began to be widely used for accountability purposes

5

0 1 2 3

1. Where is he?

2. In <#3 in 1B>, where did you buy your food?

3. Is shopping in <#3 in 1B> and <#4 in 1B> the same? How is it different/the same?

6

The BEST Plus

A performance-based assessment (individually administered face-to-face oral interview)

Assesses functional oral language skills (interpersonal communication) of adult ESL learners using everyday language

Designed with current assessment needs in mind

7

Goals in developing the BEST Plus

Respond to adult ESL program needs for assessment and accountability

– Produce a test that is short and practical
– Assess learner language for a variety of purposes and stakeholders
– Increase accuracy in measuring oral proficiency
– Provide “multiple forms” for pre- and post-testing

8

Overview

Why the BEST Plus?

What does the BEST Plus look like?

What is its research base?

How can the BEST Plus be used?

9

BEST Plus components (computer-based version)

Test items appear on the computer screen (instead of in a test booklet)

If an item requires a visual, examinees view the visual on the computer screen (instead of a picture cue booklet)

Test administrators enter scores directly into the computer (instead of on a score sheet)

10

11

3. What does the computer-assisted BEST Plus look like?

13

Sample computer screen

14

BEST Plus components (print-based version)

Three forms

Within each form, locator test + three level tests
– SPL 1-4
– SPL 4-6
– SPL 6-10

Materials
– Picture booklet
– Test booklet (scripts and score sheet combined)

15

16

Scoring on 3 components of proficiency

Listening Comprehension = How well did the examinee understand the setup and question?

Language Complexity = How did the examinee organize and elaborate the response?

Communication = How clearly did the examinee communicate meaning?

17

Ability estimation

After each question, the program estimates the examinee’s ability based on scores awarded on the current and all previous questions.

With each estimation, the accuracy of the measurement increases.

Goal: For the estimate to ‘level off’ at an acceptable level of accuracy.
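The idea of re-estimating ability after every scored question can be illustrated with a small sketch. The slides do not state which measurement model the BEST Plus software uses, so everything here is an assumption for illustration: a Rasch-family partial credit model, a standard normal prior, a coarse grid of ability values, and made-up step difficulties. The function names `pcm_probs` and `estimate` are hypothetical.

```python
import math

# Grid of candidate ability values (logits); an assumption for illustration.
GRID = [i / 10 for i in range(-40, 41)]

def pcm_probs(theta, steps):
    """Probabilities of scores 0..len(steps) under a partial credit model.

    `steps` are hypothetical step difficulties, not BEST Plus parameters.
    """
    logits = [0.0]
    for d in steps:
        logits.append(logits[-1] + (theta - d))
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def estimate(responses):
    """Return (posterior mean, posterior SD) given (score, steps) pairs."""
    # Start from a standard normal prior over the grid.
    post = [math.exp(-0.5 * t * t) for t in GRID]
    # Multiply in the likelihood of every response seen so far.
    for score, steps in responses:
        post = [p * pcm_probs(t, steps)[score] for p, t in zip(post, GRID)]
    norm = sum(post)
    mean = sum(p * t for p, t in zip(post, GRID)) / norm
    var = sum(p * (t - mean) ** 2 for p, t in zip(post, GRID)) / norm
    return mean, math.sqrt(var)

# Hypothetical sequence of scored responses: (score awarded, step difficulties).
responses = [
    (2, [-1.0, 0.0, 1.0]),
    (3, [-1.0, 0.0, 1.0]),
    (2, [-0.5, 0.5]),
    (3, [-1.0, 0.0, 1.0]),
    (2, [-0.5, 0.5]),
]

# Re-estimate after each response, using the current and all previous scores.
for n in range(len(responses) + 1):
    mean, sd = estimate(responses[:n])
    print(f"after {n} responses: ability {mean:+.2f}, uncertainty {sd:.2f}")
```

In this sketch the uncertainty printed after the last response is smaller than the uncertainty before any response, which is the sense in which an estimate can "level off" as more scores accumulate.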

18

Path through the computer-adaptive BEST Plus

Following a fixed “warm-up,” examinees are asked questions drawn from several thematically-based “folders.”

After hearing each response, the test administrator enters a score for each component.

After each set of scores is entered, the computer updates its estimate of the examinee’s ability, and chooses folders and questions as appropriate.

The test ends when one of three conditions is met.

Users can instantly receive a full score report.

19

Path through the print-based BEST Plus

Administer and score Locator questions (the fixed “warm-up” items + 2 high end discriminators)

Total score on Locator and choose level test based on chart

Administer level test

Total raw score and find approximate SPL range

Enter raw scores into computer BEST Plus Score Management software to obtain full score report

20

Overview

Why the BEST Plus?

What does the BEST Plus look like?

What is its research base?

How can the BEST Plus be used?

21

Rigorous development procedures

• Feasibility study (1999-2000)

• Initial development (2000-2001)

• Pilot, small scale field test, initial reliability study (2001)

• Revisions (2001-2002)

• Pilot, full scale field test, reliability study, standard setting study (2002)

• Finalization of training materials, ancillary materials, further refinements (2003)

22

Full involvement of stakeholders

• OVAE oversight

• Technical Working Group (TWG), composed of researchers, state directors, and local program directors and practitioners

• Item writers, all experienced adult ESL teaching professionals

• Instructors and students in the field

23

Example: Full scale field test participants

• 9 states (DC, DE, FL, IL, MA, MD, OR, PA, VA)

• 23 programs

• 41 administrators

• 2420 examinees

24

Example: Reliability study 2002

• 32 adult ESL students

• Two testing rooms (A, B)

• Administrator (project staff)

• Observer/Co-Scorer (project staff)

• Observer/Co-Scorer (novice scorer)

• Each student was tested, then immediately retested in second room

25

Average interrater agreement

Within administration (same room)

Total Score    Room A (3 raters)    Room B (3 raters)
2002           .98                  .97

26

Test/re-test reliability

Between Rooms

Final Ability Estimate

2002 .89

27

Example: Some initial validity evidence

• Analyses of ancillary data collected from program records during the field test, including test scores less than six months old

• Standard setting study

28

Correlations with program placement

Range of Correlation    Number of Programs    Percentage
.80 or above            7                     30.4%
.70 to .79              9                     39.1%
.60 to .69              3                     13.1%
.50 to .59              3                     13.1%
Below .50               1                     4.3%
TOTALS                  23                    100%

29

Summary: Program placement correlations

69.5% of the correlations were .70 or higher

30

Example: Standard setting study

11 judges

30 student performances

Performances (about 6 min each) arranged from lowest to highest

Judgment made: “Which SPL is best characterized by this performance?”

Judges were able to complete this task relating the SPL descriptors to the observed performances

31

Overview

Why the BEST Plus?

What does the BEST Plus look like?

What is its research base?

How can the BEST Plus be used?

32

The BEST Plus Score Report

Information includes:
– BEST Plus Scale Score
– SPL level
– NRS level
– Diagnostic information

33

Uses of the BEST Plus

Accountability
– National Reporting System (NRS), as scores on the BEST Plus relate to the 6 NRS levels for Speaking and Listening
– Program Evaluation

34

Standard setting outcome (SPLs)

SPL Scale Score Range

0 Below 330

1 330-400

2 401-417

3 418-438

4 439-472

5 473-506

6 507-540

7 541-598

8 599-706

9 707-795

10 Above 795
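The score ranges above amount to a lookup from scale score to SPL. A minimal sketch of that lookup follows, assuming whole-number scale scores; the function name `spl_for_scale_score` is hypothetical and is not part of any BEST Plus software.

```python
from bisect import bisect_right

# Lowest scale score for SPL 1 through SPL 10, read from the table above.
# Scores below 330 are SPL 0; scores above 795 (796 and up) are SPL 10.
SPL_LOWER_BOUNDS = [330, 401, 418, 439, 473, 507, 541, 599, 707, 796]

def spl_for_scale_score(score: int) -> int:
    """Return the SPL (0-10) whose range contains a whole-number scale score."""
    # Counting how many lower bounds are <= score gives the SPL directly.
    return bisect_right(SPL_LOWER_BOUNDS, score)

for score in (329, 330, 438, 472, 540, 795, 796):
    print(score, "-> SPL", spl_for_scale_score(score))
```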

35

Standard setting outcome (NRS)

NRS Level Related SPL BEST Plus Scale Scores

Beginning ESL Literacy 0-1 Below 401

Beginning ESL 2-3 401-438

Low Intermediate ESL 4 439-472

High Intermediate ESL 5 473-506

Low Advanced ESL 6 507-540

High Advanced ESL 7 or more Above 540
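The NRS table can be read the same way, as a lookup from scale score to level name. The sketch below assumes whole-number scale scores and uses a hypothetical function name, `nrs_level_for_scale_score`; only the cut points and level names come from the table.

```python
from bisect import bisect_right

# NRS level names in ascending order, as listed in the table above.
NRS_LEVELS = [
    "Beginning ESL Literacy",
    "Beginning ESL",
    "Low Intermediate ESL",
    "High Intermediate ESL",
    "Low Advanced ESL",
    "High Advanced ESL",
]

# Lowest scale score of each level after the first (below 401 is the first).
NRS_LOWER_BOUNDS = [401, 439, 473, 507, 541]

def nrs_level_for_scale_score(score: int) -> str:
    """Return the NRS level name for a whole-number BEST Plus scale score."""
    return NRS_LEVELS[bisect_right(NRS_LOWER_BOUNDS, score)]

for score in (400, 401, 472, 506, 540, 541):
    print(score, "->", nrs_level_for_scale_score(score))
```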

36

Uses of the BEST Plus

Within Programs
– Placement
– Progress
– Diagnosis
– Screening

37

Diagnostic score report information

SPL                 N      Average Listening    Average Complexity    Average Communication
0                   118    .37                  .22                   .43
1                   312    .87                  .52                   1.12
2                   120    1.13                 .73                   1.49
3                   169    1.25                 .80                   1.72
4                   336    1.43                 .95                   2.03
5                   318    1.60                 1.15                  2.33
6                   270    1.73                 1.31                  2.55
7                   340    1.85                 1.50                  2.77
8                   317    1.91                 1.91                  2.88
9                   96     1.95                 2.34                  2.95
10                  24     1.98                 2.85                  2.99
Maximum Possible           2.00                 4.00                  3.00

38

Example (diagnostic information)

SPL = 5              Listening    Language Complexity    Communication
Examinee             1.20         1.57                   2.20
Average for SPL 5    1.60         1.15                   2.33

Relative to other SPL 5s, current examinee is:

• Low in listening

• High in complexity

• Average in communication
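The comparison in this example can be sketched as a difference between the examinee's component averages and the averages for the same SPL. The slides do not say how large a difference counts as "Low" or "High", so the tolerance of 0.2 below is purely a hypothetical value chosen so that the sketch reproduces the three labels in the example; the function name `relative_profile` is also hypothetical.

```python
# Component averages for SPL 5 and the sample examinee, from the example above.
SPL5_AVERAGES = {"Listening": 1.60, "Language Complexity": 1.15, "Communication": 2.33}
EXAMINEE = {"Listening": 1.20, "Language Complexity": 1.57, "Communication": 2.20}

# Hypothetical threshold: differences within +/-0.2 are treated as "Average".
TOLERANCE = 0.2

def relative_profile(examinee, averages, tolerance=TOLERANCE):
    """Label each component Low, High, or Average relative to the SPL average."""
    profile = {}
    for component, average in averages.items():
        difference = examinee[component] - average
        if difference < -tolerance:
            profile[component] = "Low"
        elif difference > tolerance:
            profile[component] = "High"
        else:
            profile[component] = "Average"
    return profile

for component, label in relative_profile(EXAMINEE, SPL5_AVERAGES).items():
    print(f"{component}: {label}")
```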

39

Questions and discussion

40

Thank you