Developing a multi-method design for carrying out comparability studies of tests aligned with the CEFR

Jamie Dunlea, Richard Spiby
British Council

Quynh Thi Ngoc Nguyen, Yen Thi Quyn Nguyen
University of Languages and International Studies, Vietnam National University Hanoi

14th EALTA Conference
CIEP, Sevres, France
June 1-3, 2017

Assessment Research Group

Overview of the study

The objectives of the comparison study

• To investigate the relationship between performance of university students on the VSTEP and performance on the Aptis test, and the relationship of both tests to the CEFR

• To investigate the comparability of the VSTEP and Aptis from the perspective of constructs targeted and test design

• To investigate local university students' and educators' attitudes to the Aptis test through the collection of qualitative questionnaire feedback

• To strengthen the methodology used for comparability studies of tests linked to the CEFR, particularly in relation to defining the constructs

What is Aptis?


Vietnamese Standardized Test of English Proficiency (VSTEP)

http://vstep.vn/

VSTEP Test Description

• Target test takers: Vietnamese adult learners of English, aged 18 and over, who want to test their language proficiency for different purposes

• Proficiency scales:

Under 4.0: below the reported levels

4.0 – 5.5: Level 3 (B1)

6.0 – 8.0: Level 4 (B2)

8.5 – 10: Level 5 (C1)
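The banding above can be written as a small lookup. This is only an illustrative sketch: the function name `vstep_level` is hypothetical, and it assumes scores are reported in half-point steps, which would explain why the bands on the slide leave gaps such as 5.5 to 6.0. Each band is therefore matched by its lower bound.

```python
from typing import Optional

# Lower bounds of each band, copied from the slide (highest band first).
# Assumption: scores move in 0.5 steps, so a lower-bound lookup is enough.
VSTEP_BANDS = [
    (8.5, "Level 5 (C1)"),
    (6.0, "Level 4 (B2)"),
    (4.0, "Level 3 (B1)"),
]


def vstep_level(score: float) -> Optional[str]:
    """Return the reported level for a 0-10 VSTEP score, or None below 4.0."""
    if not 0 <= score <= 10:
        raise ValueError("VSTEP scores are expected on a 0-10 scale")
    for lower_bound, label in VSTEP_BANDS:
        if score >= lower_bound:
            return label
    return None  # under 4.0: below the reported levels


for s in (3.5, 5.5, 6.0, 8.5):
    print(s, vstep_level(s))
```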

Socio-cognitive model

What is validity?

• Does the test measure what we want it to measure?

• Are the scores from the test accurate, reliable, meaningful?

• Are the scores useful for test users to make decisions?

Components shown in the model: context validity, cognitive validity, scoring validity, consequential validity, criterion-related validity

Main data collection and analysis

Date: Main data collection / analysis

Dec 2015 - Feb 2016: Planning / preparing instruments

May 2016: Content analysis of VSTEP / Aptis. Data collection: content review of both tests by trained panels of expert reviewers

May 2016: Pilot test at Vietnam National University. Data collection: test scores from both tests, questionnaires & interviews for test takers

May 2016 - Oct 2016 (revised on the slide: March 2017): Analysis of content review and pilot testing data. Revision of instruments for main data collection

Jan 2017 (revised on the slide: March - April 2017): Main testing. Data collection: test scores from both tests and questionnaires for test takers from 3 universities

Jan 2017 - Mar 2017 (revised on the slide: July - Sep 2017): Main data analysis and preparation of technical report

Aptis – VSTEP comparison study

Results

• Defining the constructs: contextual and cognitive parameters

• Scoring: descriptive statistics, correlations, exploratory factor analysis

• Questionnaires: attitudes of test takers

Defining the constructs

• The analysis template builds on the model used in the Aptis-GEPT study.

• The analysis categories reflect the contextual and cognitive parameters used in the Aptis test specifications.

• The categories have also been used in an extensive, large-scale validation study (Dunlea, 2016).

• The aim is to refine these categories to create a standardized analysis format capable of capturing a snapshot of the contextual and cognitive profile of the tasks and items in a test.

Defining the constructs

• Two teams of researchers: a team from the Language Testing Research Group at Innsbruck University (LTRGI) and a team from ULIS.

• Each team receives a 1-day training workshop on the analysis forms.

• Team members then evaluate the test content individually using the analysis forms; each team then discusses its judgments and reaches a consensus view.

• Judgments from LTRGI have been analyzed (judgments from the ULIS team were completed in March 2017).

Task specs: an example

Test: Aptis General

Component: Reading

Task: Matching headings to text

Features of the Task

Skill focus: Expeditious global reading of longer text, integrating propositions across a longer text into a discourse-level representation.

Task level: A1 / A2 / B1 / B2 / C1 / C2

Task description: Matching headings to paragraphs within a longer text. Candidates read through a longer text consisting of 7 paragraphs, identifying the best heading for each paragraph from a bank of 8 options.

Cognitive processing (goal setting):

• Expeditious reading: local (scan/search for specifics)

• Careful reading: local (understanding sentence)

• Expeditious reading: global (skim for gist/search for key ideas/detail)

• Careful reading: global (comprehend main idea(s)/overall text(s))

Cognitive processing (levels of reading):

• Word recognition

• Lexical access

• Syntactic parsing

• Establishing propositional meaning (cl./sent. level)

• Inferencing

• Building a mental model

• Creating a text level representation (disc. structure)

• Creating an intertextual representation (multi-text)

Features of the Input Text

Words: 700-750 words

Domain: Public / Occupational / Educational / Personal

Discourse mode: Descriptive / Narrative / Expository / Argumentative / Instructive

Content knowledge: General / Specific

Cultural specificity: Neutral / Specific

Nature of information: Only concrete / Mostly concrete / Fairly abstract / Mainly abstract

Lexical level: K1 / K2 / K3 / K4 / K5 / K6 / K7 / K8 / K9 / K10

Readability: Flesch-Kincaid Grade Level 9-12

Grammar: A1-B2 exponents; average sentence length 18-20 words

Text genre: Magazines, newspapers, instructional materials (such as extracts from undergraduate textbooks describing important events and ideas, etc.)

Features of the Response

Target: Length up to 10 words; Lexical K1-K5; Grammar A1-B2

Distractors: Length up to 10 words; Lexical K1-K5; Grammar [no value shown]

Key: Within sentence / Across sentences / Across paragraphs

Reading Task 1 (consensus judgments)

Features of the Task

Skill focus: sentence comprehension, lexis

Task level (CEFR): A1

Response format: Multiple choice gap fill

Items per task: 5

Cognitive processing 1: Careful reading: local

Cognitive processing 2: Establishing propositional meaning (cl./sent. level)

Content knowledge: 1 (General)

Cultural specificity: 1 (Neutral)

Features of the Input Text

Domain: Personal

Discourse mode: Descriptive

Nature of information: Only concrete

Topic: Daily life

Text genre: Personal letters / e-mail

Presentation: Verbal (written)

Features of the Response (Items 1-5, same judgment for every item)

Key information: Within sentences

Operation: Main idea / conclusions

Question presentation: Verbal (written)

Option presentation: Verbal (written)

Aptis – Reading Task 4 (consensus judgments)

Features of the Task

Skill focus: paragraph comprehension, reading for gist, understanding main ideas of longer complex text

Task level (CEFR): B2

Response format: Matching headings to text

Items per task: 7

Cognitive processing 1: Expeditious reading: global

Cognitive processing 2: Building a mental model

Content knowledge: 2

Cultural specificity: 1 (Neutral)

Aptis – Reading Task 4: Features of the Input Text

Domain: Public

Discourse mode: Expository

Nature of information: Fairly abstract

Topic: Food and drink / Environmental issues

Text genre: Magazines

Presentation: Verbal (written)

Aptis – Reading Task 4: Features of the Response

Items 1-7 (same judgment for every item)

Key information: across sentences

Operation: Main idea / conclusions

Question presentation: Verbal (written)

Option presentation: Verbal (written)

VSTEP Reading Task 4 (consensus judgments)

Features of the Task

Skill focus: identifying main ideas, finer details and implied relationships, understanding longer complex texts

Task level (CEFR): C1

Response format: MCQ

Items per task: 10

Cognitive processing 1: Careful reading: global

Cognitive processing 2: Creating a text level representation (disc. structure)

Content knowledge: 2

Cultural specificity: 2

VSTEP Reading Task 4: Features of the Input Text

Domain: Public

Discourse mode: Expository

Nature of information: Fairly abstract

Topic: Health & medicine (social topic) / Science and technology

Text genre: Newspapers

Presentation: Verbal (written)

VSTEP Reading Task 4: Features of the Response

Item   Key information     Operation
1      Within sentences    Specific information
2      Across sentences    Main idea / conclusions
3      Within sentences    Specific information
4      Across sentences    Main idea / conclusions
5      Across sentences    Main idea / conclusions
6      Within sentences    Main idea / conclusions
7      Across sentences    Opinion
8      Across sentences    Main idea / conclusions
9      Across paragraphs   Text structure / connections between the parts
10     Across sentences    Opinion

Question presentation: Verbal (written) for all ten items

Option presentation: Verbal (written) for all ten items
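One way to use the standardized analysis format is to compare two task profiles category by category. The sketch below does this for the consensus judgments reported above for Aptis Reading Task 4 and VSTEP Reading Task 4. The dictionary layout and the helper `compare_profiles` are illustrative assumptions, not the instrument used in the study.

```python
# Task-level and input-text judgments copied from the slides above.
aptis_task4 = {
    "Task level (CEFR)": "B2",
    "Response format": "Matching headings to text",
    "Items per task": 7,
    "Cognitive processing 1": "Expeditious reading: global",
    "Cognitive processing 2": "Building a mental model",
    "Content knowledge": 2,
    "Cultural specificity": 1,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Magazines",
}
vstep_task4 = {
    "Task level (CEFR)": "C1",
    "Response format": "MCQ",
    "Items per task": 10,
    "Cognitive processing 1": "Careful reading: global",
    "Cognitive processing 2": "Creating a text level representation (disc. structure)",
    "Content knowledge": 2,
    "Cultural specificity": 2,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Newspapers",
}


def compare_profiles(a, b):
    """Split the categories present in both profiles into shared and differing."""
    common = [key for key in a if key in b]
    shared = {key: a[key] for key in common if a[key] == b[key]}
    differing = {key: (a[key], b[key]) for key in common if a[key] != b[key]}
    return shared, differing


shared, differing = compare_profiles(aptis_task4, vstep_task4)
print("Same judgment:", sorted(shared))
for category, (aptis, vstep) in differing.items():
    print(f"{category}: Aptis = {aptis} | VSTEP = {vstep}")
```

On these two profiles the input-text categories largely coincide, while the task level, response format and cognitive processing categories differ.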

Scoring analysis: Aptis CEFR

Exact agreement: 62%    Adjacent agreement: 38%

Rows: Aptis Overall CEFR. Columns: VSTEP CEFR.

Aptis Overall CEFR   Under B1   B1   B2   Total
Under B1             5          11   0    16
B1                   3          41   7    51
B2                   0          15   35   50
C                    0          0    13   13
Total                8          67   55   130
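The agreement figures can be recomputed from the cross-tabulation. The sketch below is an illustration rather than the authors' procedure: it assumes the four levels form a single ordinal scale (Under B1 < B1 < B2 < C), that "exact" means both tests report the same level, and that "adjacent" means the two levels are one step apart.

```python
# Shared ordinal scale (assumed ordering); VSTEP reports no C level in this sample.
LEVELS = ["Under B1", "B1", "B2", "C"]

# counts[aptis_level][vstep_level], copied from the table on the slide.
counts = {
    "Under B1": {"Under B1": 5, "B1": 11, "B2": 0},
    "B1": {"Under B1": 3, "B1": 41, "B2": 7},
    "B2": {"Under B1": 0, "B1": 15, "B2": 35},
    "C": {"Under B1": 0, "B1": 0, "B2": 13},
}


def agreement(table, levels):
    """Return (exact, adjacent) agreement as proportions of all test takers."""
    rank = {level: i for i, level in enumerate(levels)}
    total = exact = adjacent = 0
    for aptis_level, row in table.items():
        for vstep_level, n in row.items():
            gap = abs(rank[aptis_level] - rank[vstep_level])
            total += n
            if gap == 0:
                exact += n
            elif gap == 1:
                adjacent += n
    return exact / total, adjacent / total


exact, adjacent = agreement(counts, LEVELS)
print(f"exact {exact:.0%}, adjacent {adjacent:.0%}")  # exact 62%, adjacent 38%
```

Under these assumptions 81 of the 130 test takers fall on the diagonal and the remaining 49 are one level apart, which reproduces the 62% and 38% shown on the slide.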

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930

Bartlett's Test of Sphericity: Approx. Chi-Square 956.778; df 36; Sig. .000

Component Matrix

Variable       Component 1
AptisGVScore   .921
AptisLScore    .829
AptisRScore    .816
AptisSScore    .881
AptisWScore    .853
VSTEPRScore    .803
VSTEPLScore    .669
VSTEPWScore    .870
VSTEPSScore    .827

Extraction Method: Principal Component Analysis.
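Two quick checks can be derived from the output above. The sketch assumes the loadings are standard unrotated principal component loadings, in which case the sum of squared loadings on a component equals its eigenvalue. The share of variance it prints is derived here for illustration and is not a figure reported on the slide.

```python
# Loadings on the single extracted component, copied from the component matrix.
loadings = {
    "AptisGVScore": 0.921,
    "AptisLScore": 0.829,
    "AptisRScore": 0.816,
    "AptisSScore": 0.881,
    "AptisWScore": 0.853,
    "VSTEPRScore": 0.803,
    "VSTEPLScore": 0.669,
    "VSTEPWScore": 0.870,
    "VSTEPSScore": 0.827,
}

n_variables = len(loadings)

# Sum of squared loadings = eigenvalue of the component (unrotated PCA assumed).
eigenvalue = sum(value ** 2 for value in loadings.values())

# With standardized variables the total variance equals the number of variables.
variance_share = eigenvalue / n_variables

# Bartlett's test of sphericity has p(p - 1) / 2 degrees of freedom.
bartlett_df = n_variables * (n_variables - 1) // 2

print(f"eigenvalue of component 1: {eigenvalue:.3f}")
print(f"share of total variance: {variance_share:.1%}")
print(f"Bartlett df for {n_variables} variables: {bartlett_df}")
```

The degrees of freedom computed for nine score variables match the df of 36 reported in the table.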

I prefer computer-based writing tests to paper-and-pencil-based writing tests.

Strongly agree 33%, agree 35%, disagree 18%, strongly disagree 11%, no selection 3%

I prefer face-to-face speaking tests to machine-recorded speaking tests.

Strongly agree 33%, agree 41%, disagree 14%, strongly disagree 10%, no selection 2%

I have taken other computer-based English tests before taking today's Aptis test.

No 75%, yes 24%, no selection 1%

I often use computers.

Strongly agree 56%, agree 40%, disagree 1%, strongly disagree 1%, no selection 1%

Some tentative conclusions

1. The use of mixed methods, both qualitative and quantitative, provides multiple perspectives and aids interpretation.

2. The socio-cognitive model has provided a coherent structure for identifying the sources of evidence useful for creating a detailed picture of the tests.

3. The pilot data, on a small but robust scale, have demonstrated that the two tests do measure similar constructs around general English proficiency.

4. Preliminary results, including statistical analysis, gave us confidence that it will be useful to proceed to the main data collection.